Closed Olshansk closed 1 year ago
@okdas Can you PTAL at the requirements in this ticket? I know you've been putting a lot of thought into it, but I think we can potentially get some help on this piece of work, which will link what I'll be doing in M3 and what you're doing in M2.
cc @jessicadaugherty - this will be a great bounty after we triage it.
So I am trying to wrap my head around the current behaviour, so I can try and implement the ticket. And have a few questions:
1) So looking into app/pocket/main.go, runtime/manager.go and shared/node.go I can see these are the current entry points to starting the node. However, from this, I am getting the understanding that the default behaviour of creating a new node and starting all the modules is to become a Validator actor. My question is whether there is currently any logic that once the Utility module is started means the Validator behaviour is begun? (Like a StartValidating function for example 😆) If this is defined then adding the logic to build different actor binaries would be more straightforward. If however, this is not the case, from my current understanding something like this would need to be implemented (is this the case as I may be missing something)
2) In conjunction with the first question - I cannot seem to find any current method exposed that returns the actor type of a node - unless I am missing something again. If this is the case, my initial thought is to add some sort of field to the UtilityContext or something similar that is set upon the module's creation and retrieve this.
If I have gone wrong with anything I've mentioned please let me know - I will keep looking into this over the weekend hopefully to start next week.
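For illustration, the idea in question 2 (set the actor type when the module is created and expose a getter) could look roughly like the sketch below. Every name here (ActorType, utilityModule, GetActorType) is hypothetical and is not taken from the repo; it only shows the shape of the approach.

```go
package main

import "fmt"

// ActorType is a hypothetical enum; the values mirror the actors named in the ticket.
type ActorType int

const (
	ActorTypeFullNode ActorType = iota
	ActorTypeValidator
	ActorTypeServicer
	ActorTypeFisherman
	ActorTypePortal
)

// String makes the actor type readable in logs and RPC responses.
func (a ActorType) String() string {
	switch a {
	case ActorTypeFullNode:
		return "FullNode"
	case ActorTypeValidator:
		return "Validator"
	case ActorTypeServicer:
		return "Servicer"
	case ActorTypeFisherman:
		return "Fisherman"
	case ActorTypePortal:
		return "Portal"
	default:
		return "Unknown"
	}
}

// utilityModule is a stand-in for the real utility module (assumption, not repo code).
type utilityModule struct {
	actorType ActorType
}

// newUtilityModule records the actor type once, at creation time.
func newUtilityModule(actorType ActorType) *utilityModule {
	return &utilityModule{actorType: actorType}
}

// GetActorType is the accessor other modules or the RPC layer could call.
func (m *utilityModule) GetActorType() ActorType {
	return m.actorType
}

func main() {
	m := newUtilityModule(ActorTypeServicer)
	fmt.Println(m.GetActorType())
}
```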
Can you PTAL at the requirements in this ticket?
The current requirements look good. One part I am not sure about is Update the configs to specify the type of actor(s) being deployed on this node. If we are going to have multiple binaries, one per actor, we probably don't need that configuration option.
We might, however, require actor-specific configs we need to expose. For example, adding { "portal": { "listen_port": 80, ... }} or similar configurations makes sense to me. Considering our configuration is done “by module” would that mean we need to introduce a “portal” module? I would like to avoid creating a separate config for actor-specific parameters, but I suspect that can be done too.
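As a rough sketch of an actor-specific section like the "portal" example, an optional pointer field lets a node leave the section out entirely. The Config and PortalConfig types and the parseConfig helper are assumptions for illustration only; the only field taken from the example is listen_port.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// PortalConfig holds portal-only settings (hypothetical type).
type PortalConfig struct {
	ListenPort int `json:"listen_port"`
}

// Config is a hypothetical top-level config; a nil Portal means the section is absent.
type Config struct {
	Portal *PortalConfig `json:"portal,omitempty"`
}

// parseConfig decodes raw JSON into a Config.
func parseConfig(raw []byte) (*Config, error) {
	var cfg Config
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return nil, fmt.Errorf("parsing config: %w", err)
	}
	return &cfg, nil
}

func main() {
	cfg, err := parseConfig([]byte(`{"portal": {"listen_port": 80}}`))
	if err != nil {
		panic(err)
	}
	if cfg.Portal != nil {
		fmt.Println("portal listens on port", cfg.Portal.ListenPort)
	}
}
```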
Moreover, I think we can add a container image build for each actor once the binary is compiled.
Something we could use to solve this is https://goreleaser.com/ - I was looking at whether we could use it to build go binaries along container images, but it was quicker to add a custom image build. Maybe it's time to revisit?
GoReleaser also can package brew binaries and can handle changelogs with releases/pre-releases. There are a couple of requirements – we need to follow semantic versioning (we do), and there could be complications with CGO (AFAIK we do not use it). So it should work for us.
Something we could use to solve this is goreleaser.com - I was looking at whether we could use it to build go binaries along container images, but it was quicker to add a custom image build. Maybe it's time to revisit?
GoReleaser also can package brew binaries and can handle changelogs with releases/pre-releases. There are a couple of requirements – we need to follow semantic versioning (we do), and there could be complications with CGO (AFAIK we do not use it). So it should work for us.
The goal here is to have the "code modifications" that will be used by the infra above.
I think everything related to the infrastructure related to streamlining, sharing, versioning, etc is outside of the scope of this PR.
@okdas
The current requirements look good. One part I am not sure about is Update the configs to specify the type of actor(s) being deployed on this node. If we are going to have multiple binaries, one per actor, we probably don't need that configuration option.
We might, however, require actor-specific configs we need to expose. For example, adding { "portal": { "listen_port": 80, ... }} or similar configurations makes sense to me. Considering our configuration is done “by module” would that mean we need to introduce a “portal” module? I would like to avoid creating a separate config for actor-specific parameters, but I suspect that can be done too.
Makes sense. In that case I'm thinking of adding the following requirements:
make build_portal, make build_fisherman, make build_servicer, etc...
@okdas:
So looking into app/pocket/main.go, runtime/manager.go and shared/node.go I can see these are the current entry points to starting the node. However, from this, I am getting the understanding that the default behaviour of creating a new node and starting all the modules is to become a Validator actor.
Once #528 by @gokutheengineer is merged in, there will be a codepath for syncing full nodes that are not validators. See state_machine/docs/state-machine.diagram.md in the PR for a reference.
My question is whether there is currently any logic that once the Utility module is started means the Validator behaviour is begun? (Like a StartValidating function for example 😆) If this is defined then adding the logic to build different actor binaries would be more straightforward. If however, this is not the case, from my current understanding something like this would need to be implemented (is this the case as I may be missing something)
The Validator is a special case because it touches the consensus module, so (again, per @gokutheengineer's PR), the only difference will be:
To do this, I would add a oneof ValidatorConfig and a FullNodeConfig inside consensus_config.proto
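A oneof means exactly one of the two configs is set at a time. The Go sketch below emulates that with a sealed interface, which is roughly how generated protobuf code models a oneof. The ConsensusConfig wrapper, the nodeRole interface and the describeRole helper are hypothetical, and the config structs are left empty because their fields are not specified here.

```go
package main

import "fmt"

// nodeRole emulates a protobuf oneof: only types in this package can implement it.
type nodeRole interface {
	isNodeRole()
}

// ValidatorConfig and FullNodeConfig are empty placeholders (fields unknown).
type ValidatorConfig struct{}
type FullNodeConfig struct{}

func (*ValidatorConfig) isNodeRole() {}
func (*FullNodeConfig) isNodeRole()  {}

// ConsensusConfig is a hypothetical wrapper holding exactly one role.
type ConsensusConfig struct {
	Role nodeRole
}

// describeRole branches on whichever role is set.
func describeRole(cfg ConsensusConfig) string {
	switch cfg.Role.(type) {
	case *ValidatorConfig:
		return "validator"
	case *FullNodeConfig:
		return "full node"
	default:
		return "unset"
	}
}

func main() {
	fmt.Println(describeRole(ConsensusConfig{Role: &ValidatorConfig{}}))
	fmt.Println(describeRole(ConsensusConfig{Role: &FullNodeConfig{}}))
}
```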
Outside of the validator (or not), I would start with the utility module.
FishermanConfig, PortalConfig, ServicerConfig which are not mutually exclusive.
ServicerModule, FishermanModule and PortalModule which can be stopped/started individually. Again, see the StateSyncModule @gokutheengineer is introducing in his PR...
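A minimal sketch of actor configs that are not mutually exclusive: each actor gets its own optional config, and any combination can be enabled on one node. The UtilityConfig wrapper, the Enabled flag and the enabledActors helper are assumptions for illustration; whether "enabled" is expressed by the config's presence or by an explicit boolean is a design choice, not something this sketch settles.

```go
package main

import "fmt"

// Per-actor configs; only the Enabled flag is sketched (hypothetical field).
type ServicerConfig struct{ Enabled bool }
type FishermanConfig struct{ Enabled bool }
type PortalConfig struct{ Enabled bool }

// UtilityConfig is a hypothetical container; nil means "not configured".
type UtilityConfig struct {
	Servicer  *ServicerConfig
	Fisherman *FishermanConfig
	Portal    *PortalConfig
}

// enabledActors lists every actor that is both configured and enabled.
func enabledActors(cfg UtilityConfig) []string {
	var actors []string
	if cfg.Servicer != nil && cfg.Servicer.Enabled {
		actors = append(actors, "servicer")
	}
	if cfg.Fisherman != nil && cfg.Fisherman.Enabled {
		actors = append(actors, "fisherman")
	}
	if cfg.Portal != nil && cfg.Portal.Enabled {
		actors = append(actors, "portal")
	}
	return actors
}

func main() {
	cfg := UtilityConfig{
		Servicer:  &ServicerConfig{Enabled: true},
		Fisherman: &FishermanConfig{Enabled: true},
	}
	fmt.Println(enabledActors(cfg))
}
```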
From this article, but they're not much use (right now), because we have configs validating the code flow.
@okdas Any feedback on the design above?
If I have gone wrong with anything I've mentioned please let me know - I will keep looking into this over the weekend hopefully to start next week.
This is a much more open-ended problem, so will keep thinking/sharing ideas. Might need to prototype it myself if it's still not clear.
first pass with notes, will be moving and grooving here https://github.com/pokt-network/pocket/compare/0xbigboss/dw-1860/core-deploying-all-the-actors
Ty @0xBigBoss :)
@0xBigBoss I was going to suggest that the presence of the fisherman/servicer config could determine if it's enabled/disabled, but having the boolean can actually make toggling/developing much easier so +1.
Within the utility module, I suggest you look at what @gokutheengineer has been doing with the StateSync module in consensus. It's like a "submodule" inside the consensus module, and I was thinking Fisherman / Servicer / etc business logic could be similar
@Olshansk yes, thank you for the pointer. I did my best to brush up on the module docs and move in this direction.
I pushed a new commit experimenting with standalone modules. This works thus far for servicer and fisherman, and I think it is generally a good approach to strapping on the utility-specific stuff. I am not convinced I have found the best way to start the actor-specific modules yet, though. I included an engines field, but it seems unnecessary/overkill; the engines field seems more appropriate for other modules that need to start other sub-modules.
I also haven't quite reviewed how it should all work e2e and how siloed these utility modules can be. I haven't reviewed the validator bootstrap logic and how that should come into play here. Will think on this more after I have read more on how that module starts.
@0xBigBoss I'm not 100% sure if you were looking for feedback yet, so just going to send suggestions based on a high-level brief overview.
engines - overkill IMO; let's keep it simple
enableActorModules - we're only going to have a handful of actors. Even if we introduce new ones, this is not something that will grow very large. I think keeping things verbose (avoiding lists here) will keep it simpler longer-term.
If there's any specific piece that you have questions about, lmk
@Olshansk I believe I am set with the skeleton configs for the various actors and creating/starting them in the utility module. I did end up keeping one list, actorModules that represents the current actors enabled for the node. There may be a more clever way to not make it a list, but since we are allowing for multiple actor types in some cases, it seemed fitting.
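To make the list approach concrete, here is a rough sketch of a utility module that keeps its enabled actors in a slice and starts them in order, stopping the ones already started if a later one fails. The actorModule interface and the stubModule type are hypothetical stand-ins, not the interfaces used in the repo.

```go
package main

import (
	"errors"
	"fmt"
)

// actorModule is a hypothetical minimal interface for an actor-specific module.
type actorModule interface {
	Name() string
	Start() error
	Stop() error
}

// stubModule is a toy implementation used only to exercise the sketch.
type stubModule struct {
	name      string
	running   bool
	failStart bool
}

func (m *stubModule) Name() string { return m.name }

func (m *stubModule) Start() error {
	if m.failStart {
		return errors.New("start failed")
	}
	m.running = true
	return nil
}

func (m *stubModule) Stop() error {
	m.running = false
	return nil
}

// utilityModule holds the actors enabled for this node.
type utilityModule struct {
	actorModules []actorModule
}

// Start starts each actor in order; on failure it stops the ones already started.
func (u *utilityModule) Start() error {
	for i, mod := range u.actorModules {
		if err := mod.Start(); err != nil {
			for j := i - 1; j >= 0; j-- {
				_ = u.actorModules[j].Stop()
			}
			return fmt.Errorf("starting %s: %w", mod.Name(), err)
		}
	}
	return nil
}

func main() {
	u := &utilityModule{actorModules: []actorModule{
		&stubModule{name: "servicer"},
		&stubModule{name: "fisherman"},
	}}
	if err := u.Start(); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("all actor modules started")
}
```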
@h5law I did end up going with the approach you recommended for now, and kept a top-level ValidatorConfig. Still not set that this is the best path forward longterm since I still believe this sort of overlaps with the consensus config already. 🤣
@gokutheengineer If you could, please send me some easy instructions on how to add non-validator, full nodes to localnet so I can start integrating these new configs.
In the meantime, I'll start on the RPC calls and the plumbing for that.
@Olshansk I believe I am set with the skeleton configs for the various actors and creating/starting them in the utility module. I did end up keeping one list, actorModules that represents the current actors enabled for the node. There may be a more clever way to not make it a list, but since we are allowing for multiple actor types in some cases, it seemed fitting.
👍 Will think about it myself a bit more as well, but definitely a great way to start.
Any chance you can open up a draft PR? It'll be a good way to leave comments & have a discussion along the way.
@h5law I did end up going with the approach you recommended for now, and kept a top-level ValidatorConfig. Still not set that this is the best path forward longterm since I still believe this sort of overlaps with the consensus config already.
I think it's reasonable. Not an irreversible decision.
@gokutheengineer If you could, please send me some easy instructions on how to add non-validator, full nodes to localnet so I can start integrating these new configs.
This isn't ready YET. The PRs are in flight so we should hopefully have them ready next week along with the markdown readme.
In the meantime, I'll start on the RPC calls and the plumbing for that.
👍
@0xBigBoss Just sharing a screenshot from a conversation we had today rather than summarizing. Stay tuned!
Very cool @Olshansk . I have started in that direction in my PR #710. Though it's not completely there yet since the fisherman/servicer aren't staked by the cluster manager, I tried to add a hacky method of overriding some of the genesis values.
I think once I have clarity on the full node vs validator configuration, it will be straightforward to bring this home.
Just chiming in here on P2P timing and coordination: #505 is a dependency for the P2P module to support communication with non-staked actors (e.g. full-nodes). I've just put #707 up for review which I expect to be part 1 of 2 to close #505.
TLDR (why); Here's an excerpt from the P2P README update in #707:
flowchart TD
subgraph lMod[Local P2P Module]
subgraph lHost[Libp2p `Host`]
end
subgraph lRT[Raintree Router]
subgraph lRTPS[Raintree Peerstore]
lStakedPS([staked actors only])
end
subgraph lPM[PeerManager]
end
lPM --> lRTPS
end
subgraph lBG[Background Router]
subgraph lBGPS[Background Peerstore]
lNetPS([all P2P participants])
end
subgraph lGossipSub[GossipSub]
end
subgraph lDHT[Kademlia DHT]
end
lGossipSub --> lBGPS
lDHT --> lBGPS
end
lRT --1a--> lHost
lBG --1b--> lHost
end
subgraph rMod[Remote P2P Module]
subgraph rHost[Libp2p `Host`]
end
subgraph rRT[Raintree Router]
subgraph rPS[Raintree Peerstore]
rStakedPS([staked actors only])
end
subgraph rPM[PeerManager]
end
rPM --> rStakedPS
end
subgraph rBG[Background Router]
subgraph rBGPS[Background Peerstore]
rNetPS([all P2P participants])
end
subgraph rGossipSub[GossipSub]
end
subgraph rDHT[Kademlia DHT]
end
rGossipSub --> rBGPS
rDHT --> rBGPS
end
rHost -. "setStreamHandler()" .-> hs[[handleStream]]
hs --3a--> rRT
hs --3b--> rBG
rBG --"4a (cont. propagation)"--> rHost
linkStyle 11 stroke:#ff3
rRT --"4b (cont. propagation)"--> rHost
linkStyle 12 stroke:#ff3
end
lHost --2--> rHost
Thanks for the details @bryanchriswhite! Amazing and very clear diagram. There's a lot going on and it's very easy to understand so I want to make sure that doesn't go unnoticed. 🙏
Objective
Enable deploying different types of protocol actors.
Origin Document
The utility specification outlines several different types of actors:
Validator
Servicer
Fisherman
Portal
Full Nodes (a full node that is not a protocol actor)
As of writing, the V1 repo only supports running validators and the debug client is a "makeshift" full node. As we are about to start working on M3 and have a live devnet, there needs to be an easy way to build and scale different types of actors.
Note that Application and Actor do not fall into the scope of this ticket.
V0 - Swagger Documentation (as a reference)
Note: It is slightly outdated
The V0 RPC spec can be found here.
For example, the height can be queried like so:
V1 - Swagger Documentation (as a reference)
Note: It is still in the early stages
The V1 RPC spec can be found here.
Goals
Deliverable
Note that there are a lot of notes and deliverables below. These should be used as a guide but the implementor is expected to use their best judgment and creativity to achieve the goals above.
config.go to contain the type of actor being deployed
XXXConfig (e.g. FishermanConfig) protos that we can build on in the future would help
config.json files for each actor(s) being deployed on this node
Utility Module) that is specific to each protocol actor
Servicer can also, optionally, be a Validator at the same time
Portal can not double as anyone else; it must be standalone
random/background gossip for full nodes, @gokutheengineer is working on state sync with full nodes, @okdas is working on infrastructure to deploy them; please ping them in the public channels for more details. @Olshansk is not doing anything ;'(
relayDispatch endpoint and if the actor type is not a servicer, it should return an error.
v1/servicer/... or /v1/fisherman/...
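Two of the rules above lend themselves to a small sketch: a Portal must run standalone, and a relay endpoint should return an error when the node is not a servicer. The actorSet type, the validate and relayStatus helpers, the /sketch/relay path and the choice of HTTP 403 are all assumptions made for illustration; none of them come from the V1 RPC spec.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// actorSet records which actors are enabled on this node (hypothetical type).
type actorSet struct {
	Validator, Servicer, Fisherman, Portal bool
}

// validate enforces the one combination rule sketched here: a portal runs alone.
// A servicer that is also a validator is allowed.
func (a actorSet) validate() error {
	if a.Portal && (a.Validator || a.Servicer || a.Fisherman) {
		return errors.New("a portal must be standalone")
	}
	return nil
}

// relayHandler rejects relays unless the node is a servicer.
// The 403 status code is an assumption, not something the ticket specifies.
func relayHandler(a actorSet) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if !a.Servicer {
			http.Error(w, "this node is not a servicer", http.StatusForbidden)
			return
		}
		w.WriteHeader(http.StatusOK)
		fmt.Fprintln(w, "relay accepted")
	}
}

// relayStatus sends one in-memory request to the handler and returns the status code.
func relayStatus(a actorSet) int {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodPost, "/sketch/relay", nil) // placeholder path
	relayHandler(a)(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(actorSet{Servicer: true, Validator: true}.validate())
	fmt.Println(actorSet{Portal: true, Servicer: true}.validate())
	fmt.Println(relayStatus(actorSet{Servicer: true}))
	fmt.Println(relayStatus(actorSet{Fisherman: true}))
}
```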
Non-goals / Non-deliverables
General issue deliverables
Testing Methodology
make test_all
LocalNet is still functioning correctly by following the instructions at docs/development/README.md
Creator: @Olshansk Co-Owners: @okdas