[Core] Deploying all-the-protocol-actors

Olshansk commented 1 year ago

Objective

Enable deploying different types of protocol actors.

Origin Document

The utility specification outlines several different types of actors:

Validator
Servicer
Fisherman
Portal
Full Nodes (a full node that is not a protocol actor)

As of writing, the V1 repo only supports running validators and the debug client is a "makeshift" full node. As we are about to start working on M3 and have a live devnet, there needs to be an easy way to build and scale different types of actors.

Note that Application and Actor do not fall into the scope of this ticket.

V0 - Swagger Documentation (as a reference)

Note: It is slightly outdated

The V0 RPC spec can be found here.

For example, the height can be queried like so:

curl --location -X 'POST' -H 'Content-Type: application/json' 'https://<user>:<pass>@<hostname>:<port>/v1/query/height' | jq

{
  "height": 89554
}

V1 - Swagger Documentation (as a reference)

Note: It is still in the early stages

The V1 RPC spec can be found here.

Goals

Enable deploying different types of protocol actors from a single binary
Enable scaling different types of protocol actors
Enable iterating on the business logic for M3 (locally and in devnet)
Set a foundation for full nodes, which will extend to light clients

Deliverable

Note that there are a lot of notes and deliverables below. These should be used as a guide but the implementor is expected to use their best judgment and creativity to achieve the goals above.

[ ] Update config.go to contain the type of actor being deployed
- Note to implementor: Creating empty XXXConfig (e.g. FishermanConfig) protos that we can build on in the future would help
[ ] Update/add config.json files for each actor(s) being deployed on this node
[ ] Add empty placeholder logic (e.g. a print statement in the Utility Module) that is specific to each protocol actor
- Note to implementor: For example, some code path should only execute if a node is actor X but should not execute if it's actor Y; this will require some thought and creativity
[ ] Add business logic and tests that limit the types of permutations a single node can be. E.g.
- A Servicer can also, optionally, be a Validator at the same time
- A Portal cannot double as any other actor; it must be standalone
- Note to implementor: Use your best judgment and we'll review during the PR process
[ ] Update LocalNet (k8s and docker-compose) to deploy the following configuration:
- 4 Validators (1 of these should also be a servicer in parallel)
- 1 full node
- 1 servicer
- 1 fisherman
- Note to implementor: portal will be added in the future
- Note to implementor: @bryanchriswhite is working on ~~random~~ background gossip for full nodes, @gokutheengineer is working on state sync with full nodes, @okdas is working on infrastructure to deploy them; please ping them in the public channels for more details. @Olshansk is not doing anything ;'(
[ ] Add an RPC endpoint that returns the type of actor(s) running on the node (see the code below as an example)
[ ] Update the CLI to be able to query the RPC endpoint above
[ ] Add new RPC endpoints (with placeholder backing code) that either return a static response or an error if the actor type should not support them
- Note to implementor: For example, you can add the relayDispatch endpoint and if the actor type is not a servicer, it should return an error.
- Note to implementor: For example, looking at the v1 swagger file, you can implement v1/servicer/... or /v1/fisherman/...

curl --location -X 'POST' -H 'Content-Type: application/json' 'https://<user>:<pass>@<hostname>:<port>/v1/query/nodeRoles' | jq

{
  "node_roles": [
    "servicer",
    "validator"
  ]
}
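A minimal Go sketch of how the nodeRoles endpoint and a role-gated placeholder endpoint could fit together. It uses only the standard library, not the repo's actual RPC layer. The handler names, the /v1/servicer/relay and /v1/fisherman/challenge paths, and the 403 status are assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// nodeRolesResponse mirrors the JSON shape shown in the example above.
type nodeRolesResponse struct {
	NodeRoles []string `json:"node_roles"`
}

// rolesHandler (hypothetical) returns the roles enabled on this node.
func rolesHandler(roles []string) http.HandlerFunc {
	return func(w http.ResponseWriter, _ *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		_ = json.NewEncoder(w).Encode(nodeRolesResponse{NodeRoles: roles})
	}
}

// requireRole (hypothetical) wraps an actor-specific handler and rejects
// the request when the node does not run the given role.
func requireRole(role string, roles []string, next http.HandlerFunc) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		for _, have := range roles {
			if have == role {
				next(w, r)
				return
			}
		}
		// The status code is an assumption; the ticket only asks for "an error".
		http.Error(w, "node is not a "+role, http.StatusForbidden)
	}
}

// relayPlaceholder stands in for real servicer logic with a static response.
func relayPlaceholder(w http.ResponseWriter, _ *http.Request) {
	fmt.Fprint(w, "relay placeholder")
}

// call invokes a handler in-process and returns its status code and body.
func call(h http.HandlerFunc, path string) (int, string) {
	rec := httptest.NewRecorder()
	h(rec, httptest.NewRequest(http.MethodPost, path, nil))
	return rec.Code, rec.Body.String()
}

func main() {
	roles := []string{"servicer", "validator"}
	fmt.Println(call(rolesHandler(roles), "/v1/query/nodeRoles"))
	fmt.Println(call(requireRole("servicer", roles, relayPlaceholder), "/v1/servicer/relay"))
	fmt.Println(call(requireRole("fisherman", roles, relayPlaceholder), "/v1/fisherman/challenge"))
}
```

Wrapping each actor-specific handler keeps the role check in one place, so the placeholder handlers themselves stay trivial until real business logic lands.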

Non-goals / Non-deliverables

Implementing new business logic for the actor-specific logic
Exhaustively updating all the source code with the appropriate build tags

General issue deliverables

[ ] Update the appropriate CHANGELOG(s)
[ ] Update any relevant local/global README(s)
[ ] Update relevant source code tree explanations
[ ] Add or update any relevant or supporting mermaid diagrams

Testing Methodology

[ ] Required: A demo document (w/ video) showing this in action
[ ] All tests: make test_all
[ ] LocalNet: verify a LocalNet is still functioning correctly by following the instructions at docs/development/README.md

Creator: @Olshansk Co-Owners: @okdas

Olshansk commented 1 year ago

@okdas Can you PTAL at the requirements in this ticket? I know you've been putting a lot of thought into it, but I think we can potentially get some help on this piece of work, which will link what I'll be doing in M3 and what you're doing in M2.

cc @jessicadaugherty - this will be a great bounty after we triage it.

h5law commented 1 year ago

So I am trying to wrap my head around the current behaviour, so I can try and implement the ticket. And have a few questions:

1) So looking into app/pocket/main.go, runtime/manager.go and shared/node.go I can see these are the current entry points to starting the node. However, from this, I am getting the understanding that the default behaviour of creating a new node and starting all the modules is to become a Validator actor. My question is whether there is currently any logic that once the Utility module is started means the Validator behaviour is begun? (Like a StartValidating function for example 😆) If this is defined then adding the logic to build different actor binaries would be more straightforward. If however, this is not the case, from my current understanding something like this would need to be implemented (is this the case as I may be missing something)

2) In conjunction with the first question - I cannot seem to find any current method exposed that returns the actor type of a node - unless I am missing something again. If this is the case, my initial thought is to add some sort of field to the UtilityContext or something similar that is set upon the module's creation and retrieve this.

If I have gone wrong with anything I've mentioned please let me know - I will keep looking into this over the weekend hopefully to start next week.

okdas commented 1 year ago

Can you PTAL at the requirements in this ticket?

The current requirements look good. One part I am not sure about is Update the configs to specify the type of actor(s) being deployed on this node. If we are going to have multiple binaries, one per actor, we probably don't need that configuration option.

We might, however, require actor-specific configs we need to expose. For example, adding { "portal": { "listen_port": 80, ... }} or similar configurations makes sense to me. Considering our configuration is done “by module” would that mean we need to introduce a “portal” module? I would like to avoid creating a separate config for actor-specific parameters, but I suspect that can be done too.
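To make the actor-specific config idea concrete, here is a rough Go sketch that parses optional per-actor sections from JSON. The struct names and the pointer-based "absent section" convention are assumptions for illustration only. The actual configs in the repo are proto-backed and organized by module, which this sketch does not attempt to reproduce.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// PortalConfig, ServicerConfig and FishermanConfig are hypothetical
// placeholders; only listen_port appears in the discussion above.
type PortalConfig struct {
	ListenPort int `json:"listen_port"`
}

type ServicerConfig struct{}

type FishermanConfig struct{}

// NodeConfig keeps every actor section optional: a nil pointer means the
// section was not present in the file.
type NodeConfig struct {
	Portal    *PortalConfig    `json:"portal,omitempty"`
	Servicer  *ServicerConfig  `json:"servicer,omitempty"`
	Fisherman *FishermanConfig `json:"fisherman,omitempty"`
}

// parseConfig decodes a raw JSON document into a NodeConfig.
func parseConfig(raw string) (NodeConfig, error) {
	var cfg NodeConfig
	err := json.Unmarshal([]byte(raw), &cfg)
	return cfg, err
}

func main() {
	cfg, err := parseConfig(`{"portal": {"listen_port": 80}}`)
	if err != nil {
		panic(err)
	}
	fmt.Println("portal configured:", cfg.Portal != nil)
	fmt.Println("portal listen port:", cfg.Portal.ListenPort)
	fmt.Println("servicer configured:", cfg.Servicer != nil)
}
```

With this layout the actor sections live in the same file as the rest of the node config, which avoids a separate config file for actor-specific parameters.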

Moreover, I think we can add a container image build for each actor once the binary is compiled.

okdas commented 1 year ago

Something we could use to solve this is https://goreleaser.com/ - I was looking at whether we could use it to build go binaries along container images, but it was quicker to add a custom image build. Maybe it's time to revisit?

GoReleaser also can package brew binaries and can handle changelogs with releases/pre-releases. There are a couple of requirements – we need to follow semantic versioning (we do), and there could be complications with CGO (AFAIK we do not use it). So it should work for us.

Olshansk commented 1 year ago

Something we could use to solve this is goreleaser.com - I was looking at whether we could use it to build go binaries along container images, but it was quicker to add a custom image build. Maybe it's time to revisit?

GoReleaser also can package brew binaries and can handle changelogs with releases/pre-releases. There are a couple of requirements – we need to follow semantic versioning (we do), and there could be complications with CGO (AFAIK we do not use it). So it should work for us.

The goal here is to have the "code modifications" that will be used by the infra above.

I think everything infrastructure-related (streamlining, sharing, versioning, etc.) is outside of the scope of this PR.

@okdas

Does that make sense?
Can you create an infra ticket for it though?

The current requirements look good. One part I am not sure about is Update the configs to specify the type of actor(s) being deployed on this node. If we are going to have multiple binaries, one per actor, we probably don't need that configuration option.

We might, however, require actor-specific configs we need to expose. For example, adding { "portal": { "listen_port": 80, ... }} or similar configurations makes sense to me. Considering our configuration is done “by module” would that mean we need to introduce a “portal” module? I would like to avoid creating a separate config for actor-specific parameters, but I suspect that can be done too.

Makes sense. In that case I'm thinking of adding the following requirements:

[ ] Add an empty config proto for each actor (portal, fisherman, servicer, validator, application)
[ ] Add make target to build each binary: make build_portal, make build_fisherman, make build_servicer, etc...

@okdas:

Do these two additional deliverables capture what you're suggesting?

So looking into app/pocket/main.go, runtime/manager.go and shared/node.go I can see these are the current entry points to starting the node. However, from this, I am getting the understanding that the default behaviour of creating a new node and starting all the modules is to become a Validator actor.

Once #528 by @gokutheengineer is merged in, there will be a codepath for synching full nodes that are not validators. See state_machine/docs/state-machine.diagram.md in the PR for a reference.

My question is whether there is currently any logic that once the Utility module is started means the Validator behaviour is begun? (Like a StartValidating function for example 😆) If this is defined then adding the logic to build different actor binaries would be more straightforward. If however, this is not the case, from my current understanding something like this would need to be implemented (is this the case as I may be missing something)

The Validator is a special case because it touches the consensus module, so (again, per @gokutheengineer's PR), the only difference will be:

Am I in Pacemaker mode?
Am I in synching mode?

To do this, I would add a oneof ValidatorConfig and a FullNodeConfig inside consensus_config.proto

Outside of the validator (or not), I would start with the utility module.

You can embed (not as a oneof) FishermanConfig, PortalConfig, ServicerConfig, which are not mutually exclusive.
I'd introduce placeholder ServicerModule, FishermanModule and PortalModule, which can be started/stopped individually. Again, see the StateSyncModule @gokutheengineer is introducing in his PR.
When the utility module starts, it'll optionally start each of the other utility submodules (depending on the configs) since they are not mutually exclusive.
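The three points above could be sketched roughly as follows in Go. Everything here (the Submodule interface, ActorConfig with an Enabled toggle, the Started field) is a hypothetical stand-in and not the repo's module interface. The point is only to show the utility module starting whichever actor submodules its config enables.

```go
package main

import "fmt"

// Submodule is a hypothetical minimal interface for actor-specific logic.
type Submodule interface {
	Name() string
	Start() error
}

// placeholderModule stands in for ServicerModule, FishermanModule and
// PortalModule until real business logic exists.
type placeholderModule struct{ name string }

func (p placeholderModule) Name() string { return p.name }

func (p placeholderModule) Start() error {
	fmt.Printf("[%s] placeholder logic started\n", p.name)
	return nil
}

// ActorConfig is an assumed shape: one explicit toggle per actor.
type ActorConfig struct{ Enabled bool }

// UtilityConfig holds one config per actor; they are not mutually exclusive.
type UtilityConfig struct {
	Servicer  ActorConfig
	Fisherman ActorConfig
	Portal    ActorConfig
}

// UtilityModule records which submodules it started, in start order.
type UtilityModule struct {
	Started []string
}

// Start launches every submodule whose config is enabled.
func (u *UtilityModule) Start(cfg UtilityConfig) error {
	candidates := []struct {
		cfg ActorConfig
		mod Submodule
	}{
		{cfg.Servicer, placeholderModule{"servicer"}},
		{cfg.Fisherman, placeholderModule{"fisherman"}},
		{cfg.Portal, placeholderModule{"portal"}},
	}
	for _, c := range candidates {
		if !c.cfg.Enabled {
			continue
		}
		if err := c.mod.Start(); err != nil {
			return fmt.Errorf("starting %s: %w", c.mod.Name(), err)
		}
		u.Started = append(u.Started, c.mod.Name())
	}
	return nil
}

func main() {
	u := &UtilityModule{}
	cfg := UtilityConfig{Servicer: ActorConfig{Enabled: true}}
	if err := u.Start(cfg); err != nil {
		panic(err)
	}
	fmt.Println("started:", u.Started)
}
```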

...

From this article, but they're not much use (right now), because we have configs validating the code flow.

@okdas Any feedback on the design above?

If I have gone wrong with anything I've mentioned please let me know - I will keep looking into this over the weekend hopefully to start next week.

This is a much more open-ended problem, so will keep thinking/sharing ideas. Might need to prototype it myself if it's still not clear.

0xBigBoss commented 1 year ago

first pass with notes, will be moving and grooving here https://github.com/pokt-network/pocket/compare/0xbigboss/dw-1860/core-deploying-all-the-actors

jessicadaugherty commented 1 year ago

Ty @0xBigBoss :)

Olshansk commented 1 year ago

@0xBigBoss I was going to suggest that the presence of the fisherman/servicer config could determine if it's enabled/disabled, but having the boolean can actually make toggling/developing much easier so +1.

Within the utility module, I suggest you look at what @gokutheengineer has been doing with the StateSync module in consensus. It's like a "submodule" inside the consensus module, and I was thinking Fisherman / Servicer / etc. business logic could be similar.

0xBigBoss commented 1 year ago

@Olshansk yes, thank you for the pointer. I did my best to brush up on the module docs and move in this direction.

I pushed a new commit experimenting with standalone modules. This works thus far for servicer and fisherman, and I think it is generally a good approach to strapping on the utility-specific stuff. I am not convinced I have found the best way yet to start the actor-specific modules, though. I included an engines field, but it seems unnecessary/overkill; the engines field seems more appropriate for other modules that need to start other sub-modules.

I also haven't quite reviewed how it should all work e2e and how siloed these utility modules can be. I haven't reviewed the validator bootstrap logic and how that should come into play here. Will think on this more after I have read more on how that module starts.

https://github.com/pokt-network/pocket/compare/0xbigboss/dw-1860/core-deploying-all-the-actors#diff-2cbaa9daa0be26c2069b254103e311e37307bf9835314c19082e5f91e78898c3R69

Olshansk commented 1 year ago

@0xBigBoss I'm not 100% sure if you were looking for feedback yet, so just going to send suggestions based on a high-level brief overview.

engines - overkill IMO; let's keep it simple
enableActorModules - we're only going to have a handful of actors. Even if we introduce new ones, this is not something that will grow very large. I think keeping things verbose (avoiding lists here) will keep it simpler longer-term.
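As a rough illustration of the verbose, non-list approach, here is a Go sketch with one explicit boolean per actor plus a validation step. The NodeRoles type and Validate method are hypothetical names. The only rules encoded are the two stated in the issue description (a Servicer may also be a Validator; a Portal must be standalone), so any other combinations are left unrestricted here as an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// NodeRoles is a hypothetical config shape: one explicit flag per actor
// instead of a list of enabled actor modules.
type NodeRoles struct {
	Validator bool
	Servicer  bool
	Fisherman bool
	Portal    bool
}

// errPortalStandalone reports a portal combined with another role.
var errPortalStandalone = errors.New("a portal must be standalone")

// Validate rejects role permutations that the issue description rules out.
// A node with no roles at all is treated as a plain full node.
func (r NodeRoles) Validate() error {
	if r.Portal && (r.Validator || r.Servicer || r.Fisherman) {
		return errPortalStandalone
	}
	return nil
}

func main() {
	examples := []NodeRoles{
		{Validator: true, Servicer: true},
		{Portal: true},
		{Portal: true, Fisherman: true},
	}
	for _, roles := range examples {
		fmt.Printf("%+v -> %v\n", roles, roles.Validate())
	}
}
```

Because the set of actors is small and fixed, each new rule is one more readable condition in Validate rather than a lookup over a list.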

If there's any specific piece that you have questions about, lmk

0xBigBoss commented 1 year ago

@Olshansk I believe I am set with the skeleton configs for the various actors and creating/starting them in the utility module. I did end up keeping one list, actorModules that represents the current actors enabled for the node. There may be a more clever way to not make it a list, but since we are allowing for multiple actor types in some cases, it seemed fitting.

@h5law I did end up going with the approach you recommended for now, and kept a top-level ValidatorConfig. Still not set that this is the best path forward longterm since I still believe this sort of overlaps with the consensus config already. 🤣

@gokutheengineer If you could, please send me some easy instructions on how to add non-validator, full nodes to localnet so I can start integrating these new configs.

In the meantime, I'll start on the RPC calls and the plumbing for that.

https://github.com/pokt-network/pocket/compare/0xbigboss/dw-1860/core-deploying-all-the-actors#diff-2cbaa9daa0be26c2069b254103e311e37307bf9835314c19082e5f91e78898c3R72

Olshansk commented 1 year ago

@Olshansk I believe I am set with the skeleton configs for the various actors and creating/starting them in the utility module. I did end up keeping one list, actorModules that represents the current actors enabled for the node. There may be a more clever way to not make it a list, but since we are allowing for multiple actor types in some cases, it seemed fitting.

👍 Will think about it myself a bit more as well, but definitely a great way to start.

Any chance you can open up a draft PR? It'll be a good way to leave comments & have a discussion along the way.

@h5law I did end up going with the approach you recommended for now, and kept a top-level ValidatorConfig. Still not set that this is the best path forward longterm since I still believe this sort of overlaps with the consensus config already.

I think it's reasonable. Not an irreversible decision.

@gokutheengineer If you could, please send me some easy instructions on how to add non-validator, full nodes to localnet so I can start integrating these new configs.

This isn't ready YET. The PRs are in flight so we should hopefully have them ready next week along with the markdown readme.

In the meantime, I'll start on the RPC calls and the plumbing for that.

👍

Olshansk commented 1 year ago

@0xBigBoss Just sharing a screenshot from a conversation we had today rather than summarizing. Stay tuned!

Screenshot 2023-04-28 at 1 16 40 PM

0xBigBoss commented 1 year ago

Very cool @Olshansk . I have started in that direction in my PR #710. Though it's not completely there yet since the fisherman/servicer aren't staked by the cluster manager, I tried to add a hacky method of overriding some of the genesis values.

I think once I have clarity on the full node vs validator configuration, it will be straightforward to bring this home.

bryanchriswhite commented 1 year ago

Just chiming in here on P2P timing and coordination: #505 is a dependency for the P2P module to support communication with non-staked actors (e.g. full-nodes). I've just put #707 up for review which I expect to be part 1 of 2 to close #505.

TLDR (why); Here's an excerpt from the P2P README update in #707:

flowchart TD
    subgraph lMod[Local P2P Module]
        subgraph lHost[Libp2p `Host`]
        end
        subgraph lRT[Raintree Router]
            subgraph lRTPS[Raintree Peerstore]
                lStakedPS([staked actors only])
            end

            subgraph lPM[PeerManager]
            end
            lPM --> lRTPS
        end

        subgraph lBG[Background Router]
            subgraph lBGPS[Background Peerstore]
                lNetPS([all P2P participants])
            end

            subgraph lGossipSub[GossipSub]
            end

            subgraph lDHT[Kademlia DHT]
            end

            lGossipSub --> lBGPS
            lDHT --> lBGPS
        end

        lRT --1a--> lHost
        lBG --1b--> lHost
    end

    subgraph rMod[Remote P2P Module]
        subgraph rHost[Libp2p `Host`]
        end
        subgraph rRT[Raintree Router]
            subgraph rPS[Raintree Peerstore]
                rStakedPS([staked actors only])
            end

            subgraph rPM[PeerManager]
            end

            rPM --> rStakedPS
        end

        subgraph rBG[Background Router]
            subgraph rBGPS[Background Peerstore]
                rNetPS([all P2P participants])
            end

            subgraph rGossipSub[GossipSub]
            end

            subgraph rDHT[Kademlia DHT]
            end

            rGossipSub --> rBGPS
            rDHT --> rBGPS
        end

        rHost -. "setStreamHandler()" .-> hs[[handleStream]]
        hs --3a--> rRT
        hs --3b--> rBG
        rBG  --"4a (cont. propagation)"--> rHost
        linkStyle 11 stroke:#ff3
        rRT  --"4b (cont. propagation)"--> rHost
        linkStyle 12 stroke:#ff3
    end

    lHost --2--> rHost

Olshansk commented 1 year ago

Just chiming in here on P2P timing and coordination: #505 is a dependency for the P2P module to support communication with non-staked actors (e.g. full-nodes). I've just put #707 up for review which I expect to be part 1 of 2 to close #505.

Thanks for the details @bryanchriswhite! Amazing and very clear diagram. There's a lot going on and it's very easy to understand so I want to make sure that doesn't go unnoticed. 🙏
