status-im / infra-shards

Infrastructure for Status fleets
https://github.com/status-im/nim-waku

Static sharding fleets for Status Communities #2

Closed: jm-clius closed this issue 8 months ago

jm-clius commented 10 months ago

Requirements

Configuration common to both fleets:

Pubsub Topics

--pubsub-topic=/waku/2/rs/16/128
--pubsub-topic=/waku/2/rs/16/256

Protected Topics

--protected-topic=/waku/2/rs/16/128:045ced3b90fabf7673c5165f9cc3a038fd2cfeb96946538089c310b5eaa3a611094b27d8216d9ec8110bd0e0e9fa7a7b5a66e86a27954c9d88ebd41d0ab6cfbb91
--protected-topic=/waku/2/rs/16/256:049022b33f7583f34463f5b7622e5da29f99f993e6858a478465c68ee114ccf142204eff285ed922349c4b71b178a2e1a2154b99bcc2d8e91b3994626ffa9f1a6c
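
For illustration only, a minimal sketch of how the common flags above might be combined in a single nwaku invocation. The binary path and the extra `--relay` flag are assumptions, not the actual fleet deployment command:

```bash
# Sketch: combining the common configuration above on one wakunode2 node.
# Binary path and additional flags are assumptions, not the fleet's real command.
./build/wakunode2 \
  --relay=true \
  --pubsub-topic=/waku/2/rs/16/128 \
  --pubsub-topic=/waku/2/rs/16/256 \
  --protected-topic=/waku/2/rs/16/128:045ced3b90fabf7673c5165f9cc3a038fd2cfeb96946538089c310b5eaa3a611094b27d8216d9ec8110bd0e0e9fa7a7b5a66e86a27954c9d88ebd41d0ab6cfbb91 \
  --protected-topic=/waku/2/rs/16/256:049022b33f7583f34463f5b7622e5da29f99f993e6858a478465c68ee114ccf142204eff285ed922349c4b71b178a2e1a2154b99bcc2d8e91b3994626ffa9f1a6c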

Configuration for status.sharding.bootstrap

Configuration for status.sharding.store

jm-clius commented 10 months ago

@richard-ramos @chaitanyaprem can you check that the guidelines here are clear enough, given our discussion on https://github.com/waku-org/nwaku/issues/1914? With the idea that go-waku drives Status dogfooding, I think it would be best for go-waku to also ensure that this fleet gets set up, so feel free to let me know if anything is unclear or if you have other suggestions!

@Ivansete-status do we have a guide/tutorial/example somewhere on how the PostgreSQL configuration is done on nwaku?

Ivansete-status commented 10 months ago

Hey @jm-clius! We don't have an explicit doc. The following repo has an example of how an nwaku node is connected to a Postgres db, all in a docker-compose file --> https://github.com/waku-org/nwaku-compose/blob/master/docker-compose.yml
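
A hedged sketch of the relevant wiring from that compose setup: the key part is pointing nwaku's store at Postgres via its database URL flag. The credentials, host, and database name below are placeholders, not the values from the linked file:

```bash
# Sketch only (placeholder credentials/host): how an nwaku store node is pointed
# at a Postgres database, as done in the linked docker-compose file.
./build/wakunode2 \
  --store=true \
  --store-message-db-url="postgres://wakuuser:wakupassword@postgres:5432/wakumessages"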

richard-ramos commented 10 months ago

Thank you for these guidelines. I looked at them and they do look clear to me! Just to confirm: status.sharding.store will have DiscV5 enabled to find peers? Or will it only rely on the initial connection to the nodes returned by the status.sharding.bootstrap DNS discovery URL?

chaitanyaprem commented 10 months ago

Thanks for the guidelines @jm-clius. So far, these are clear to me!

jm-clius commented 10 months ago

Just to confirm: status.sharding.store will have DiscV5 enabled to find peers?

Yes, I think they should have discv5 enabled.
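
As a sketch, the store nodes could then run with something like the following discovery flags; the enrtree URL below is a placeholder, not the bootstrap fleet's actual record:

```bash
# Sketch: discovery flags for a status.sharding.store node. The enrtree URL is
# a placeholder; use the bootstrap fleet's published DNS discovery record.
./build/wakunode2 \
  --discv5-discovery=true \
  --dns-discovery=true \
  --dns-discovery-url="enrtree://PUBLICKEY@bootstrap.fleet.example.org"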

jakubgs commented 10 months ago

Some questions:

richard-ramos commented 10 months ago

Do you actually want two fleets or are you talking about two types of nodes within one fleet?

Based on the description, I understand it's two separate fleets, each with its own DNS discovery URL and settings, but interconnected by having status.sharding.store use the status.sharding.bootstrap DNS discovery URL + discv5.

richard-ramos commented 10 months ago

Why is this fleet called sharding and not shards as is the wakuv2 one?

I guess the name can be changed to match wakuv2 :thinking: no preference here

jm-clius commented 10 months ago

Answering:

Why is this fleet called sharding and not shards as is the wakuv2 one?

Ah, oversight/assumption. It should be shards. Actually the naming itself is a suggestion from my side, but I think status.shards.store and status.shards.bootstrap make sense.

How is this different from the wakuv2.shards fleet?

wakuv2.shards is indeed the blueprint that the Waku team use to harden most of the concepts we need for these status.shards.* fleets. The main differences are:

Is this intended for public use, and if not and it's intended for development

It's intended for public use in future. For now it is intended for Status Community dogfooding and must belong to Status.

Do you actually want two fleets or are you talking about two types of nodes within one fleet?

Two separate fleets with separate DNS node lists. But these fleets are closely associated in that the status.shards.store fleet must connect to the status.shards.bootstrap fleet. They will also both serve the same communities, so from the communities' perspective they form a unit.

What does "shared PostgreSQL backend" mean?

My suggestion for a starting point: "one database host shared across multiple data centers with different latency"

Previously (with SQLite) each store node would store its own duplicate copy of the entire history locally. With PostgreSQL we have the opportunity to store fewer copies of history in a shared database, with multiple producers/writers. Deduplication happens on write to the database. This should improve data reliability. We can of course play with redundancy parameters here and have e.g. a PostgreSQL db per data center. I think for dogfooding a single PostgreSQL instance shared between all store nodes is a reasonable starting point. One of the aims of wakuv2.shards is to benchmark this approach more thoroughly. cc @Ivansete-status
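
To illustrate the "deduplication on write" idea, a minimal sketch in PostgreSQL. This is purely an illustration: the table and column names below are hypothetical and not nwaku's actual message store schema.

```bash
# Illustration only: deduplication on write in PostgreSQL. Table/column names
# are hypothetical, not nwaku's actual schema.
psql "$DB_URL" <<'SQL'
CREATE TABLE IF NOT EXISTS messages_example (
  id        TEXT PRIMARY KEY,          -- message hash acts as the dedup key
  payload   BYTEA,
  stored_at TIMESTAMPTZ DEFAULT now()
);
-- Several store nodes may write the same message; later duplicates are dropped.
INSERT INTO messages_example (id, payload)
VALUES ('example-message-hash-1', '\xdeadbeef')
ON CONFLICT (id) DO NOTHING;
SQL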

jakubgs commented 10 months ago

Thanks for answering. One thing I want to make clear from the start is that there can't be something like status.shards.store. Our fleet names have two elements, env and stage, like so: ${env}.${stage}. For example wakuv2.prod. There is nothing in any of our automations or naming conventions that allows for a 3-segment fleet name.

If you need both words in the name then it would have to be status.shards-store and status.shards-boot.

jakubgs commented 10 months ago

My main worry with a shared DB across different data centers is that, due to higher network latency, the DB will spend more time than necessary locked during writes because of delays on various control messages.

jm-clius commented 10 months ago

If You need both words in the name then it would have to be status.shards-store and status.shards-boot.

I'm happy with this naming convention.

My main worry with a shared DB across different data centers

Right! I think this is a good point. I have no real intuition here, and this setup would need to be properly dogfooded in any case (we have no PostgreSQL DBs in production yet). Perhaps a better solution is to use only two data centers for status.shards-store for now (3 store nodes in one, 2 in the other), with each having its own shared PostgreSQL database (2 in total).

TL;DR - let's opt for one postgresql instance per data center.

jm-clius commented 10 months ago

Sidenote: 5 was a number pulled out of a hat. We could of course also do 6 store nodes, which divides more easily between 2 or 3 data centers.

jakubgs commented 10 months ago

Okay, last question: what is the deadline for this? I would like to assign this task to @yakimant to get him acquainted with infrastructure work, especially in relation to Waku fleets. But that will naturally mean work will progress slower.

jm-clius commented 10 months ago

I think this is up to @richard-ramos and @chaitanyaprem in terms of driving Status Community dogfooding here. It's probably possible to do some limited initial dogfooding here using wakuv2.shards instead, which would give us some more time.

Ivansete-status commented 10 months ago

let's opt for one postgresql instance per data center.

I like this approach a lot. We will implement Store synchronization among data centers in the future.

If possible, we should have high availability and replication, but I think this can be added in later stages. For now it is fine to have one Postgres per data center.


Re the shard names, neither status.shards-store nor status.shards-boot seems to follow the pattern ${env}.${stage}. What about status-shards-store.prod or status-shards-boot.prod? (cc @jakubgs @jm-clius)

jakubgs commented 10 months ago

What about status-shards-store.prod or status-shards-boot.prod?

But those are not prod, so why would we call them prod?

Ivansete-status commented 10 months ago

What about status-shards-store.prod or status-shards-boot.prod?

But those are not prod, so why would we call them prod?

I set it as prod because <<It's intended for public use in future>>. In any case, I think we need to add a valid stage label (test or staging or prod) so that we remember in the future the kind of traffic and data the fleet is supporting.

jakubgs commented 10 months ago

I think we shouldn't pretend a non-prod fleet is prod because it might be prod in the future. If we stabilize this setup then we'll probably create separate new fleet(s) that apply this setup.

And honestly, then the proper naming would be more like status-store.prod and status-boot.prod. Though I still don't fully get why you guys think those should be separate fleets. We already have heterogeneous fleets that contain different types of hosts in them. You can see that in infra-eth-cluster, which deploys eth.staging or eth.prod, each containing 3 types of hosts: boot, node, mail.

richard-ramos commented 10 months ago

It's probably possible to do some limited initial dogfooding here using wakuv2.shards instead, which would give us some more time.

I see no problems with using that fleet if it can provide topic protection for /waku/2/rs/16/128 and /waku/2/rs/16/256, store, and discv5! Early-stage dogfooding will be mostly focused on testing the behavior of status-go when sharding is applied to communities.

Ivansete-status commented 10 months ago

I agree that we shouldn't name them prod then. I vote for: status-store.test and status-boot.prod

Having that kind of segregation is good, I think, because:

- Clear picture of what a fleet contains
- Different fleets might have different update requirements
- Different fleets might have different scaling requirements

jakubgs commented 10 months ago

Clear picture of what a fleet contains

That already exists in infra-eth-cluster, see different files in group_vars.

Different fleets might have different update requirements

I don't see how that's relevant. If they work in tandem as production, they are essentially part of the main Status fleet, just different components of it. Think of it this way: if an application A requires a database D and a cache C, you don't deploy A, D, and C as separate fleets; you deploy them all as one fleet that contains different groups of servers: A, D, and C.

Different fleets might have different scaling requirements

Also irrelevant. You can scale them separately if you want to. See infra-eth-cluster.

jakubgs commented 10 months ago

Correct me if I'm wrong, but it seems to me like you are considering bootstrap and store as separate "fleets" purely because they have different utility, but I think that's wrong. I think those are tied together and are part of the same fleet.

It also doesn't make sense infrastructure-config-wise: a fleet repo like infra-status configures a single type of fleet, and that fleet can have multiple stages, but those stages are mostly the same in layout and configuration, just with a different type of usage. Considering the VERY different fleet layout, I'd think using a separate fleet repo like infra-status-shards would make more sense, since trying to square the circle of two very different types of fleets being deployed by the same repo would only make both the Terraform and Ansible configuration needlessly complex.

What do you think?

jm-clius commented 10 months ago

I think those are tied together and are part of the same fleet.

Agreed. I think from my perspective it's about whatever makes the most sense from an infra perspective in terms of management. Indeed, these two "subfleets" are connected into a single service fleet that acts as a unit to provide services to Status Communities, will most probably always run the same version, be interconnected, etc.

The differences are:

If these differences can be managed from within a single fleet with different node "types", I think it will indeed be much cleaner!

I'd think using a separate fleet repo

Yeah, I'm viewing this as building on the wakuv2.shards example, but with more complexity. Since we are now introducing different node types, I would agree that it's diverged enough to warrant a separate repo.

yakimant commented 10 months ago

We had a call with @jakubgs, @jm-clius and @Ivansete-status, here are the notes:

Stakeholders

Hosts

Timeline

DB

Code organisation

Metrics

Later

Ivansete-status commented 10 months ago

Thanks for the notes @yakimant ! A couple of comments:

yakimant commented 10 months ago

Yes, the fleet name will include the stage name too, don't worry!

yakimant commented 9 months ago

@richard-ramos, @jm-clius, quick question: why do we need an enrtree DNS record for storage nodes if it is not referenced? Both bootstrap and storage nodes reference the bootstrap node list, as I understand it.

jm-clius commented 9 months ago

why do we need an enrtree DNS record for storage nodes if it is not referenced? Both bootstrap and storage nodes reference the bootstrap node list, as I understand it.

Correct. I think it is so that Community Nodes can reference the store nodes separately from the bootstrap nodes.

jakubgs commented 9 months ago

Hanno adds:

My understanding from Richard is: they bootstrap against a list of nodes and they create a set of store nodes from a different list (in fact, they hard-code this store list currently, but should do a separate lookup to populate it dynamically). If the store nodes are mixed into the bootstrap list, they will automatically form part of the bootstrap process, which we'd like to avoid in order to create a better separation of interests in the fleet services.
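
For what it's worth, one quick way to sanity-check that the two lists really are separate is to look at the enrtree TXT records directly. The domains below are placeholders, not the fleet's actual enrtree domains (those live in the fleet README):

```bash
# Placeholders only: substitute the enrtree domains published in the fleet README.
BOOT_TREE="boot.example.shards.test.statusim.net"
STORE_TREE="store.example.shards.test.statusim.net"

# EIP-1459 DNS discovery trees are plain TXT records, so dig can list them.
dig TXT "$BOOT_TREE"  +short
dig TXT "$STORE_TREE" +short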

yakimant commented 9 months ago

@jm-clius, @Ivansete-status, please have a look and see if you can start using the fleet for testing already.

What is missing:

richard-ramos commented 9 months ago

@yakimant what are the DNS discovery addresses for this fleet?

richard-ramos commented 9 months ago

nvm. Saw the README.md :)

Ivansete-status commented 9 months ago

I validated that the following hosts have both relay and store properly configured. Amazing work, @yakimant!

/dns4/store-01.do-ams3.shards.test.statusim.net/tcp/30303/p2p/16Uiu2HAmAUdrQ3uwzuE4Gy4D56hX6uLKEeerJAnhKEHZ3DxF1EfT
/dns4/store-02.do-ams3.shards.test.statusim.net/tcp/30303/p2p/16Uiu2HAm9aDJPkhGxc2SFcEACTFdZ91Q5TJjp76qZEhq9iF59x7R
/dns4/store-01.gc-us-central1-a.shards.test.statusim.net/tcp/30303/p2p/16Uiu2HAmMELCo218hncCtTvC2Dwbej3rbyHQcR8erXNnKGei7WPZ
/dns4/store-02.gc-us-central1-a.shards.test.statusim.net/tcp/30303/p2p/16Uiu2HAmJnVR7ZzFaYvciPVafUXuYGLHPzSUigqAmeNw9nJUVGeM
/dns4/store-01.ac-cn-hongkong-c.shards.test.statusim.net/tcp/30303/p2p/16Uiu2HAm2M7xs7cLPc3jamawkEqbr7cUJX11uvY7LxQ6WFUdUKUT
/dns4/store-02.ac-cn-hongkong-c.shards.test.statusim.net/tcp/30303/p2p/16Uiu2HAm9CQhsuwPR54q27kNj9iaQVfyRzTGKrhFmr94oD8ujU6P

To test it, I followed these steps:

  1. Start an nwaku node locally, setting staticnode and storenode to the first node from the previous list. With that, I established the remote relay and store peer, respectively (see the sketch after these steps).

    i. Then, I made my local node publish a few messages by sending JSON-RPC requests to it.

    ii. Then, I made a store request to my local node through JSON-RPC, and in turn my local node made a Store request to the remote node. The JSON-RPC command then returned the messages stored by the *F1EfT peer.

  2. I repeated step 1.ii with the rest of the nodes. In all cases, the stored messages were properly received, indicating that all the nodes are relay-connected to each other and that all of them properly handle Store requests.
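
A rough reproduction sketch of step 1 and step 1.ii. The CLI flags and the JSON-RPC method/parameter shapes are assumptions based on the nwaku version in use at the time; treat them as illustrative rather than exact:

```bash
# Sketch of step 1 and 1.ii. Flag names and JSON-RPC method/params are assumptions
# for the nwaku version of that period; adjust to your node's API.
STORE_PEER="/dns4/store-01.do-ams3.shards.test.statusim.net/tcp/30303/p2p/16Uiu2HAmAUdrQ3uwzuE4Gy4D56hX6uLKEeerJAnhKEHZ3DxF1EfT"

# 1. Local node with the remote peer as both relay (staticnode) and store peer.
#    (Run this in one terminal; it stays in the foreground.)
./build/wakunode2 \
  --pubsub-topic=/waku/2/rs/16/128 \
  --staticnode="$STORE_PEER" \
  --storenode="$STORE_PEER" \
  --rpc=true --rpc-port=8545

# 1.ii. In a second terminal: store query via the local node's JSON-RPC endpoint.
curl -s -X POST http://127.0.0.1:8545 \
  -H 'Content-Type: application/json' \
  -d '{"jsonrpc":"2.0","id":1,
       "method":"get_waku_v2_store_v1_messages",
       "params":["/waku/2/rs/16/128", [], null, null, {"pageSize": 10, "forward": true}]}'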

Notes:

cc: @jm-clius

yakimant commented 9 months ago

I can see messages in db:

nim-waku=> select id from messages;
                                id
------------------------------------------------------------------
 2be984aa463c976553b7ab089c6708fa97509449f7359693ae98665fecb1a652
 2be984aa463c976553b7ab089c6708fa97509449f7359693ae98665fecb1a652
 2be984aa463c976553b7ab089c6708fa97509449f7359693ae98665fecb1a652
 f0f81dd5f875ebac8402617ac0c3cd0e7b0c72de4e23cacd23fc7362aacd8b77
(4 rows)
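A quick follow-up check on the same database, comparing total rows against distinct message ids (connection details assumed; the table and column names come from the query above):

```bash
# Compare total rows vs distinct ids in the messages table shown above.
psql nim-waku -c "SELECT count(*) AS total, count(DISTINCT id) AS distinct_ids FROM messages;"
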

jm-clius commented 9 months ago

Great, I think further dogfooding on this fleet can continue. @richard-ramos

yakimant commented 9 months ago

@jm-clius shall we close the issue or wait for testing results?

jm-clius commented 8 months ago

@yakimant I think we can close this issue and raise any further problems/issues separately. The bulk of the work here has been done. Thanks for your excellent work here!