fbarbu15 opened 8 months ago
1st: content topics resolve to a very big shard number if pubsub-topic is not present in the docker flags
You are correct that resolving to /waku/2/rs/2/58355 is wrong, but note that /waku/2/rs/2/3 is not the expected shard. What makes you think that?
The configuration you provide lacks pubsub-topic, so how does the node know the available shards? If the node doesn't know the available topics, then it doesn't know the number of shards, so it's not possible to resolve the content topic properly. How many shards are there? 1? 10?
I suggest erroring if pubsub-topic is not configured. Note that in known clusters (e.g. 1) these are loaded automatically.
2nd: all content topics resolve to shard 0 if pubsub-topic is present in the docker flags
Indeed a problem. Wondering though if it makes sense to configure just one shard that is --pubsub-topic=/waku/2/rs/2/2 instead of --pubsub-topic=/waku/2/rs/2/0. This can end up being a bit confusing, since if we allow that, then [/waku/2/rs/2/12, /waku/2/rs/2/2334, /waku/2/rs/2/99999] is technically allowed and can get messy. So I would suggest enforcing that the configured shards start at 0 and go up to num_shards-1.
In summary:
- error if no pubsub-topic is provided (unless it's a known cluster)
- enforce that the configured topics are /waku/2/rs/cluster-id/0 ... /waku/2/rs/cluster-id/num_shards-1

thoughts @jm-clius @SionoiS
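(For illustration only: a minimal Python sketch of the contiguity check this suggestion implies. The function and error messages are hypothetical, not nwaku code.)

```python
# Hypothetical sketch (not nwaku code) of the rule proposed above:
# configured pubsub topics must cover shards 0 .. num_shards-1 of one cluster.
import re

TOPIC_RE = re.compile(r"^/waku/2/rs/(\d+)/(\d+)$")

def validate_pubsub_topics(topics: list[str]) -> None:
    if not topics:
        raise ValueError("no pubsub-topic configured and cluster is not a known one")
    clusters: set[int] = set()
    shards: set[int] = set()
    for topic in topics:
        match = TOPIC_RE.match(topic)
        if match is None:
            raise ValueError(f"not a static sharding pubsub topic: {topic}")
        clusters.add(int(match.group(1)))
        shards.add(int(match.group(2)))
    if len(clusters) != 1:
        raise ValueError("pubsub topics span more than one cluster")
    if shards != set(range(len(shards))):
        raise ValueError("shards must start at 0 and be contiguous up to num_shards-1")

# validate_pubsub_topics(["/waku/2/rs/2/0", "/waku/2/rs/2/1"])  # ok
# validate_pubsub_topics(["/waku/2/rs/2/0", "/waku/2/rs/2/2"])  # raises: missing shard 1
```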
Regarding /waku/2/rs/2/3: go-waku and js-waku are resolving it to shard 3. I've raised this issue in the context of interop testing between nwaku and go-waku/js-waku, so if we change the resolving logic in one place we need to change it in all 3 places.
I suggest erroring if pubsub-topic is not configured
I might be wrong, but also in the context of static and auto sharding I was thinking that:
- if pubsub-topic is configured, then we are using/testing static-sharding
- if no pubsub-topic but with content-topic, then we are using/testing auto-sharding

Urgh. This is messy in our config and I can understand the confusion. My feeling:
- if pubsub-topic is configured, there should be no content-topic configuration (i.e. static subscription, no autosharding)
- content-topic can be configured, but there MUST be a shards configuration starting at 0 up to the configured shards num.

Based on the discussion, we can define the following configurations:
- --pubsub-topic=/waku/2/rs/2/0 - Valid
- --pubsub-topic=/waku/2/rs/2/1 - Invalid (starts with shard 1)
- --pubsub-topic=/waku/2/rs/2/0 --pubsub-topic=/waku/2/rs/2/1 - Valid
- --pubsub-topic=/waku/2/rs/2/0 --pubsub-topic=/waku/2/rs/2/2 - Invalid (missing shard 1)
- --pubsub-topic=/waku/2/rs/2/0 --content-topic=/myapp/1/latest/proto - Invalid (mix of pubsub-topic and content-topic)
- --content-topic=/myapp/1/latest/proto - Valid (resolves to shard 0)
- --content-topic=/waku/2/content/test.js - Invalid (resolves to shard 1)
- --content-topic=/myapp/1/latest/proto --content-topic=/waku/2/content/test.js - Valid
- --content-topic=/myapp/1/latest/proto --content-topic=/app/22/sometopic/someencoding - Invalid (missing shard 1)

These topic resolutions are based on what I saw implemented in go-waku and js-waku, probably aligned with RFC 51:
- /myapp/1/latest/proto resolves to shard 0
- /waku/2/content/test.js resolves to shard 1
- /app/22/sometopic/someencoding resolves to shard 2
Please lmk if this is how it should work
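(To make the resolutions above concrete: below is a minimal Python sketch of how I understand Gen-0 autosharding from RFC 51 and the implementations I looked at, i.e. sha256 over the content topic's application and version parts, last 64 bits taken big-endian, modulo the shard count. This is my reading, not authoritative code, so please correct me if go-waku/js-waku do it differently.)

```python
# Sketch of Gen-0 autosharding as I understand it (assumption, not a spec copy):
# shard = (last 64 bits of sha256(application + version)) mod num_shards.
import hashlib

def autoshard(content_topic: str, num_shards: int = 8) -> int:
    # content topic format: /{application}/{version}/{topic-name}/{encoding}
    _, application, version, *_ = content_topic.split("/")
    digest = hashlib.sha256((application + version).encode()).digest()
    value = int.from_bytes(digest[-8:], "big")  # last 64 bits, big-endian
    return value % num_shards

# e.g. autoshard("/toychat/2/huilong/proto") should land somewhere in 0..7
```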
The problem we face is that the network specs for cluster id != 1 are undefined.
Couple points:
- --pubsub-topic=<static shard> should always be valid, as per the RFC.
- Maybe add a --total-shards=X config so that autosharding can be configured? If it's not present and content topics are specified, then error (except for TWN)?
The reasoning behind autosharding for cluster != 1 was to make it easier to debug and test. If this is not the case then what are we doing?
It does make the testing easier. I'm just trying to figure out which are the valid scenarios and what is invalid so I can write the interop tests accordingly
It does make the testing easier. I'm just trying to figure out which are the valid scenarios and what is invalid so I can write the interop tests accordingly
I feel like we should not test cluster != 1 since it's undefined behavior (in the context of autosharding); that would simplify things greatly, no? Since cluster != 1 is for testing purposes only, should it really be tested?
Also, when you say valid or invalid, in my mind it's always according to the spec (and the spec says cluster != 1 is undefined).
We need a simpler way to configure nwaku. Seems like this should be simple, but it's not, and that makes testing difficult.
Can you share that spec please, maybe I'm missing something
We started testing autosharding on cluster != 1 mainly because on cluster 1 RLN is enabled by default and the testing frameworks are not configured for relay yet. Even if they were, I feel the rate limiting would be a problem for tests where we send lots of messages in parallel.
Also, as discussed with @alrevuelta in this thread, autosharding on other clusters should be possible, not only for testing purposes.
Can you share that spec please, maybe I'm missing something
TWN -> https://rfc.vac.dev/spec/64/#network-shards and autosharding -> https://rfc.vac.dev/spec/51/#automatic-sharding
We can make changes if some part is unclear.
We started testing autosharding on cluster != 1 mainly because on cluster 1 RLN is enabled by default and the testing frameworks are not configured for relay yet. Even if they were, I feel the rate limiting would be a problem for tests where we send lots of messages in parallel.
I see
Also, as discussed with @alrevuelta in this thread, autosharding on other clusters should be possible, not only for testing purposes.
Yes it is possible. Maybe we could define autosharding on cluster != 1 as having shards 0-1023 just like static shards? https://rfc.vac.dev/spec/51/#static-sharding
Revisiting this issue. I know this contradicts a bit what I said earlier, but @SionoiS makes some good points:
In short, my suggestion:
- define a new network for a test network without RLN that is symmetrical to TWN. I suggest cluster ID = 9, shards=0-7 (on the assumption that we want to keep cluster IDs 2-7 for RLN-controlled testnets and cluster ID 8 to eventually mirror whatever we deploy to cluster ID 0 in the future).
- autosharding on this test network 9 mirrors autosharding on TWN 1 exactly (and we roll out new shards in new generations in parallel to both 1 and 9).
- only allow autosharding for these defined networks, i.e. cluster 1 or 9.

WDYT of this suggestion? cc @fbarbu15 @SionoiS @alrevuelta @richard-ramos
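(Sketching what this would mean for configuration: autosharding only needs a per-network shard count, so a known-networks table could look roughly like the Python below. Cluster 9 here is only the proposal from this comment, not something defined in any spec.)

```python
# Hypothetical known-network table for autosharding: cluster ID -> shard count.
# Cluster 9 is the proposed RLN-free test network from this thread, not a spec.
KNOWN_NETWORKS = {
    1: 8,  # The Waku Network (TWN), Gen 0: shards 0-7
    9: 8,  # proposed test network mirroring TWN: shards 0-7
}

def num_shards_for(cluster_id: int) -> int:
    if cluster_id not in KNOWN_NETWORKS:
        # autosharding stays undefined for unknown clusters, so error out
        raise ValueError(f"autosharding is not defined for cluster {cluster_id}")
    return KNOWN_NETWORKS[cluster_id]
```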
Thanks @jm-clius, that makes sense for me and it would simplify testing for autosharding.
define new network for a test network without RLN that is symmetrical to TWN. I suggest cluster ID = 9, shards=0-7 (on the assumption that we want to keep cluster IDs 2-7 for RLN-controlled testnets and cluster ID 8 to eventually mirror whatever we deploy to cluster ID 0 in the future.)
Unsure what you mean by this. A network is defined by its nodes. You can already do this by running nodes like this and connecting them together.
./build/wakunode2 \
--relay=true \
--pubsub-topic=/waku/2/rs/9/0 \
--pubsub-topic=/waku/2/rs/9/1 \
--pubsub-topic=/waku/2/rs/9/2 \
--pubsub-topic=/waku/2/rs/9/3 \
--pubsub-topic=/waku/2/rs/9/4 \
--pubsub-topic=/waku/2/rs/9/5 \
--pubsub-topic=/waku/2/rs/9/6 \
--pubsub-topic=/waku/2/rs/9/7 \
--cluster-id=9
autosharding on this test network 9 mirrors autosharding on TWN 1 exactly (and we roll out new shards in new generations in parallel to both 1 and 9).
Sure. Guess we would need a network flag for this (auto vs static). But tbh, if Status is not using static sharding, why have auto vs static? Just one type of sharding and that's it (=auto).
A network is defined by its nodes.
Only if we assume that a node subscribes to all shards. In your example the network used has cluster id 9, BUT we can only know that it has at least 8 shards, and without the exact number of shards autosharding cannot be used.
In short, my suggestion: .....
Yes, by not allowing autosharding for every cluster we don't have to deal with the edge cases just yet. 👍
A network is defined by its nodes
Within the context of a configuration, RLN membership set and autosharding, the "network" definition is a combination of cluster ID and shards (with implied generation, membership set, rate limit and contract address). We have a single RLN membership for the network defined by cluster ID 1. If I have a node participating in the Waku Network and this node is encapsulated in an application that decides to use autosharding, any content topic the node publishes to should map according to a hash function that is "aware" of the 8 shards that define Gen 0 of the network. This should be true, whether the node is subscribed to all 8 pubsub topics or not. In fact, the subscriptions of the node should not affect the hashing of content topics to pubsub topics at all (by specification).
why have auto vs static?
I don't think strictly-speaking Waku has autosharding - it's just a convenience API on top of a Waku network. We only have "static" sharding on the Waku protocol level. Autosharding provides a convenient application level API that automatically populates the shard/pubsub topic based on content topic. For that it needs to know what the hash space looks like, which depends on a cluster + num_of_shards (and generation) definition. We've only defined TWN 8 shards so far. My proposal is to define one more such network for testing purposes. I don't necessarily think we'd need to add configuration - autosharding is an implied use case if the application uses an API without specifying a pubsub topic.
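(In other words, the convenience API would just fill in the pubsub topic from the content topic plus the hard-coded network definition. A rough sketch, reusing the hypothetical autoshard() and num_shards_for() sketches from above:)

```python
# Illustrative only: the "convenience API" view of autosharding, where the
# application supplies a content topic and the node derives the pubsub topic
# from the hard-coded network definition (cluster + shard count + generation).
def resolve_pubsub_topic(content_topic: str, cluster_id: int = 1) -> str:
    shard = autoshard(content_topic, num_shards_for(cluster_id))
    return f"/waku/2/rs/{cluster_id}/{shard}"

# resolve_pubsub_topic("/myapp/1/latest/proto")  # -> "/waku/2/rs/1/<shard>"
```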
A network is defined by its nodes.
Only if we assume that a node subscribes to all shards. In your example the network used has cluster id 9, BUT we can only know that it has at least 8 shards, and without the exact number of shards autosharding cannot be used.
imho that's not how it works. pubsubTopics define all the available ones.

This subscribes to all shards because that's the default:
./build/wakunode2 \
--relay=true \
--pubsub-topic=/waku/2/rs/9/0 \
--pubsub-topic=/waku/2/rs/9/1 \
--pubsub-topic=/waku/2/rs/9/2 \
--pubsub-topic=/waku/2/rs/9/3 \
--pubsub-topic=/waku/2/rs/9/4 \
--pubsub-topic=/waku/2/rs/9/5 \
--pubsub-topic=/waku/2/rs/9/6 \
--pubsub-topic=/waku/2/rs/9/7 \
--cluster-id=9
But this only subscribes to shards 0 and 1:
./build/wakunode2 \
--relay=true \
--pubsub-topic=/waku/2/rs/9/0 \
--pubsub-topic=/waku/2/rs/9/1 \
--pubsub-topic=/waku/2/rs/9/2 \
--pubsub-topic=/waku/2/rs/9/3 \
--pubsub-topic=/waku/2/rs/9/4 \
--pubsub-topic=/waku/2/rs/9/5 \
--pubsub-topic=/waku/2/rs/9/6 \
--pubsub-topic=/waku/2/rs/9/7 \
--cluster-id=9 \
--shard=0 --shard=1
imho that's not how it works. pubsubTopics define all the available ones.
Well I'm even more confused now. I thought --pubsub-topic= were to be deprecated. We can't if they are used this way. 🤔
Well I'm even more confused now. I thought --pubsub-topic= were to be deprecated. We can't if they are used this way. 🤔
This is not the final solution, but a node should know the available shards of the network, either directly (topics) or implicitly (like networkAmountShards).
I would deprecate it, but I'm afraid that will break some things that I currently don't know how to deal with.
a node should know the available shards of the network
Right. I notice that this is used to initialize the sharding. The only place this number of shards is used seems to be for autosharding. In retrospect, this seems to me to be the wrong approach, especially because we also subscribe to all --pubsub-topic topics. The number of shards per network gen is a specification and not a configuration. It should be decoupled from subscriptions. I would have this number hard-coded for now - anyone using the autosharding API should expect their content topics to be hashed deterministically to shards 0-7 for this generation of the network. Furthermore, we should have only one way to configure topic subscription: --cluster-id and a repeated --shard. Everything else should be deprecated, IMO.
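(For example, under that proposal an interop test harness would only ever assemble subscription flags like this; the helper below is an illustrative sketch, not existing tooling:)

```python
# Illustrative only: building node arguments when subscription is configured
# solely via --cluster-id plus repeated --shard, as proposed above.
def subscription_flags(cluster_id: int, shards: list[int]) -> list[str]:
    flags = [f"--cluster-id={cluster_id}"]
    flags += [f"--shard={shard}" for shard in shards]
    return flags

# subscription_flags(1, [0, 1]) -> ["--cluster-id=1", "--shard=0", "--shard=1"]
```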
I think there are 2 issues here:

1st: content topics resolve to a very big shard number if pubsub-topic is not present in the docker flags
EX:
docker run -i -t -p 34696:34696 -p 34697:34697 -p 34698:34698 -p 34699:34699 -p 34700:34700 harbor.status.im/wakuorg/nwaku:latest --listen-address=0.0.0.0 --rest=true --rest-admin=true --websocket-support=true --log-level=DEBUG --rest-relay-cache-capacity=100 --websocket-port=34698 --rest-port=34696 --tcp-port=34697 --discv5-udp-port=34699 --rest-address=0.0.0.0 --nat=extip:172.18.139.12 --peer-exchange=true --discv5-discovery=true --cluster-id=2 --content-topic=/toychat/2/huilong/proto --relay=true --filter=true
Will resolve to /waku/2/rs/2/58355, while it should resolve to /waku/2/rs/2/3.
2nd: all content topics resolve to shard 0 if pubsub-topic is present in the docker flags
EX:
docker run -i -t -p 34696:34696 -p 34697:34697 -p 34698:34698 -p 34699:34699 -p 34700:34700 harbor.status.im/wakuorg/nwaku:latest --listen-address=0.0.0.0 --rest=true --rest-admin=true --websocket-support=true --log-level=DEBUG --rest-relay-cache-capacity=100 --websocket-port=34698 --rest-port=34696 --tcp-port=34697 --discv5-udp-port=34699 --rest-address=0.0.0.0 --nat=extip:172.18.139.12 --peer-exchange=true --discv5-discovery=true --cluster-id=2 --pubsub-topic=/waku/2/rs/2/2 --content-topic=/toychat/2/huilong/proto --content-topic=/statusim/1/community/cbor --content-topic=/waku/2/content/test.js --relay=true --filter=true
I can see in the logs that they all resolve to shard 0, while /toychat/2/huilong/proto should resolve to shard 3, /statusim/1/community/cbor to 4, and /waku/2/content/test.js to 1.