Open chaitanyaprem opened 7 months ago
@chaitanyaprem While I do agree on subscription management should happen with time-to-live tracking, meaning we expect clients to use filter ping to renew their subscriptions TTL, I would argue with the second criteria. I would not mix peer management with filter subscriptions at all. So if a peer connection is accepted than wee should be able to maintain it's subscriptions. On the other hand current, existing solution in nwaku use a 1 min period to check active peers and maintain subscriptions db. If that is not working properly that's a bug, I will check it.
@chaitanyaprem While I do agree on subscription management should happen with time-to-live tracking, meaning we expect clients to use filter ping to renew their subscriptions TTL, I would argue with the second criteria. I would not mix peer management with filter subscriptions at all. So if a peer connection is accepted than wee should be able to maintain it's subscriptions. On the other hand current, existing solution in nwaku use a 1 min period to check active peers and maintain subscriptions db. If that is not working properly that's a bug, I will check it.
I guess if point-1 is handled, point-2 would get addressed automatically. Once you start to have TTL tracking on subscriptions, filter client connections that are disconnected by peer-manager would get removed automatically by this expiry.
But I am wondering if this could lead to a loophole which can be exploited by someone. e.g : If i issue constantly filter subscribe requests in a loop from same IP by using different peerIDs, the service-node would get a flood of connections and filter requests. Peer-manager would disconnect all connections, but filter protocol would still be flooded with subscriptions which are invalid and stay there until the TTL kicks-in.
Maybe this would get addressed when we implement service rate-limits on nodes as discussed https://github.com/waku-org/go-waku/issues/667. We can probably ignore point-2 for now and track it as part of the issue referenced.
@chaitanyaprem While I do agree on subscription management should happen with time-to-live tracking, meaning we expect clients to use filter ping to renew their subscriptions TTL, I would argue with the second criteria. I would not mix peer management with filter subscriptions at all. So if a peer connection is accepted than wee should be able to maintain it's subscriptions. On the other hand current, existing solution in nwaku use a 1 min period to check active peers and maintain subscriptions db. If that is not working properly that's a bug, I will check it.
I guess if point-1 is handled, point-2 would get addressed automatically. Once you start to have TTL tracking on subscriptions, filter client connections that are disconnected by peer-manager would get removed automatically by this expiry.
Yes, indeed. Its already expected to behave like that. At least nim-waku checks connected peers per minute and cleans up subscriptions of not connected peers.
But I am wondering if this could lead to a loophole which can be exploited by someone. e.g : If i issue constantly filter subscribe requests in a loop from same IP by using different peerIDs, the service-node would get a flood of connections and filter requests. Peer-manager would disconnect all connections, but filter protocol would still be flooded with subscriptions which are invalid and stay there until the TTL kicks-in.
I don't precisely can imagine that situation due for being able to use the filter-service you must be a connected peer already. You cannot just subscribe from random peers without having been connected... if I may misunderstand something please clarify.
Maybe this would get addressed when we implement service rate-limits on nodes as discussed waku-org/go-waku#667. We can probably ignore point-2 for now and track it as part of the issue referenced. Otherwise I'm not pretty sure how to rate limit such scenario as that you describe as an attack always uses new peerID. If we would limit per IP, we may run into trouble handling light nodes behind NAT or VPN. Of course in normal operation we do not expect high number of subscription requests per minute, so maybe we can easily detect such a misuse and ban or punish such IPs for a while.
In a more traditional service mesh style you would put a load balancer in front. I'm not pretty sure if we should not think of such a network gateways to protect such service nodes that are operating on top of the gossipsub.
Weekly Update
I don't precisely can imagine that situation due for being able to use the filter-service you must be a connected peer already. You cannot just subscribe from random peers without having been connected... if I may misunderstand something please clarify.
When i was testing locally to reproduce another issue noticed in status app, i had observed this behaviour.
I was running an nwaku node locally with relay+filter enabled. Then i used a simple go-waku client to do the following in a loop:
This scenario left my nwaku node with 100 filter subscriptions.
But now i take a relook at the scenario, it looks like my go-waku client was disconnecting connection for each peer after subscribing. So, probably it was not hitting this issue, rather just connections being left idle.
This has to be rerun by not disconnecting the peer and confirm what is the behvaiour.
I don't precisely can imagine that situation due for being able to use the filter-service you must be a connected peer already. You cannot just subscribe from random peers without having been connected... if I may misunderstand something please clarify.
When i was testing locally to reproduce another issue noticed in status app, i had observed this behaviour.
I was running an nwaku node locally with relay+filter enabled. Then i used a simple go-waku client to do the following in a loop:
- Use a new peerID and connect to nwaku node
- Subscribe to filter and not unsubscribe and disconnect
This scenario left my nwaku node with 100 filter subscriptions.
But now i take a relook at the scenario, it looks like my go-waku client was disconnecting connection for each peer after subscribing. So, probably it was not hitting this issue, rather just connections being left idle.
This has to be rerun by not disconnecting the peer and confirm what is the behavior.
Ok, I see the concept. I do think using TTL on subscriptions will help to extend the management of left over subs. Current PeerManager is configured default to let 5 peers from behind one IP. If one disconnects, its subscriptions are wiped out. Still it is the client peer responsibility to maintain its subs. With TTL it will be explicitly required to use filter v2 ping to restart TTL thus keep subs working.
Weekly Update
@NagyZoltanPeter Is there anything left from this issue. AFAIU, the TTL subscription management and increase in contentTopics per subscribe request took care of everything.
Also, we have done some changes in go-waku to make sure same peer is not used again for same subscription.
Problem
As per discussion here there needs to be 2 things that can be improved wrt managing filter subscriptions. Noticed the behaviour while debugging for filter delay errors and observations of lot of idle subscriptions on status sharding fleet nodes. Noticed while simulating some other test behaviour.
Suggested solution
Alternatives considered
None
Additional context
https://discord.com/channels/1110799176264056863/1184634939220709468 can be referred
Acceptance criteria
QA Test Scenario Description
Scenario 1
Scenario 2