waku-org / nwaku

Waku node and protocol.
Other
188 stars 49 forks source link

feat: Managing idle filter subscriptions #2293

Open chaitanyaprem opened 7 months ago

chaitanyaprem commented 7 months ago

Problem

As per discussion here there needs to be 2 things that can be improved wrt managing filter subscriptions. Noticed the behaviour while debugging for filter delay errors and observations of lot of idle subscriptions on status sharding fleet nodes. Noticed while simulating some other test behaviour.

Suggested solution

  1. Filter subscriptions that are idle (i.e subscription exists but peer not reachable even after some amount of time - maybe 5/15/30 mins or even lower) can be removed as it is expected from lightNode to resubscribe to filter when it comes back online. Suggestions wrt time to be considered for a subscription to be idle - 5 mins, 15 mins or 30 mins
  2. When subscriptions are created by using random peerids but from same IP address, looks like subscriptions still remain eventhough peer-manager kicks in and disconnects peers beyond colocation limit

Alternatives considered

None

Additional context

https://discord.com/channels/1110799176264056863/1184634939220709468 can be referred

Acceptance criteria

QA Test Scenario Description

Scenario 1

  1. Scenario 2

NagyZoltanPeter commented 7 months ago

@chaitanyaprem While I do agree on subscription management should happen with time-to-live tracking, meaning we expect clients to use filter ping to renew their subscriptions TTL, I would argue with the second criteria. I would not mix peer management with filter subscriptions at all. So if a peer connection is accepted than wee should be able to maintain it's subscriptions. On the other hand current, existing solution in nwaku use a 1 min period to check active peers and maintain subscriptions db. If that is not working properly that's a bug, I will check it.

chaitanyaprem commented 6 months ago

@chaitanyaprem While I do agree on subscription management should happen with time-to-live tracking, meaning we expect clients to use filter ping to renew their subscriptions TTL, I would argue with the second criteria. I would not mix peer management with filter subscriptions at all. So if a peer connection is accepted than wee should be able to maintain it's subscriptions. On the other hand current, existing solution in nwaku use a 1 min period to check active peers and maintain subscriptions db. If that is not working properly that's a bug, I will check it.

I guess if point-1 is handled, point-2 would get addressed automatically. Once you start to have TTL tracking on subscriptions, filter client connections that are disconnected by peer-manager would get removed automatically by this expiry.

But I am wondering if this could lead to a loophole which can be exploited by someone. e.g : If i issue constantly filter subscribe requests in a loop from same IP by using different peerIDs, the service-node would get a flood of connections and filter requests. Peer-manager would disconnect all connections, but filter protocol would still be flooded with subscriptions which are invalid and stay there until the TTL kicks-in.

Maybe this would get addressed when we implement service rate-limits on nodes as discussed https://github.com/waku-org/go-waku/issues/667. We can probably ignore point-2 for now and track it as part of the issue referenced.

NagyZoltanPeter commented 6 months ago

@chaitanyaprem While I do agree on subscription management should happen with time-to-live tracking, meaning we expect clients to use filter ping to renew their subscriptions TTL, I would argue with the second criteria. I would not mix peer management with filter subscriptions at all. So if a peer connection is accepted than wee should be able to maintain it's subscriptions. On the other hand current, existing solution in nwaku use a 1 min period to check active peers and maintain subscriptions db. If that is not working properly that's a bug, I will check it.

I guess if point-1 is handled, point-2 would get addressed automatically. Once you start to have TTL tracking on subscriptions, filter client connections that are disconnected by peer-manager would get removed automatically by this expiry.

Yes, indeed. Its already expected to behave like that. At least nim-waku checks connected peers per minute and cleans up subscriptions of not connected peers.

But I am wondering if this could lead to a loophole which can be exploited by someone. e.g : If i issue constantly filter subscribe requests in a loop from same IP by using different peerIDs, the service-node would get a flood of connections and filter requests. Peer-manager would disconnect all connections, but filter protocol would still be flooded with subscriptions which are invalid and stay there until the TTL kicks-in.

I don't precisely can imagine that situation due for being able to use the filter-service you must be a connected peer already. You cannot just subscribe from random peers without having been connected... if I may misunderstand something please clarify.

Maybe this would get addressed when we implement service rate-limits on nodes as discussed waku-org/go-waku#667. We can probably ignore point-2 for now and track it as part of the issue referenced. Otherwise I'm not pretty sure how to rate limit such scenario as that you describe as an attack always uses new peerID. If we would limit per IP, we may run into trouble handling light nodes behind NAT or VPN. Of course in normal operation we do not expect high number of subscription requests per minute, so maybe we can easily detect such a misuse and ban or punish such IPs for a while.

In a more traditional service mesh style you would put a load balancer in front. I'm not pretty sure if we should not think of such a network gateways to protect such service nodes that are operating on top of the gossipsub.

NagyZoltanPeter commented 6 months ago

Weekly Update

chaitanyaprem commented 6 months ago

I don't precisely can imagine that situation due for being able to use the filter-service you must be a connected peer already. You cannot just subscribe from random peers without having been connected... if I may misunderstand something please clarify.

When i was testing locally to reproduce another issue noticed in status app, i had observed this behaviour.

I was running an nwaku node locally with relay+filter enabled. Then i used a simple go-waku client to do the following in a loop:

  1. Use a new peerID and connect to nwaku node
  2. Subscribe to filter and not unsubscribe and disconnect

This scenario left my nwaku node with 100 filter subscriptions.

But now i take a relook at the scenario, it looks like my go-waku client was disconnecting connection for each peer after subscribing. So, probably it was not hitting this issue, rather just connections being left idle.

This has to be rerun by not disconnecting the peer and confirm what is the behvaiour.

NagyZoltanPeter commented 6 months ago

I don't precisely can imagine that situation due for being able to use the filter-service you must be a connected peer already. You cannot just subscribe from random peers without having been connected... if I may misunderstand something please clarify.

When i was testing locally to reproduce another issue noticed in status app, i had observed this behaviour.

I was running an nwaku node locally with relay+filter enabled. Then i used a simple go-waku client to do the following in a loop:

  1. Use a new peerID and connect to nwaku node
  2. Subscribe to filter and not unsubscribe and disconnect

This scenario left my nwaku node with 100 filter subscriptions.

But now i take a relook at the scenario, it looks like my go-waku client was disconnecting connection for each peer after subscribing. So, probably it was not hitting this issue, rather just connections being left idle.

This has to be rerun by not disconnecting the peer and confirm what is the behavior.

Ok, I see the concept. I do think using TTL on subscriptions will help to extend the management of left over subs. Current PeerManager is configured default to let 5 peers from behind one IP. If one disconnects, its subscriptions are wiped out. Still it is the client peer responsibility to maintain its subs. With TTL it will be explicitly required to use filter v2 ping to restart TTL thus keep subs working.

NagyZoltanPeter commented 6 months ago

Weekly Update

chaitanyaprem commented 4 weeks ago

@NagyZoltanPeter Is there anything left from this issue. AFAIU, the TTL subscription management and increase in contentTopics per subscribe request took care of everything.

Also, we have done some changes in go-waku to make sure same peer is not used again for same subscription.