status-im / status-go

The Status module that consumes go-ethereum
https://status.im
Mozilla Public License 2.0
728 stars 245 forks source link

bug: optimize filter registrations to avoid some issues noticed #5262

Closed chaitanyaprem closed 2 months ago

chaitanyaprem commented 3 months ago

Problem

While dogfooding https://github.com/status-im/status-go/pull/4665/ with status-desktop, i had noticed few issues that need to be addressed.

  1. Sometimes, filters are registered before there is any peer connectivity which causes them to fail. Since error is not propagated all the way up the stack from where these are installed, there is no retry and app gets stuck in a weird state. Only workaround is to restart the app to regain connectivity.
  2. Few times i had noticed that app loading takes a lot of time. This is because if the user is a member of a community, then a lot of filters would be installed which are done in sequence. Since this involves network calls to 2 peers per filter, the loading takes almost few minutes which is a very bad user experience.

Implementation

Following approach would solve both the problems and also address the issue https://github.com/status-im/status-go/issues/4526 Solution is as below:

  1. Do filter registrations async i.e filter-manager to accept any filter install and queue them.
  2. Batch the filter subscriptions and group them(subscriptions in a pubsubTopic) into a max of 100 contentTopics to reduce number of actual filter subscriptions. This would reduce number of subscriptions. This would then have a side-affect that when a filter is uninstalled, we need to remove that specific subscription(To analyze).
  3. Note to also have a small batch timer so that subscriptions are not queued for too long.
  4. If no peers are available when a subscription is requested, keep it queue and proceed only if there are filter peers available in peerStore (Note we need to consider peers supporting a specific pubsubTopic for this)

Acceptance Criteria

Need to validate all Filter test cases along with dogfooding lightClient.

cc @richard-ramos @vitvly @cammellos : Do review the above and provide any feedback/suggestions.

@weboko @danisharora099 : This might be something that is useful in js-waku Filter-SDK as well.

danisharora099 commented 3 months ago

Oh, this is interesting for js-waku indeed. Thanks @chaitanyaprem for documenting this.

I noticed something similar when working on introducing peer management for LightPush, but this is directly applicable for Filter as well: I was thinking of introducing automated retries into the SDK where the SDK .send() would attempt at finding a peer for the request, if doesn't exist already, and then send the peer. In a case where, even upon a forced find of peers, no peers are found, we will simply return with an error code communicating that no peers were found.

I'm curious: in your approach where you queue these requests, do you block the initial library consumer's function invocation, of for example the .subscribe()? Does it wait to be resolved?

cc @weboko

chaitanyaprem commented 3 months ago

I'm curious: in your approach where you queue these requests, do you block the initial library consumer's function invocation, of for example the .subscribe()? Does it wait to be resolved?

I did not think of reporting failure back to caller as of now. But rather i am expecting SDK to keep retrying internally. But yeah, this is something that needs to be thought of. Maybe some sort of a callback can be invoked in case such things fail even after retry. At this point i am not sure, because there are factors such as what if the node itself lost connectivity and hence subscribe keeps failing.

chaitanyaprem commented 3 months ago

Filter subscription async and queuing subscriptions when no connectivity is there(Points 1 & 4 above), are implemented as part of https://github.com/status-im/status-go/pull/4665/commits/817e959ee9b7514bbc01f205addceabc07569cff

Others will be taken up in a separate PR.

chaitanyaprem commented 3 months ago

Batching of subscriptions may not be required unless we start seeing an issue. Because this only impacts the lightClient that has to maintain and manage these subscriptions. To a serviceNode, it doesn't matter how many subscriptions a peer has it still considers them as a single one.

The only possible challenge i see at this point with not batching them could be in case the serviceNode starts rate-limiting filter-subscribe requests which would then start failing.

chaitanyaprem commented 2 months ago

This is being taken up now since it is causing delay and bloat of subscriptions. And https://github.com/status-im/status-go/issues/5421 will take time to implement.

chaitanyaprem commented 2 months ago

Closing as completed as part of #5440