hoytech / strfry

a nostr relay
GNU General Public License v3.0
448 stars 89 forks source link

Probabilistic filters to reduce memory footprint of ReqWorker for monitoring #43

Open Giszmo opened 1 year ago

Giszmo commented 1 year ago

Clients with big filters can demand a lot of RAM on the relay. This can be reduced significantly by using probabilistic filters. While these filters are not great to query a database, they are great to check individual events against.

I had implemented this idea using Cuckoo filters here. While the REQ is received as normal JSON filter object, this is being converted into a probabilistic filter after EOSE if for example filtering for more than 10 pubkeys. The probabilistic filter is set to a false positive rate of 1/10.000.000 with the client is filtering out those false positives anyway.

hoytech commented 1 year ago

It's a cool idea! I haven't yet heard anyone complaining about memory consumption from large filters, but it could be nobody is looking at that, and/or it isn't a problem yet but could become one in the future (as contact lists get bigger, etc).

I'd also like to see the analysis about the various overheads with different false positive ratios. I did some back of the envelope estimates using bloom filters for a similar user-case but the numbers didn't seem that promising to me at the time.

I think it would be great to put this through the NIP process too, since I expect a lot of people will have interesting insights on this!

Giszmo commented 1 year ago

I thought about nipping it but in the context of network improvement - send a probabilistic filter instead of the full list which turned out to be really not worth it as in nostr such optimisations are not appreciated at this point.

I consider the FP rate acceptable and within spec so I consider it a valid optimization of relays.

As nips are about standardizing for the sake of interoperability, I don't see the point of a nip-probabilisitc-filtering-after-eose. There is really no need for the client to know if you use Bloom or Cuckoo filters or how exactly it would look serialized.