Open fiatjaf opened 2 weeks ago
Storing reactions is not a big issue: dedicated indexes and encoding schemes can be made for high-volume optimizable events, which will mean you can store about 5M reactions in 1GB. This does not include some other methods you could use such as public key lookup tables that can cut the size even further.
Relays may also implement sampling and adjust the HLL result accordingly (as it is likely that the relay will know roughly how many events it may have to explore): you don't need to add every event to the HLL sketch, only some, and then add a correction factor to every register.
> dedicated indexes and encoding schemes can be made for high-volume optimizable events
HLL is exactly such a thing already and defining what goes into this dedicated scheme and what is ignored is the question I posed above: if you want a limited functionality for specific use cases then HLL caching can be very good, otherwise nothing is possible.
Anyway, I think one solution is to define HLL to be returned ONLY in the following queries (exact templates), at least for now:
* `{"#e":["<anchor>"],"kinds":[7]}`
* `{"#p":["<anchor>"],"kinds":[3]}`
All other queries should not return HLL responses.
And then whenever someone has another use case we add it to the list.
In the queries above it is declared that `<anchor>` will be used to determine how to produce the HLL value for each related event deterministically.
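As a rough illustration of the "exact templates" rule, here is a minimal Python sketch of how a relay might decide whether a filter qualifies for an HLL response. The function name `hll_anchor` and the `HLL_TEMPLATES` table are hypothetical, and the strictness (no extra keys, exactly one anchor, exactly one kind) is my reading of "exact templates", not something specified here.

```python
# Allowed templates from the list above: tag name -> the single kind it pairs with.
# (Hypothetical table; new use cases would be added here.)
HLL_TEMPLATES = {"#e": 7, "#p": 3}


def hll_anchor(filter_obj):
    """Return the anchor if the filter matches an allowed template exactly, else None."""
    # Exactly two keys: one tag filter and "kinds". Anything extra disqualifies it.
    if not isinstance(filter_obj, dict) or len(filter_obj) != 2:
        return None
    kinds = filter_obj.get("kinds")
    for tag, kind in HLL_TEMPLATES.items():
        values = filter_obj.get(tag)
        if (
            kinds == [kind]
            and isinstance(values, list)
            and len(values) == 1
            and isinstance(values[0], str)
        ):
            return values[0]
    return None
```

With this, `{"#e":["<anchor>"],"kinds":[7]}` yields its anchor, while the same filter with an added `limit`, a second tag value, or a different kind yields `None`, so the relay would answer with a plain count and no HLL.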
> Anyway, I think one solution is to define HLL to be returned ONLY in the following queries (exact templates), at least for now:
>
> * `{"#e":["<anchor>"],"kinds":[7]}`
> * `{"#p":["<anchor>"],"kinds":[3]}`
>
> All other queries should not return HLL responses.
> And then whenever someone has another use case we add it to the list.
> In the queries above it is declared that `<anchor>` will be used to determine how to produce the HLL value for each related event deterministically.
You have not solved the problem that this is open to manipulation
> otherwise nothing is possible.
A lot of things are possible.
Here's a nice colorful video explanation of HyperLogLog: https://www.youtube.com/watch?v=lJYufx0bfpw And here's a very interesting article with explanations, graphs and other stuff: http://antirez.com/news/75
If relays implement this we can finally get follower counts that do not suck and without having to use a single relay (aka relay.nostr.band) as the global source of truth for the entire network -- at the same time as we save the world by consuming an incomparably small fraction of the bandwidth.
Even if one were to download just 2 reaction events in order to display a silly reaction count number in a UI, that would already use more bytes than this HLL value does (actually, considering deflate compression, the COUNT response with the HLL value is already smaller than a single reaction EVENT response).
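A quick back-of-the-envelope check of the uncompressed part of that claim, in Python. It assumes the 256 one-byte registers are sent hex-encoded (an assumption on my part), and it uses placeholder reaction events that only mimic the standard event field sizes (32-byte `id` and `pubkey`, 64-byte `sig`, all hex). `dummy_reaction` is a hypothetical helper and the values are not real events. The deflate comparison is not reproduced, since dummy data would compress unrealistically well.

```python
import json


def dummy_reaction(seed):
    # Placeholder hex strings with the same lengths as real event fields.
    return {
        "id": f"{seed:064x}",
        "pubkey": f"{seed + 1:064x}",
        "created_at": 1700000000,
        "kind": 7,
        "tags": [["e", f"{seed + 2:064x}"]],
        "content": "+",
        "sig": f"{seed + 3:0128x}",
    }


# 256 registers, one byte each, hex-encoded (assumed wire encoding).
hll_hex = "00" * 256

# Two reaction events serialized as compact JSON.
two_events = "".join(
    json.dumps(dummy_reaction(i), separators=(",", ":")) for i in (1, 2)
)

hll_chars = len(hll_hex)      # 512
event_chars = len(two_events)
print(hll_chars, event_chars)
```

The id, pubkey, referenced event id and signature alone are 320 hex characters per event, so two events exceed the 512 characters of the HLL value before any JSON overhead is counted.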
This requires trusting relays to not lie about the counts and the HLL values, but this NIP always required that anyway, so no change there.
HyperLogLog can be implemented in multiple ways, with different parameters and whatnot. Luckily most of the customizations (for example, the differences between HyperLogLog++ and HyperLogLog) can be applied at the final step, so they are a client choice. This NIP only describes the part that is needed for interoperability, which is how relays should compute the values and then return them to clients.
Because implementations would have to agree on parameters such as the number of registers to use, this NIP also fixes that number at 256 for simplicity's sake (it makes it simpler to implement since 256 is the number of distinct values one byte can hold) and also because it is a reasonable amount.
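To make the split between the relay side and the client side concrete, here is a minimal generic HyperLogLog sketch in Python with 256 one-byte registers. It is not the scheme this NIP prescribes: hashing items with SHA-256, taking the first byte as the register index and counting leading zeros in the next 8 bytes are my assumptions for illustration, and the anchor-dependent derivation discussed in the thread is not modeled. The `estimate` function uses the textbook HyperLogLog formula with the small-range (linear counting) correction; a client could swap in HyperLogLog++ at that step without the registers changing.

```python
import hashlib
import math

M = 256  # number of registers, as fixed by the NIP


def new_sketch():
    return bytearray(M)  # one byte per register, all zero


def add(sketch, item: bytes):
    # Assumed hashing scheme, for illustration only.
    h = hashlib.sha256(item).digest()
    idx = h[0]                         # first byte picks one of 256 registers
    w = int.from_bytes(h[1:9], "big")  # next 64 bits
    rank = 64 - w.bit_length() + 1     # leading zeros + 1
    if rank > sketch[idx]:
        sketch[idx] = rank


def merge(a, b):
    # Combining sketches from different relays: register-wise maximum.
    return bytearray(max(x, y) for x, y in zip(a, b))


def estimate(sketch):
    # Client-side final step: classic HyperLogLog estimator.
    alpha = 0.7213 / (1 + 1.079 / M)
    raw = alpha * M * M / sum(2.0 ** -r for r in sketch)
    zeros = sketch.count(0)
    if raw <= 2.5 * M and zeros:
        return M * math.log(M / zeros)  # linear counting for small counts
    return raw
```

Because merging is a register-wise maximum, a client can ask several relays for the same count and combine the results without double-counting events that more than one relay holds, which is what makes multi-relay follower counts workable.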
These are some random estimations I did, to showcase how efficient those 256 bytes can be:
As you can see they are almost perfect for small counts, but still pretty good for giant counts.