Closed: ichub closed this issue 1 year ago.
Hi @ichub, the `addMember` function calculates N = `depth` hashes for each insertion. If `depth = 16` and `numberOfMembers = 500`, the `addMembers` function needs to calculate 16 * 500 hashes. It takes 1.4 s on my laptop, but it might be much slower in other environments, so it's not ideal for client-side apps. I can't think of ways to make it more efficient, unless we change the data structure.

In Bandada we also store the members in a server DB, but when we run the app we create cached JS Groups based on the stored members and keep them synced as new members are added, so that the server can provide an API to generate proofs of membership server-side. Clients can then use that API rather than download the list of members and calculate the proof of membership themselves.
> Hi @ichub, the `addMember` function calculates N = `depth` hashes for each insertion. If `depth = 16` and `numberOfMembers = 500`, the `addMembers` function needs to calculate 16 * 500 hashes. It takes 1.4 s on my laptop, but it might be much slower in other environments, so it's not ideal for client-side apps.

Makes sense.
> I can't think of ways to make it more efficient, unless we change the data structure.
My suggestion is to change the data structure to allow you to instantiate a group from an array of members, rather than by adding members one by one. I think this would improve performance significantly.
For example, let's say we have a group of size 2^16, i.e. 65536, and we add 65536 members to it one by one. The way `addMembers` is currently implemented, it would take 65536 * 16 = 2^20 hashes to instantiate a `Group`.

If the group could be instantiated all at once, rather than one member at a time, the total number of hashes that would need to be calculated would equal 2^16 + 2^15 + 2^14 + ... + 1 = 2^17 - 1, which is approximately 2^3, or 8, times faster.
> In Bandada we also store the members in a server DB, but when we run the app we create cached JS Groups based on the stored members and keep them synced as new members are added, so that the server can provide an API to generate proofs of membership server-side. Clients can then use that API rather than download the list of members and calculate the proof of membership themselves.
To extrapolate from the benchmark you ran on your computer (8000 hashes / 1.4 s), this could improve the server-side calculation of hashes from (2^20) / 8000 * 1.4 ≈ 184 seconds to (2^17 - 1) / 8000 * 1.4 ≈ 23 seconds.
Furthermore, it may be a good idea to be able to save `Group`s to disk and instantiate them from a serialized format without having to recalculate any hashes at all, reducing startup time to something completely negligible.
> If the group could be instantiated all at once, rather than one member at a time, the total number of hashes that would need to be calculated would equal 2^16 + 2^15 + 2^14 + ... + 1 = 2^17 - 1, which is approximately 2^3, or 8, times faster.
I'm not sure if I understood correctly and whether you had the same solution in mind, but we can actually just skip hashing a lot of pairs with zero values, since we already have all the members. In that case the number of hashes required to insert 2^16 leaves would be roughly 2^16 itself, I think.
I just tested it and created a PR in the @zk-kit/incremental-merkle-tree library: https://github.com/privacy-scaling-explorations/zk-kit/pull/59/files#diff-c610f7859836b236def3aa08507f455f35fb7f5bfa1604039752c64db3d7908c. I'll add more info soon. Could you take a look?
Considering `arity = 2`, it seems to be about 10 times faster.
> I'm not sure if I understood correctly and whether you had the same solution in mind, but we can actually just skip hashing a lot of pairs with zero values, since we already have all the members. In that case the number of hashes required to insert 2^16 leaves would be roughly 2^16 itself, I think.
Yes, this is where part of the performance improvement comes from. Another part comes from never hashing the children of any node more than exactly once, which is what I based my calculations on.

In practice, most groups will likely have far fewer members than their capacity, which means many pairs are just zero values and don't have to be re-hashed.
I eagerly await this performance improvement being deployed, so we can see some gains in production! Thanks for looking into this on such short notice.
> I eagerly await this performance improvement being deployed, so we can see some gains in production! Thanks for looking into this on such short notice.
Thanks for pointing it out! A lot of people will be able to benefit from it, as that library is quite widely used. As soon as the PR has been reviewed by the others, we'll release new zk-kit and Semaphore versions.
Amazing! Just curious, but what is your release schedule, i.e. when will this make it to npm?
> Amazing! Just curious, but what is your release schedule, i.e. when will this make it to npm?
It's already available, let us know if it works properly!
**Is your feature request related to a problem? Please describe.**
Creating large Semaphore groups is slow.

**Describe the solution you'd like**
The implementation of `Group` has the following two functions:

- `addMember`
- `addMembers`

`addMembers` is inefficient for large numbers of members, because it simply calls `addMember` in a loop. Based on my understanding of the underlying implementation of the Merkle tree, I believe this causes the tree to re-hash its nodes for every individual member added to the group. I think there should be a more efficient way to interact with the Merkle tree data structure, where all the nodes are added at once, then hashed once they have been added.

**Describe alternatives you've considered**

- Running `Group` instantiation in a web worker

**Additional context**