nislab / gensync

A new framework for benchmarking and optimizing reconciliation of data.
http://doi.org/10.1109/TNSM.2022.3164369
GNU General Public License v3.0
9 stars 1 forks source link

Range-Based Set Reconciliation (negentropy) #9

Open hoytech opened 11 months ago

hoytech commented 11 months ago

Hi! Have you seen the set reconcilliation algorithm described here? https://arxiv.org/abs/2212.13567

I wrote a C++ library that defines and implements this as a concrete protocol: https://github.com/hoytech/negentropy

Before building negentropy, I looked into the algorithms that gensync implements, and my conclusion was that range-based reconcilliation is superior under most conditions:

But I would love to have some objective benchmarks to verify this (or be proven mistaken!).

Thanks for any help!

trachten commented 10 months ago

Hi … sorry for the slow response; I was away. Is the arxiv version the latest version of your work, or is the a later version published in a conference or journal venue?

Please feel free to reach out directly to @. @.>.

best, -Ari

On Aug 16, 2023, at 1:05 AM, Doug Hoyte @.***> wrote:

Have you seen the set reconcilliation algorithm described here: https://arxiv.org/abs/2212.13567

I wrote a C++ library that defines and implements this as a concrete protocol: https://github.com/hoytech/negentropy

Before building negentropy, I looked into the algorithms that gensync implements, and my conclusion was that range-based reconcilliation is superior under most conditions:

Handles very large cardinalities (we are routinely using it for syncing data-sets of size 10 million, and greater) No tuning required: works predictably well with large or small number of differences Bandwidth overhead is logarithmic in DB size and linear in number of symmetric differences Negligible CPU usage compared with CPISync/minisketch https://github.com/sipa/minisketch No false positives or failed decodes like with bloom-filter based algos But I would love to have some objective benchmarks to verify this (or be proven mistaken!).

Thanks for any help!

— Reply to this email directly, view it on GitHub https://github.com/nislab/gensync/issues/9, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACM5LU6ENP3BBUHENUW2UOLXVRIH5ANCNFSM6AAAAAA3R6ZZMY. You are receiving this because you are subscribed to this thread.

hoytech commented 10 months ago

No worries. The paper that I posted is not my work, it is by @AljoschaMeyer

The negentropy implementation is my C++ implementation of his algorithm. I don't know if it's been published or not.

I can't see those email addresses, they just got turned into * characters, probably by github?

AljoschaMeyer commented 10 months ago

Hi @trachten,

the paper was accepted at SRDS23, so it should be published shortly. Until then, here is the camera-ready submission: main.pdf

AljoschaMeyer commented 10 months ago

I would be happy to answer any questions or be kept in the loop regarding range-based reconciliation in gensync. I had planned to do some benchmarking for various choices of protocol parameters (recursion anchors, branching degree, fingerprint monoid), auxiliary data structures, and comparisons with MSTs, but there's no code yet (and I have prioritized other projects so far). A Master's student is currently doing some work in that area under my (informal) supervision though.

I have not written a single line of C++ in my life however, though branching over from rust and C should not be too much of a problem.

trachten commented 10 months ago

Thanks for the update! My email is also at the bottom of the README.md file on github (https://github.com/nislab/gensync). It would be great to integrate your implementation into gensync so that we could do an apples-to-apples comparison ... if you don't have the time for it, I can see if I can find an MS student who would like to do it as part of an MS project.

best, -Ari

On Aug 28, 2023, at 11:31 AM, Doug Hoyte @.***> wrote:

No worries. The paper that I posted is not my work, it is by @AljoschaMeyer https://github.com/AljoschaMeyer The negentropy implementation is my C++ implementation of his algorithm. I don't know if it's been published or not.

I can't see those email addresses, they just got turned into * characters, probably by github?

— Reply to this email directly, view it on GitHub https://github.com/nislab/gensync/issues/9#issuecomment-1695904871, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACM5LU3CZ6MPQKPFE2ZGAM3XXS2TXANCNFSM6AAAAAA3R6ZZMY. You are receiving this because you commented.

— Prof. Ari Trachtenberg Electrical and Computer Engineering Boston University @.***

trachten commented 10 months ago

Thanks @AljoschaMeyer https://github.com/AljoschaMeyer for the paper draft and the comments! I think that GenSync could be an excellent framework for a practical comparison ...

On Aug 29, 2023, at 4:57 AM, Aljoscha Meyer @.***> wrote:

I would be happy to answer any questions or be kept in the loop regarding range-based reconciliation in gensync. I had planned to do some benchmarking for various choices of protocol parameters (recursion anchors, branching degree, fingerprint monoid), auxiliary data structures, and comparisons with MSTs https://inria.hal.science/hal-02303490/file/paper%20%281%29.pdf, but there's no code yet (and I have prioritized other projects so far). A Master's student is currently doing some work in that area under my (informal) supervision though.

I have not written a single line of C++ in my life however, though branching over from rust and C should not be too much of a problem.

— Reply to this email directly, view it on GitHub https://github.com/nislab/gensync/issues/9#issuecomment-1697039419, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACM5LU4NGL544XROKABW7DTXXWVIDANCNFSM6AAAAAA3R6ZZMY. You are receiving this because you were mentioned.

— Prof. Ari Trachtenberg Electrical and Computer Engineering Boston University @.***

hoytech commented 10 months ago

Great! I will take a stab at integrating it, when I find some time. Or if you have any interested student in the mean-time I'm happy to help them with negentropy API.