Multiset reconciliation with IBLTSync

arorashu commented 4 years ago

What?

Multiset reconciliation with IBLTSync may be buggy. Due to failing test: IBLTSyncTest::IBLTSyncMultisetReconcileTest in PR#61, I created a script to pin down a reproducible test case.

The TryMultisetIBLTSync.cpp, in the spirit of TryMe.cpp asks the user to run client and server in different processes, run the sync, and reports stats.

Initial data for server and client is loaded from files client0.txt and server0.txt.

TryMultisetIBLTSync runs both IBLTSync and CPISync, The resultant client and server in IBLTSync do not have the correct number of expected elements (as printed by the stats). At least the server should have expected elements. The resultant client and server in CPISync are as expected (both have the same set after reconciliation).

How to test?

Pull this branch
Run sudo make install
In one terminal instance, run : ./TryMultisetIBLTSync server
In another terminal instance, run : ./TryMultisetIBLTSync client

Both the terminal instances will print stats for reconciliation.

My ouputs

Server

❯ ./TryMultisetIBLTSync server
Initial Server size: 333
Initial Client size: 399
Expected Resultant size: 483

IBLTSync connecting on port 8001...
IBLT Server stats: 
sync failed.
Resultant server size: 412
Stats for IBLTSync
   * expected number of elements = 1020
   * size of values =  8

Bytes Transmitted: 2264
Bytes Received: 66830
Communication Time(s): 0.132822
Idle Time(s): 0.065369
Computation Time(s): 0.020828

CPISync connecting on port 8003...
CPISync Server stats: 
CPIsync succeeded.
Resultant server size: 483
Stats for Basic CPI Sync
   * base field size = 16777729
   * mbar = 510
   * b = 24
   * pErr = 2^-8
   * Evaluation Points = 1

Bytes Transmitted: 2393
Bytes Received: 4574
Communication Time(s): 0.004419
Idle Time(s): 0.039883
Computation Time(s): 2.24235

Client

❯ ./TryMultisetIBLTSync client
Initial Server size: 333
Initial Client size: 399
Expected Resultant size: 483

IBLTSync listening on port 8001...
IBLT Client stats: 
IBLTsync succeeded.
Resultant client size: 442
Stats for IBLTSync
   * expected number of elements = 1020
   * size of values =  8

Bytes Transmitted: 66830
Bytes Received: 2264
Communication Time(s): 0.197452
Idle Time(s): 2.20743
Computation Time(s): 3.6e-05

CPISync listening on port 8003...
CPISync Client stats: 
CPIsync succeeded.
Resultant client size: 483
Stats for Basic CPI Sync
   * base field size = 16777729
   * mbar = 510
   * b = 24
   * pErr = 2^-8
   * Evaluation Points = 1

Bytes Transmitted: 4574
Bytes Received: 2393
Communication Time(s): 0.002712
Idle Time(s): 2.2418
Computation Time(s): 0.002998

P.S.:

I also tried running the IBLTSetOfSetsSync with this dataset, but it usually failed with Invalid variable reference , presumably because I was not setting parameters properly.
The dataset (client and server data) was captured from data generated in the failing tests syncTest.

arorashu commented 4 years ago

So the IBLT implementation in our code use XOR based sums. I think XOR based sums would not work work multisets, i.e. here in IBLT, same key value pairs. This is also noted in the original IBLT paper in Section 3.

So, to support same keys (i.e. multisets), I am retooling IBLT implementation to work on sums. The main challenge currently is handling the overflows.

novakboskov commented 4 years ago

@arorashu What are the concrete overflow issues there, could you give an example? Note that they use XOR more as an optimization for the case when there are no multiple copies of the same key-value pair. Their initial idea is to do residual arithmetic.

Also, look at IBLTSync.cpp:150, the IBLT that we build there uses key = value = datum.to_ZZ(). Thus, it can never happen that we insert different values for the same keys, it can only occur that we insert exactly the same key-value pair multiple times. That is, only "Extensions to Duplicates" (page 14 of the paper) applies in our case. The overflow that can get us into trouble is the word-value overflow on hashkeySum. However, that would happen only when we have a large amount of key-value duplicates. For now, we should calculate the probability for that (seemingly unlikely event) to happen and live with it.

novakboskov commented 4 years ago

This was resolved in #66.

trachten / cpisync