nuttycom opened this issue 1 year ago
The main need for a non-linear-scan process like DAGSync is a map from Nullifier to "transaction locator". We can locate a transaction with either a `txid`, or a (block reference, tx index within block) pair. However, the latter is more efficient to send to users, because we can just use the `getblocks` gRPC stream with everything except block identifiers and nullifiers stripped out (so "tx index within block" is encoded implicitly in the nullifier ordering within each block). This costs (probably around) 40 bytes per block, plus 32 bytes per nullifier.
If instead we use `txid`s, then we add 32 bytes per transaction, which is a non-negligible increase in the amount of data to be downloaded.
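As a rough, purely illustrative comparison (these figures are hypothetical, not measured chain data): for a range of 1,000 blocks containing 4,000 shielded transactions that reveal 8,000 nullifiers, the stripped-down stream would cost about 40 × 1,000 + 32 × 8,000 = 296,000 bytes, while also sending `txid`s would add 32 × 4,000 = 128,000 bytes on top of that, roughly a 43% increase.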
I noted today that we could take an approach similar to the one used for commitments, and divide the nullifiers into ranges (most likely by "number of spent notes", to keep the amount of data per range roughly the same, but we could also split by "number of blocks"). Then `lightwalletd` could provide a Bloom filter for each range (or some other equivalent filter with no false negatives). Light clients would fetch all the filters, and then use them locally to decide which ranges of nullifiers need to be downloaded (and correspondingly, which ranges of nullifiers they are willing to reveal they may have spent notes in).
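To make the local filtering step concrete, here is a minimal, purely illustrative Rust sketch; the `BloomFilter` and `RangeFilter` types, their parameters, and the 32-byte nullifier representation are assumptions for the example, not a proposed format:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Toy Bloom filter over 32-byte nullifiers. A real deployment would use a
/// vetted construction with parameters tuned for the target false-positive rate.
struct BloomFilter {
    bits: Vec<bool>,
    num_hashes: u64,
}

impl BloomFilter {
    fn new(num_bits: usize, num_hashes: u64) -> Self {
        Self { bits: vec![false; num_bits], num_hashes }
    }

    fn index(&self, item: &[u8; 32], i: u64) -> usize {
        let mut h = DefaultHasher::new();
        i.hash(&mut h);
        item.hash(&mut h);
        (h.finish() as usize) % self.bits.len()
    }

    /// Used by the server side when building the filter for a range.
    fn insert(&mut self, item: &[u8; 32]) {
        for i in 0..self.num_hashes {
            let idx = self.index(item, i);
            self.bits[idx] = true;
        }
    }

    /// No false negatives: `false` means the nullifier is definitely not in this range.
    fn might_contain(&self, item: &[u8; 32]) -> bool {
        (0..self.num_hashes).all(|i| self.bits[self.index(item, i)])
    }
}

/// One filter per contiguous block range served by lightwalletd (hypothetical layout).
struct RangeFilter {
    start_height: u32,
    end_height: u32,
    filter: BloomFilter,
}

/// The client-side decision: download the nullifiers for exactly those ranges whose
/// filter reports a possible match for any of the wallet's own nullifiers.
fn ranges_to_fetch(ranges: &[RangeFilter], wallet_nullifiers: &[[u8; 32]]) -> Vec<(u32, u32)> {
    ranges
        .iter()
        .filter(|r| wallet_nullifiers.iter().any(|nf| r.filter.might_contain(nf)))
        .map(|r| (r.start_height, r.end_height))
        .collect()
}
```

The same selection logic works unchanged if the Bloom filter is later swapped for another no-false-negative construction such as a Frayed Ribbon filter.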
The downside is that the filters are unverified, so `lightwalletd` could serve disjoint nullifier filters to different light clients. But it already learns precise information about transactions being sent through it, so this is not too far from the existing threat / trust model.
When a light client is near the chain tip, they will still probably just download all nullifiers (and it is more efficient to do so while they are close enough to the tip that the worst-case nullifier reveal is smaller than the range covered by a single filter). But for historic transactions this would be a significant bandwidth improvement, assuming we can tune it sufficiently to preserve privacy.
Reposting my comment from an internal thread, in response to the question "is the bloom filter approach a replacement to the nullifier approach or are they complementary?":
The Bloom filter approach I suggested could be a replacement for downloading a `nullifier -> tx locator` map, but only for "continuously synced" wallets.
If a single wallet app is used for a mnemonic from its original birthday, it would be sufficient to just check the Bloom filters to confirm spentness, without needing the full nullifier map to confirm which transaction spent the nullifier.
But as soon as your wallet has any discontinuities (either from the mnemonic being imported into several wallet apps, or because you are recovering from the mnemonic and the first transaction you find does not have any outputs that exist in your currently-unspent note set), the Bloom filter is insufficient and you need the full `nullifier -> tx locator` map.
I was using "Bloom filter" above as a placeholder for "approximate set". There are more efficient constructions possible than Bloom filters, in particular constructions that reduce the size of the resulting filter. I remembered Mike Hamburg's RWC 2022 presentation in which they presented work on "Frayed Ribbon filters", which show a 30% reduction in size over Bloom filters for approximate set usage. The work was aimed at improving the performance of Certificate Revocation Lists, which have a similar problem to our nullifiers: membership testing of an append-only set.
There is a research-grade Rust implementation that provides an `ApproxSet` type. We wouldn't want to use that library as-is (I note it disclaims that its encoding format is not stable before version 0.2.0, and it is currently at version 0.1.0). But given that we just need a non-membership check here, we can start out using Bloom filters (since they are widely understood and deployed), and then later upgrade to Frayed Ribbon filters for the bandwidth saving if we want to (and want to help productionize the library).
Interestingly, the library also provides a `CompressedMap` type that lets you implement a `K -> V` map that doesn't include the keys, with the provisos that it works best for small values, and that looking up a non-existent key returns an arbitrary value from the map. We could potentially use this to directly encode the nullifier map:
- `lightwalletd` caches and provides a `CompressedMap<Nullifier, (BlockHeight, TxIndex)>`.
- Light clients look up their own nullifiers in the map, fetch the `CompactBlock`s covering the matching values, and then locally filter out the `CompactTx`s that don't have matching nullifiers.
The downside is that because the map is approximate, wallets need to fetch nullifiers alongside `enc_ciphertext` in order to do the final filtering. So it's not immediately clear whether we would save much over using `ApproxSet` and then downloading the non-approximate map.
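Purely to illustrate that flow (and why final filtering is needed), here is a sketch with the approximate map and the block fetch abstracted as closures; the type names and fields are hypothetical and deliberately do not mirror the real compact-block schema or the `compressed_map` crate's API:

```rust
use std::collections::HashSet;

/// Stand-ins for the compact-block types; field names are illustrative,
/// not the actual lightwalletd protobuf schema.
struct CompactTx {
    nullifiers: Vec<[u8; 32]>,
}
struct CompactBlock {
    height: u32,
    txs: Vec<CompactTx>,
}

/// Sketch of the client-side flow. Because the approximate map may return an
/// arbitrary value for a nullifier it has never seen, every hit must be
/// re-checked against the nullifiers actually present in the fetched transaction.
fn confirm_spends(
    wallet_nullifiers: &HashSet<[u8; 32]>,
    lookup: impl Fn(&[u8; 32]) -> (u32, usize), // nullifier -> (block height, tx index)
    fetch_block: impl Fn(u32) -> CompactBlock,  // fetch the compact block at a height
) -> Vec<([u8; 32], u32, usize)> {
    let mut spent = Vec::new();
    for nf in wallet_nullifiers {
        let (height, tx_index) = lookup(nf);
        let block = fetch_block(height);
        // Final filtering: keep the hit only if the nullifier really appears
        // in the transaction the map pointed at.
        if block
            .txs
            .get(tx_index)
            .map_or(false, |tx| tx.nullifiers.contains(nf))
        {
            spent.push((*nf, block.height, tx_index));
        }
    }
    spent
}
```

The fetch-and-verify step is exactly the cost noted above: the approximate map alone cannot confirm a spend, so the nullifiers in the candidate blocks still have to be downloaded.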
One of the requirements for DAG sync is that we be able to more or less immediately identify when one of our existing notes has been spent. In order to do this, we will need to be able to determine whether the nullifier for such a note appears in the history of revealed nullifiers; otherwise we might attempt a double-spend.
In the worst case (considering a wallet that has been active since Sapling activation) this could entail downloading ~4 GB of nullifier data to establish the nullifier-to-block mapping that is required to traverse the DAG. We should decide on a strategy for how to address this.