braddmiller commented 4 years ago

Create a methodology and related tickets to provide mempool access to connected clients. The goal is for clients to be able to see incoming transactions that they had no prior knowledge of.

Criteria:

[ ] Bandwidth aware
[ ] Preserves the privacy of the clients requesting information to the same degree as the existing lightwalletd/client contract.

Time Box: 16 hours

gmale commented 4 years ago

We should consider this from the perspective of both transparent and shielded transactions. It might be the case that we can already see transparent mempool information.

LarryRuane commented 4 years ago

See also #136. Lightwalletd currently does not provide any mempool related information. Here's a list of all zcashd mempool-related rpcs; it would be relatively easy for lightwalletd to implement gRPCs for any of this information:

getaddressmempool -- from Insightexplorer (which we have available), given a taddr or list of taddrs, returns information about any transactions in the mempool that reference those addresses.
getrawmempool -- takes no arguments; returns a list of txids in the mempool; there is a verbose version (same txid list, plus a little information on each tx).
gettxout -- given a txid and an output index, returns detailed information about that output.
getmempoolinfo -- not useful to us, a summary of the mempool (size, bytes summaries).

Here's an example of retrieving the entire mempool (mainnet):

$ src/zcash-cli getrawmempool 
[
  "212425aba4c3686b43cc643ca1181d4ddc0aa32abe6bf96a1803748d682a9006",
  "cf2a6fd3a12045c8a1ce5d3e78041a61bc68e3f5c8e50a4a9eef7396e8028456",
  "08f42df9408cce12706c30dd77b310fe2787a700a3b2034d694240715bafe077",
  "5b0fc43fa93ebc3af58e0dc98750f85f76134bfa1325289620506778168341b3"
]
$

We should make that a "streaming" gRPC, since the list could be quite large. @defuse does raise an important DoS risk in https://github.com/zcash/lightwalletd/issues/136. To address that, maybe the wallet should request a bloom filter of the mempool? That's limited and small size, and the wallet would be able to very efficiently determine that a given txid that it cares about is in the mempool (with very high probability). It has perfect privacy properties since the lightwalletd is returning (a representation of) the entire mempool.

LarryRuane commented 4 years ago

We've had conversations in slack, and, although @pacu, @gmale, and @defuse haven't agreed to this yet, for now at least, I'm proposing the following mempool interface to lightwalletd as related to shielded transactions. We will need to discuss further what should be done for transparent transactions (see Kevin's comment above), but it's likely the following proposal can be extended to account for those as well.

The simplest possible way for the lightwalletd to provide mempool information to the wallets is to add a gRPC that simply returns the mempool (as a list of compact transactions). But if the wallets are calling this gRPC every few seconds, and if there are hundreds or even thousands of transactions in the mempool, this is very bad from a bandwidth (and CPU, battery) point of view. The first criteria in Brad's original comment above (bandwidth aware) suggests an incremental design. This means there should be a way for the wallets to be incrementally updated with the new entries in the mempool, and not have to re-fetch transactions redundantly. (We're not concerned about the lightwalletd being incrementally updated from zcashd, because they're on the same system, at least currently, and communication between them uses localhost which is very efficient. Plus, they don't have mobile memory, battery, and CPU constraints.)

The proposal is as follows:

A new in-memory only (non-persistent) container data structure, MempoolCache consisting of compact transactions
- When needed (on-demand, see below) but no more often than every 2 seconds, lightwalletd re-populates this cache (from scratch) using the getrawmempool and getrawtransaction zcashd RPCs.
- When lightwalletd detects a reorg, it clears MempoolCache (forcing it to be re-populated the next time it's needed).
A new gRPC, GetMempoolTx, that takes as its streaming argument a list of transaction IDs. This list tells lightwalletd which transactions the caller (wallet) already has in its mempool cache (from earlier calls to GetMempoolTx) and so are not needed in the reply. This gRPC's handler in lightwalletd will:
- populate or refresh MempoolCache so that it's no more than 2 seconds out of date
- respond with a streaming list of compact transactions consisting of all entries in MempoolCache whose txids are not in the argument list (thereby allowing the wallets to be incrementally updated)
- This response list will also include "empty" compact transactions, each with a hash (txid) from the argument list but which are not in MempoolCache.

That last point needs elaboration. A compact transaction has this format:

message CompactTx {
    uint64 index = 1;   // the index within the full block
    bytes hash = 2;     // the ID (hash) of this transaction, same as in block explorers
    uint32 fee = 3;
    repeated CompactSpend spends = 4;   // inputs
    repeated CompactOutput outputs = 5; // outputs
}

The argument list to GetMempoolTx may contain txids that are no longer in MempoolCache (either because the tx was mined into a block, or it has expired, or for some other reason it was dropped from the mempool). The wallet thinks this tx is still in the mempool (it must have received it via an earlier call to GetMempoolTx and is now saying not to include it in the reply, because that would be redundant). It's helpful to tell the wallet that this tx is no longer in the mempool. The GetMempoolTx reply does this by including a CompactTx with all fields set to zero except the hash (txid). A transaction with no spends and no outputs (both zero-length) is invalid, and so can be recognized by the wallet as being this special case. The wallet should respond by removing the transaction with the specified txid from its local mempool cache (however it implements it).

An example sequence might be:

Transactions A, B, C enter the mempool
GetMempoolTx([]) --> [A, B, C]
Transactions D, E enter the mempool
GetMempoolTx([A, B, C]) --> [D, E]
Transaction F enters the mempool
Transaction B leave the mempool
GetMempoolTx([A, B, C, D, E]) --> [B, F] (B is an "empty" compact transaction)
Transaction G enters the mempool
GetMempoolTx([A, C, D, E, F]) --> [G]

You might wonder why GetMempoolTx can't just return a separate list of txids that are no longer in the mempool. The reason is that gRPC doesn't allow a streaming reply and another reply (streaming or non-streaming). And it really does make sense for the reply to be streaming, because there could be hundreds or even thousands of compact transactions in the response.

An alternative design could be to have a second gRPC, GetMempoolDeleted that takes the same argument list (txids in the wallet's mempool) and just return the txids in its argument list that are not in the mempool, but that would require the wallet to make a separate gRPC call. But this could be done instead of what I'm proposing.

The special-case "empty" compact transactions are fairly small, just 32 bytes for the txid (which is needed in any case), plus 8 bytes for the index, 4 bytes for the fee, and two zero-length arrays (which would have some small overhead). So it's pretty efficient.

LarryRuane commented 4 years ago

A pretty simple bandwidth-efficiency enhancement to what's described in the previous comment is to have txids in the argument list be truncted txids, just the first 4 bytes or 6 bytes or so, instead of sending all 32 bytes. The only downside is the possibility of a collision -- suppose two transactions, A and B, start with the same 4 (or 6) bytes, and the wallet has received A but not B. It calls GetMempoolTx with the truncated txid; lightwalletd thinks it has sent both, so it sends neither.

One approach to this would be to just ignore this problem, since all it means is there's a 1 in 4 billion chance that the wallet won't be informed of a mempool transaction (it will discover the tx when it's mined into a block). Or we could actually solve the problem by having lightwalletd detect the collision and send both transactions -- it should always be okay for GetMempoolTx to respond with unnecessary transactions.

LarryRuane commented 4 years ago

UPDATE: I no longer think this is a good idea, please ignore it. (I'll leave it here at least for now.)

I just thought of another way to address this problem, but, again, I'm doubtful it's even necessary to address at all. But the client (wallet) could, in its GetMempoolTx request, include an integer from 0 to 28, which lwd interprets byte offset into the txid, and returns the 4 bytes beginning at that offset (instead of always returning the first 4 bytes). The wallet could choose this number randomly. That way, if two txids collide for this request, chances are there would be no collision on the next request a few seconds later (the two colliding txids wouldn't collide on a different set of 4 bytes).

LarryRuane commented 4 years ago

A proposed set of lightwalletd tickets, each line is a ticket:

[ ] Darksidewalletd changes
- [ ] Add darkside gRPC for populating fake mempool
- [ ] Add stub zcashd handler for getrawmempool
- [ ] Add stub zcashd handler for getrawtransaction
[ ] Add mempool data structure
[ ] Add GetMempoolTx() gRPC

holmesworcester commented 4 years ago

@LarryRuane — Did you consider an approach where the zecwallet-light-cli or, better, some persistent process in the SDK itself would connect directly to the network as other nodes do, and pick up all new transactions as they are broadcast to the network, as miners do? What would be the downsides of this approach?

We were considering this approach because:

Latency should be lower. Low latency matters especially for our use case because we're sending and receiving messages via memo, but also is also good UX for the more normal wallet use case.
It seems more scalable and won't break or slow down if the lightwalletd is overwhelmed.

gmale commented 4 years ago

This sounds like a good idea. I'm not familiar enough with how nodes work to understand whether this is feasible on mobile. I also curious whether this could be done easily with "Zebra" nodes. (update: checked with Zebra devs and the answer is not yet)

Depending on how the networking is done, this could be a challenge on mobile because you shouldn't keep long-lived connections.

defuse commented 4 years ago

Connecting directly to nodes would add a new attack surface to the wallet so we'd need weigh the risk vs reward carefully. If there's a remotely-exploitable bug in the lightwalletd<->wallet protocol, it's not too bad, because you'd still need to compromise lightwalletd to exploit it. Bugs in the protocol for talking to nodes wouldn't have that same defense-in-depth.

holmesworcester commented 4 years ago

Connecting directly to nodes would add a new attack surface to the wallet so we'd need weigh the risk vs reward carefully. If there's a remotely-exploitable bug in the lightwalletd<->wallet protocol, it's not too bad, because you'd still need to compromise lightwalletd to exploit it. Bugs in the protocol for talking to nodes wouldn't have that same defense-in-depth.

Yes. On balance I agree that the slightly lower latency isn't worth it. Scalability might be, if lightwalletd's get overwhelmed easily by lots of clients connecting to them for the latest transactions, but until that's a problem it's probably better to not add a new attack surface.

holmesworcester commented 4 years ago

Is there agreement on the approach here? Is there a corresponding ticket for adding this functionality to the light-cli?

gmale commented 4 years ago

Summary

We discussed this, in depth, today. To summarize that discussion:

There are several key phases to implement:

the lightwalletd services
the client interactions with (1)
the privacy and security improvements that can be made in (1) and (2).

Work on (1) will begin immediately, (2) needs more research and (3) won't begin anytime soon but needs to be kept in mind while building (1) and (2).

Details

1. Lightwalletd Services

This component is the most straight-forward and can be implemented almost exactly as described in this github issue. This is unblocked, ready for work, and will be picked up by @LarryRuane next week.

2. Light Clients

Consuming the service on light clients has broader implications for the threat model and privacy. To paraphrase @defuse :

Client actions like sending a transaction or receiving a memo use an identifiable amount of bandwidth, so it's possible to reconstruct an approximate transaction graph merely by intercepting LWD's encrypted internet traffic. In order to fix that, the wallet's traffic needs to be the same, regardless of what the user is doing. Polling the mempool provides some convenient cover traffic in which to hide sends/receives.

Even with Tor, the amount of data transmitted is observable. Consequently, the current lightclient + lightwalletd model exposes weaknesses, including the following that are documented as counter-intuitive:

An adversary can
- tell that and when the user received a fully-shielded transaction
- tell that and when the user sends a fully-shielded transaction
- learn who the user is sending/receiving funds to/from in fully-shielded transactions
- tell how many transactions the user has sent or received over time
- determine whether or not the user’s wallet owns a particular address
- tell who the user is
- tell where the user is
- tell that the user spends or receives money according to a certain pattern
  (e.g. recurring payments) using fully-shielded transactions
- tell when the user spends shielded funds sent to them by the adversary

Ideally, a light client implementation should avoid these weaknesses, as much as possible. At a minimum, it should not further weaken the privacy properties of the library. Work in this area is considered blocked until further research and discussion is completed.

Additionally, on the Android/iOS side, we discussed effectively "quarantining" mempool data so that it cannot "contaminate" valid on-chain data and be used as an attack vector for tricking users into thinking that things have happened on chain. @gmale will begin working on a proof of concept in the mobile SDKs for decrypting transactions into a separate database from on-chain information.

3. Privacy Improvements

The mempool work could be leveraged to help address some of the primary weaknesses in the current privacy properties of light clients. For example, if the client and server always communicated through regularly timed, constant-sized messages, then most weaknesses related to bandwidth measurement could be eliminated.

One potential way to do this would be to wrap the lightwalletd service response in a fixed-size container, similar to how packets or frames function in networking. This container would hold a small header, describing the contents (i.e. length, checksum, etc.) and a payload of bytes that are padded, as needed, in order to achieve a fixed size for the outer container:

.==========================.             
|   Fixed-Size Container   |
.==========================.
|  +---------+----------+  |
|  | Length  | Checksum |  |
|  +--------------------+  |
|  |      Payload       |  |
|  +--------------------+  |
|  |      Padding       |  |
|  +---------+----------+  |
+--------------------------+

Both the request and response could utilize this general approach, sending and receiving data at regular intervals, and this would be fairly trivial to implement via gRPC: the container would be defined in the proto file and used as the both the input and output for the mempool service call.

Since this represents a fundamental change in design for the light client libraries, this work is also blocked. More research would be needed on the implementation details to strike a balance between performance and privacy with minimal impacts to existing code.

gmale commented 4 years ago

Pinging @holmesworcester and @adityapk00 and @nighthawk24 for any input here, since things are still in the research phase.

holmesworcester commented 4 years ago

Wow this is great work! Overall I really support how you're taking on data leaks at the network layer in thinking through the next steps in light wallet design. That's really good stuff. My questions would be:

Will you release the functionality in lightwalletd as soon as it's complete? Or will you wait until the client libraries are done?
Are there ways you're sure or suspect that fetching zero confirmation transactions will worsen the privacy properties from the existing version? Or Is it that you're trying to address both issues at the same time?
When does basic default Tor support come to the light wallet? How does that fit in here?
Will your work on all this become part of zecwallet-light-cli? Or will it just be available to mobile apps? (Generally, if we want to be using the official ECC approved light wallet client libraries in an Electron app that currently uses zecwallet-light-cli, what will be the best way for us to do that?)
Are there any other data leaks that are bigger or easier to fix and unrelated? Like the memo fetching leak to the light wallet server, for example?

It seems like the goal here is the right one and the question here is about how to sequence these things. I'd worry about them all getting lumped together and progress on any one being held up by progress on the others. Batching is bad!

The rule of thumb that I follow is that all work should be broken down into the smallest possible solutions that deliver value from a user's perspective. I think it would be helpful to list all of these here: the chunk of value from the perspective of the user, whether it's performance or a certain kind of data not leaking, the minimum solution that achieves it, and the difficulty of achieving it.

Then I think we should talk through personas or threat models for specific users. Or, if we don't feel like we're clear enough on that, talk through the hierarchy of potential attackers. For example, there are many more attackers who can subvert a lightwalletd run by a tiny team, or request connection logs from an ISP, than there are attackers who can link and deanonymize users by monitoring Tor network traffic, no?

On Fri, Aug 21, 2020 at 3:57 PM Kevin Gorham notifications@github.com wrote:

Pinging @holmesworcester https://github.com/holmesworcester and @adityapk00 https://github.com/adityapk00 and @nighthawk24 https://github.com/nighthawk24 for any input here, since things are still in the research phase.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/zcash/lightwalletd/issues/169#issuecomment-678466154, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABUFLWPGU5XWAVKMFI4UDDSB3GTJANCNFSM4KGGJ2DA .

leto commented 4 years ago

@gmale FYI I implemented the suggestion 3 "Fixed sized containers" for the websocket Wormhole protocol of our fork of ZecWallet, which has almost all of the same privacy concerns and threat models as the lite wallet:

https://github.com/ZcashFoundation/zecwallet/issues/212#issuecomment-573438742 https://twitter.com/dukeleto/status/1231583675017519113

This issue has been ignored for about 1.5 years by Zcash Foundation and Company, which greatly reduces the privacy of any users who use the Zecwallet Wormhole service. It's a one line fix, there is absolutely no reason except malice or stupidity, to not fix this bug.

To ignore this fundamental issue for the Lite wallet, and kick it down the street for another few years, is a great disservice to the perceived privacy of ZEC mainnet users. Yes it's more than a one line fix here, but it's the only way to correctly solve bandwidth metadata leaks related to mempool support.

pacu commented 4 years ago

@holmesworcester

Will you release the functionality in lightwalletd as soon as it's complete? Or will you wait until the client libraries are done?

It's usually developed 'out in the open' but that's more of a question for @LarryRuane

Are there ways you're sure or suspect that fetching zero confirmation transactions will worsen the privacy properties from the existing version? Or Is it that you're trying to address both issues at the same time?

if I'm not wrong, as SDK clients are already asking LWD for specific tx IDs to fetch memo's and existing outgoing transactions. So I don't believe that this would make privacy 'worse' comparatively to what out threat model contemplates. the wallet team debated about the inconvenience of mixing 0 confirmation data with confirmed data on a payment app like ours and thought that it would be inadequate so we would take action on our SDK implementation to avoid that. I guess that @defuse can bring on specific security and privacy issues if any.

When does basic default Tor support come to the light wallet? How does that fit in here?

I checked current issues and we don't have that on the roadmap, but @braddmiller as PO could give you a better answer.

Will your work on all this become part of zecwallet-light-cli? Or will it just be available to mobile apps?

this is probably an @adityapk00 question.

(Generally, if we want to be using the official ECC approved light wallet client libraries in an Electron app that currently uses zecwallet-light-cli, what will be the best way for us to do that?)

maybe we (the wallet team) could pair up with you guys and talk about that!

defuse commented 4 years ago

@holmesworcester, these are great points:

Are there any other data leaks that are bigger or easier to fix and unrelated? Like the memo fetching leak to the light wallet server, for example?

Then I think we should talk through personas or threat models for specific users. Or, if we don't feel like we're clear enough on that, talk through the hierarchy of potential attackers. For example, there are many more attackers who can subvert a lightwalletd run by a tiny team, or request connection logs from an ISP, than there are attackers who can link and deanonymize users by monitoring Tor network traffic, no?

What are everyone's thoughts on this? Do we have any perspective from potential/actual users on these questions? To make the contrast clear, here are the two most pressing problems with the current protocol in my opinion:

Problem 1: An attacker who compromises lightwalletd can recover the entire, exact graph of who's-paying-who, by watching which wallets send which transactions and which other wallets express interest in those transactions' memos.
Problem 2: An attacker who's only observing lightwalletd's (encrypted) traffic can learn some information about the transaction graph, but it's not exact. They only learn the timing of when wallets send and receive, which they'll have to match up with what they're seeing on the blockchain to try to make inferences about particular transactions.

In Problem 1, the attacker is stronger (they've had to break in to the lightwalletd server), and the result is a bad breach of privacy for all users. On the other hand, the attacker in problem 2 is weaker (they just have to intercept traffic), and the outcome of the attack is not as well understood (it could be just as bad, or slightly less bad, or maybe not that much of a problem, depending on the number of simultaneous users and user behavior).

Putting yourself in the shoes of a prospective user, how do you all feel about the relative priority of these problems? Would either (or both) be a showstopper to using the wallet in certain situations?

defuse commented 4 years ago

@leto Thanks for thinking about those issues and raising them with patches! It's important to acknowledge and highlight privacy/security shortcomings so they can be prioritized relative to each other and so that users are informed about the software they're using. If there's ever a case where there's a weakness in an ECC mobile wallet that isn't covered in the threat model please let me know! (The items in bold are what I personally feel are the highest priorities to fix).

leto commented 4 years ago

@defuse The Hush team has gone deep into implementing mempool support at the lightwalletd layer and we are interested to see what ECC comes up with. If you want a comment about weaknesses of your threat model, it does not mention the mempool a single time. If mempool support will be added to lightwalletd, it's time for the threat model (which is quite well-written, I may add) to expand to cover a mempool Threat Model.

It would be great to cover the recent security/hardening changes to the Zcash mempool, such as randomly ejecting things from the mempool when it gets full, in this threat model. As far as I know, only some code comments and tests document these security features.

Your threat model would be more useful if it broke things out into lite vs full node mempool attack scenarios. Lite wallets are vastly inferior and potentially vulnerable to entire classes of attacks that full nodes never have to worry about, such as Covert Channels in zk-snarks, good old chain forks and the inherent privacy issues of trusting a 3rd party server to help make shielded tx's.

Fixed size packets are essential to not leak metadata such as if a tx is using zaddrs, how many zaddr recipients are being sent to, which RPC is being used, and many other little metadata tidbits that erode ztx privacy. I highly encourage ECC to enforce all Wormhole and lightwalletd clients do this. If reference servers all require it, then all 3rd party wallet software will follow along.

To summarize, the current state of affairs is that way more metadata is leaked than Problem 2 already in the normal way Zcash Wormhole works, but not quite as bad as Problem 1. I have not done review of the latest JS Zcash wallets because we do not use that code.

defuse commented 4 years ago

One note on the fixed-size container proposal: an attacker monitoring the network traffic will also learn the timing of when traffic gets sent. That can potentially leak information, even if all the requests and responses are the same size. For example, the wallet sends mempool queries at a fixed interval like every 10 seconds, the attacker can infer that any requests happening outside of that predictable polling are something other than mempool access, like sending a transaction, or something else.

leto commented 4 years ago

@defuse agreed, a simple fixed polling interval is not great. This can be fixed by adding random noise to the polling interval and using exponential backoffs, which will be very hard to differentiate from everything else

LarryRuane commented 4 years ago

Seems like the fixed-size container idea could be implemented as a separate, invisible layer on top of gRPC (or within gRPC itself, but that team likely won't want to dedicate resources to implementing it). Invisible meaning, no change to the gRPC interface. Much like standard network packet switching, it could break large requests or replies up or merge small ones together into fixed-sized requests and fixed-size responses.

Of course, the latencies would be greater in both directions than today (nothing worth having is free). Requests and replies could follow the Poisson distribution (whether there's any data to send or not -- fill to the fixed size), which I think @leto meant by exponential backoff. Initial sync (block download) would be much slower; that may need to be excepted from this mechanism somehow.

If this is its own separate layer, nothing in the server or client-side would need to change.

LarryRuane commented 4 years ago

Here's an interesting performance-efficiency idea from Bitcoin Core: https://bitcoincore.org/en/2016/06/07/compact-blocks-faq/

If the wallets have a copy of the mempool, then when a new block propagates, they likely already have all or most of the transactions in that block. So compact blocks the lightwalletd sends to the wallets could contain txids instead of compact transactions (and they even have some way of compressing the txids so the compact blocks are even smaller); the wallet could ask for any transactions it doesn't have.

We'd need a new gRPC for this, because currently there's no way to fetch compact transactions (GetTransaction returns full, zcashd blockchain, transactions), but that's very simple.

leto commented 4 years ago

@LarryRuane to clarify, this is what I mean by "exponential backoff": https://en.wikipedia.org/wiki/Exponential_backoff

For example, instead of polling at a fixed interval, you wait an interval equal to the next Fibonacci number (or some other exponential sequence) until you have waited long enough. With added "noise", there is no way to isolate the mempool requests from any other requests. It's a solved problem, decades ago.

Seems like the fixed-size container idea could be implemented as a separate, invisible layer on top of gRPC is a very complex suggestion and not at all what I am suggesting. Lite wallets cannot request individual txid's, that leaks metadata. The only thing they should ever do is ask lightwalletd for all mempool data, possibly paginated if it's large. Doing anything else tells the server which txid's the client is interested in.

There are some issues if the mempool is under attack and being spammed, with bandwidth usage of lite clients. That is still open to research, in terms of the best ways for lite clients to handle that, without magnifying the attack like a reflected DDoS.

lindanlee commented 4 years ago

Future work on this:

leto commented 4 years ago

@lindanlee those 2 issues do not replace and ignore 90% or more of the privacy bugs described in this issue. Is it the stance of ECC to ignore those metadata leakage bugs?

holmesworcester commented 4 years ago

Putting yourself in the shoes of a prospective user, how do you all feel about the relative priority of these problems? Would either (or both) be a showstopper to using the wallet in certain situations?

@defuse it seems like both problems # 1 and # 2 above result from the fact that the lightwallet client only fetches memos for transactions that it is interested in. My memory of the justification for fetching only memos the user is interested in is that it results in a 70% decrease in bandwidth consumption. Is that correct? If so, that seems like a very small benefit relative to the privacy lost. It seems like we should start there and change the default behavior.

holmesworcester commented 4 years ago

@pacu thanks! I'll ask bradmiller about Tor support.

gmale commented 4 years ago

My memory of the justification for fetching only memos the user is interested in is that it results in a 70% decrease in bandwidth consumption. Is that correct?

It depends on the wallet's use but I don't think 70% is the right number. It would be a function of #_of_your_memos vs. #_of_all_memos which grows as the total universe of Zcash transactions increases. In other words, we are saving 580 bytes for every output that does not belong to our wallet--the more that aren't yours, the more you save. In concrete terms, not a single message that has been sent on Zbay was for my Android wallet. So I'm saving 0.5K per output for all of those transactions, and most transactions have at least 2 outputs. Aditya mentioned testing one account that had 10K transactions. I don't need to download a single one of those memos on my Android wallet.

Lastly, bandwidth isn't the only thing being consumed. Device storage and battery use are also constrained resources to consider and both are drained when processing additional memos. Storage can be mitigated by discarding unnecessary memos but additional battery use is hard to avoid yet, arguably, one of the things the user cares about most.

To be fair, I'm not saying we can't download all memos. I'm just enumerating some of the tradeoffs that factored into the current model.

holmesworcester commented 4 years ago

@gmale

It depends on the wallet's use but I don't think 70% is the right number. It would be a function of #_of_your_memos vs. #_of_all_memos which grows as the total universe of Zcash transactions increases. In other words, we are saving 580 bytes for every output that does not belong to our wallet--the more that aren't yours, the more you save.

In the case where I have never been sent a memo, what's the bandwidth increase? Like, what's the difference between the size of the two feeds of everything, the memoless one and the memoful one? (This shouldn't depend on the user's case.)

I might be missing something but I think the real-world numbers will approach this number as a limit as the ratio of "transactions I care about" to "all transactions" approaches zero as that denominator increases.

Should we assume that the impact on battery life is proportional to the amount of data received over the network? That seems like a reasonable guess to me but I have no idea if it's right.

holmesworcester commented 4 years ago

Also, here's a source for the 70% number. Is this out of date?

The memo field is ~70% of a Zcash block. https://electriccoin.co/blog/zcash-reference-wallet-light-client-protocol/

gmale commented 4 years ago

In the case where I have never been sent a memo, what's the bandwidth increase? Like, what's the difference between the size of the two feeds of everything, the memoless one and the memoful one? (This shouldn't depend on the user's case.)

If we ignore the 70% for a moment and stick with the 580 bytes per output, instead, then that results in the following:

    .====================.        .====================.
    |   Memoless Block   |        |    Memoful Block   |
    .====================.        .====================.
    |        10 T        |        |  10 * (T + 1160B)  |
    +--------------------+        +--------------------+

In other words, if T is the size of a "memoless" transaction in bytes, then a single memoful compactblock with 10 transactions, averaging 2 outputs each, would be 11.6KB larger than a memoless compactblock. That's a single block. Currently, the chain tip is about 540,000 blocks away from sapling activation. Meaning, a complete "memoful" chain of compact blocks of this size would be 6.264GB larger than the "memoless" one.

And that size difference would only increase over time.

I think the real-world numbers will approach this number as a limit as the ratio of "transactions I care about" to "all transactions" approaches zero as that denominator increases.

I agree with what you're saying here--basically that the number above is a decent upper limit for how much data we're considering. No matter how you slice it, it is a substantial amount of data for mobile devices and it is non-trivial for most other use cases, as well. For every 100,000 blocks we process, a single extra byte on a block creates nearly a megabyte of extra download size. Therefore, a 580-byte savings PER OUTPUT is very significant.

Should we assume that the impact on battery life is proportional to the amount of data received over the network?

Yes. It takes a substantial amount of battery to operate a wireless radio. Battery drain is also a function of the amount of processing done (parsing, decryption, storage, etc).

holmesworcester commented 4 years ago

Okay but the relative increase is the important thing, right? If the reduction is 70% and that's 6GB vs 2GB, I would argue that this difference is not worth a huge reduction in privacy when the product is a privacy tool.

Whatever device I'm on, if I'm looking for privacy I'll download the extra data in exchange for better privacy. Tor must be at least 3x slower than not using Tor, or at least it was for years, but people still used it for its privacy properties.

On Tue, Sep 1, 2020 at 6:09 PM Kevin Gorham notifications@github.com wrote:

In the case where I have never been sent a memo, what's the bandwidth increase? Like, what's the difference between the size of the two feeds of everything, the memoless one and the memoful one? (This shouldn't depend on the user's case.)

If we ignore the 70% for a moment and stick with the 580 bytes per output, instead, then that results in the following:
.====================.        .====================.

|   Memoless Block   |        |    Memoful Block   |

.====================.        .====================.

|        10 T        |        |  10 * (T + 1160B)  |

+--------------------+        +--------------------+
In other words, if T is the size of a "memoless" transaction in bytes, then a single memoful compactblock with 10 transactions, averaging 2 outputs each, would be 11.6KB larger than a memoless compactblock. That's a single block. Currently, the chain tip is about 540,000 blocks away from sapling activation. Meaning, a complete "memoful" chain of compact blocks of this size would be 6.264GB larger than the "memoless" one.

And that size difference would only increase over time.

I think the real-world numbers will approach this number as a limit as the ratio of "transactions I care about" to "all transactions" approaches zero as that denominator increases.

I agree with what you're saying here--basically that the number above is a decent upper limit for how much data we're considering. No matter how you slice it, it is a substantial amount of data for mobile devices and it is non-trivial for most other use cases, as well. For every 100,000 blocks we process, a single extra byte on a block creates nearly a megabyte of extra download size. Therefore, a 580-byte savings PER OUTPUT is very significant.

Should we assume that the impact on battery life is proportional to the amount of data received over the network?

Yes. It takes a substantial amount of battery to operate a wireless radio https://developer.android.com/training/efficient-downloads/efficient-network-access. Battery drain is also a function of the amount of processing done (parsing, decryption, storage, etc).

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/zcash/lightwalletd/issues/169#issuecomment-685159867, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABUFLQD427JFNSZKX2VBOTSDVWJFANCNFSM4KGGJ2DA .

gmale commented 4 years ago

You might be preaching to the choir...

We're in violent agreement that privacy is paramount :smile: . Your original question was whether a 70% reduction in bandwidth is the justification for not downloading memos and I'm attempting to clarify that it is deeper and more nuanced than that. Especially on mobile.

it's not just a 70% difference, it's ~580-bytes~ 464 bytes per shielded output per transaction (which in many cases is more than 70%) For example: while most txs have 1 or 2 outputs, I just checked 50,000 transactions and 13,000 of them had over 100 outputs. 4658 of them had over 800 outputs!
it's not just bandwidth but also device storage, processing time, battery use and server cost
the average entire "memoless" compact block is smaller than a single memo counting only blocks with shielded outputs, in a sample of 50,000 blocks the average compactblock serialized size was 402 bytes

Unfortunately, the differences are substantial. Fortunately, though, nothing in the lightwalletd protocol prohibits Zbay from downloading all memos for all blocks.

holmesworcester commented 4 years ago

Yeah, I’ve heard from Aditya that the difference is more like 95%.

So in the case where transactions have 1 output the difference is 70%, but multiple outputs makes it more than that?

On Sep 3, 2020, at 8:41 PM, Kevin Gorham notifications@github.com wrote:

You might be preaching to the choir...

We're in violent agreement that privacy is paramount 😄 . Your original question was whether a 70% reduction in bandwidth is the justification for not downloading memos and I'm attempting to clarify that it is deeper and more nuanced than that. Especially on mobile.

it's not just a 70% difference, it's 580-bytes per shielded output per transaction (which in many cases is more than 70%) For example: while most txs have 1 or 2 outputs, I just checked 50,000 transactions and 13,000 of them had over 100 outputs. 4658 of them had over 800 outputs! it's not just bandwidth but also device storage, processing time, battery use and server cost the average entire "memoless" compact block is smaller than a single memo counting only blocks with shielded outputs, in a sample of 50,000 blocks the average compactblock serialized size was 402 bytes Unfortunately, the differences are substantial. Fortunately, though, nothing in the lightwalletd protocol prohibits Zbay from downloading all memos for all blocks.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/zcash/lightwalletd/issues/169#issuecomment-686834576, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABUFLTEOYALYJEPQWWRQJLSEAZUXANCNFSM4KGGJ2DA.

gmale commented 4 years ago

Effectively, yes. My numbers are a bit off [corrected to 464, above] but the concept is the same: the savings is per output. Fundamentally, no matter how you slice it, 70% is probably not the right number and it glosses over the other costs associated with the additional bandwidth.

On a related note, I recently came across this explanation that is helpful for calculating the raw numbers which are closer to an 80% difference per output :

For light clients however, there is an additional bandwidth cost: every ciphertext on the block chain must be received from the server (or network node) the light client is connected to. This results in a total of 580 bytes per output that must be streamed to the client. However, we don't need all of that just to detect payments. The first 52 bytes of the ciphertext contain the contents and opening of the note commitment, which is all of the data needed to spend the note and to verify that the note is spendable. If we ignore the memo and the authentication tag, we're left with a 32-byte ephemeral key, the 32-byte note commitment, and only the first 52 bytes of the ciphertext for each output needed to decrypt, verify, and spend a note. This totals to 116 bytes per output, for an 80% reduction in bandwidth use.

holmesworcester commented 4 years ago

I think I have a better way to do this.

The problem with achieving the ideal UX of "user A sends to B, user B gets notified" is that creating a transaction takes so long. It takes 15 seconds using zecwallet-light-cli on a modern fast computer, and sometimes longer. And almost all this time is spent creating the proof.

So what if we first sent a summary of the transaction (including amount and encrypted memo) to the network, encrypted to the recipient, but without the zk-proof?

This way, the summary transaction will hit the mempool very fast, and the recipient can be notified.

We'd need to make sure there was no way for the user to turn off their device or close the app before the full transaction was sent. But as long as that's covered I think it achieves the same thing, much faster.

It'd be almost instantaneous, which is great UX, and there's no way you can achieve this UX if you're waiting ~15 seconds just for the transaction to be created on the sender's device.

@gmale thoughts?

gmale commented 4 years ago

My first impression is I really like the creativity of this idea. From a technical perspective, I'd be concerned about the risk of things like DoS attacks, or "Big Spender" issues, where an adversary is essentially flooding a user with transactions that their machine is forced to process that could also contain spoofed information, attempting to trick the user into thinking something happened when it didn't.

I pinged a few people for opinions and security mentioned that without a proof, funds can be created out of thin air until the final moment where a proof is required. The Core team pointed out that these transactions would also get rejected from the mempool (Sapling proofs are checked at the end of ContextualCheckTransaction).

Intuitively, the approach makes me nervous but it also gets my creative thoughts flowing. There has to be a better solution for payment notification! It almost feels like this is something that can or should be done out-of-band. Unfortunately, adding another layer gets difficult in terms of potentially leaking private information. I recall a conversation where Tromer mentioned an interesting idea of pre-computing proofs--keeping a cache of tiny notes, ready to send to an intermediate address. In general, introducing a 3rd address that both parties "control" allows for creative payment solutions.

This might be a great topic to discuss in this week's Light Client working group newsletter.

holmesworcester commented 4 years ago

How long do precomputed things stay fresh for? I remember str4d saying they’d have to be re-done quite frequently.

Re: DoS and Big Spender, isn’t all of that already an issue? Like you can still send “fake money” to someone, and it’s much easier to DoS the blockchain itself than the mempool, no? https://saplingwoodchipper.github.io/

Precomputing is cool and maybe is a good solution for that speed problem if it’s practical. It does seem like it would be really CPU intensive though. Generating zcash transactions makes my fans spin a lot, on a pretty fast macbook pro.

On a phone it must be really expensive, no?

On Sep 28, 2020, at 4:03 PM, Kevin Gorham notifications@github.com wrote:

My first impression is I really like the creativity of this idea. From a technical perspective, I'd be concerned about the risk of things like DoS attacks, or "Big Spender" issues, where an adversary is essentially flooding a user with transactions that their machine is forced to process that could also contain spoofed information, attempting to trick the user into thinking something happened when it didn't.

I pinged a few people for opinions and security mentioned that without a proof, funds can be created out of thin air until the final moment where a proof is required. The Core team pointed out that these transactions would also get rejected from the mempool https://github.com/zcash/zcash/blob/514d86817990a1bf9578ee5fa2d01c34c6ca6035/src/main.cpp#L1450 (Sapling proofs are checked at the end of ContextualCheckTransaction https://github.com/zcash/zcash/blob/514d86817990a1bf9578ee5fa2d01c34c6ca6035/src/main.cpp#L774).

Intuitively, the approach makes me nervous but it also gets my creative thoughts flowing. There has to be a better solution for payment notification! It almost feels like this is something that can or should be done out-of-band. Unfortunately, adding another layer gets difficult in terms of potentially leaking private information. I recall a conversation where Tromer mentioned an interesting idea of pre-computing proofs--keeping a cache of tiny notes, ready to send to an intermediate address. In general, introducing a 3rd address that both parties "control" allows for creative payment solutions.

This might be a great topic to discuss in this week's Light Client working group newsletter.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/zcash/lightwalletd/issues/169#issuecomment-700251988, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABUFLRP5APYQ5NYJS5T663SIDT2VANCNFSM4KGGJ2DA.

zcash / lightwalletd

Research Spike: Mempool support in lightwalletd #169

Summary

Details

1. Lightwalletd Services

2. Light Clients

3. Privacy Improvements