mimblewimble / grin

Minimal implementation of the Mimblewimble protocol.
https://grin.mw/
Apache License 2.0

Dandelion++ Thoughts on Timers/Scheduling #2880

Open antiochp opened 5 years ago

antiochp commented 5 years ago

@lehnberg's recent work on documenting our Dandelion++ impl in #2723 got me thinking about some ways we could simplify the timers and scheduling inherent in the "Dandelion Monitor" in Grin/MW and potentially make it more Grin-native.

Currently we kick off a background thread for the Dandelion Monitor. This runs every 10s and handles the various scheduled tasks.
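For context, a minimal sketch of that pattern (names are illustrative, not the actual grin code): a long-running background thread that wakes every 10s and runs the scheduled Dandelion tasks.

```rust
use std::thread;
use std::time::Duration;

fn spawn_dandelion_monitor() {
    thread::spawn(move || loop {
        thread::sleep(Duration::from_secs(10));
        // check epoch expiry, aggregate stem txs, handle embargo expiry, etc.
        run_dandelion_tasks();
    });
}

fn run_dandelion_tasks() {
    // placeholder for the scheduled Dandelion tasks
}
```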

We conveniently have block times of approx 60s, and we could leverage this nice property of Grin to simplify how we schedule the various Dandelion-related tasks.

We can simply trigger a single run of the "Dandelion Monitor" via the adapter for every block_accepted event. We can eliminate the need for a long running background thread.
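Something like the following hypothetical sketch (the adapter and monitor types are illustrative, not grin's actual API): the Dandelion tasks run once from the adapter's block_accepted hook, which fires each time the local node accepts a new block (roughly every 60s), instead of from a background thread.

```rust
// Illustrative types only; not the actual grin adapter API.
struct Block;

struct DandelionMonitor;

impl DandelionMonitor {
    /// A single pass over the stempool: aggregate/fluff, check embargoes, etc.
    fn run_once(&self) {}
}

struct ChainToPoolAdapter {
    dandelion: DandelionMonitor,
}

impl ChainToPoolAdapter {
    /// Called whenever our node accepts a new block.
    fn block_accepted(&self, _block: &Block) {
        // one run of the monitor per accepted block; no long-running timer thread
        self.dandelion.run_once();
    }
}
```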

We get randomization for free because it is now based on which stem relay sees the latest block first (and is not dependent on when they see the tx itself).

I suspect (but this needs to be considered more) that this may actually improve the anonymity properties of Dandelion++. It removes any node specific timing information. All nodes are effectively running on the same coarse timer, based on blocks being approx 60s apart. We also get a decent amount of randomization due to the variation in block times.

This may also actually make Dandelion easier to reason about. There are a lot of moving parts and a large chunk of "black-box" behavior in terms of timing around aggregation etc. If this were all based on underlying block times, there would be fewer overlapping timers (and associated edge cases) involved.

This has not been fully thought through, but I wanted to put some initial thoughts down and maybe get some feedback on this.

lehnberg commented 5 years ago

Glad the writing triggered this thinking! Initially, I like it as well.

instead of tracking timestamps for txs in the stempool, track the current block height when we see the stem tx

This would also mean that the timestamp granularity becomes ~1 min, correct? Does that become a problem?
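For concreteness, a hedged sketch of what the data change under discussion might look like (the entry type and field names are hypothetical, not grin's actual stempool code): age is measured in blocks rather than wall-clock time.

```rust
// Hypothetical stempool entry; not grin's actual PoolEntry type.
struct StemPoolEntry {
    // tx: Transaction,       // the stem transaction itself (omitted here)
    height_seen: u64,         // chain height when the stem tx was first seen
}

impl StemPoolEntry {
    /// Age of the entry in blocks, given the current chain height.
    fn age_in_blocks(&self, current_height: u64) -> u64 {
        current_height.saturating_sub(self.height_seen)
    }
}
```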

We can simply trigger a single run of the "Dandelion Monitor" via the adapter for every block_accepted event. We can eliminate the need for a long running background thread.

So if I understand correctly, block_accepted will fire roughly once a minute?

We get randomization for free because it is now based on which stem relay sees the latest block first (and is not dependent on when they see the tx itself).

Indeed. Does it become predictable though? Once a new block is propagated, does it become fair to expect that most connected nodes in the network will fire the monitor within a certain interval?

Would it make sense for nodes to locally "blind" their actions with delays? (I.e. fire once a new block is propagated and after a random delay of 0-x seconds)

chisa0a commented 5 years ago

Currently we kick off a background thread for the Dandelion Monitor. This runs every 10s and handles the various scheduled tasks.

We can simply trigger a single run of the "Dandelion Monitor" via the adapter for every block_accepted event.

Moving to basing the timer on block intervals, how would transactions not already included in blocks be handled (somewhat echoing @lehnberg's first question)?

Otherwise, my first impression is that there are some good improvements here. Will do more (re)reading on Dandelion++ and review the current code for some deeper feedback. Thanks for putting down your ideas!

antiochp commented 5 years ago

This would also mean that the timestamp granularity becomes ~1 min, correct? Does that become a problem?

It's potentially more restrictive but I'm not sure it's a problem. The current 30s is relatively arbitrary. If we only get new blocks every 60s, then anything faster than that means we aggregate sooner, but the txs themselves will not necessarily confirm any faster. The potential downside here is we would have a lower bound of approx 60s that may introduce a delay between the aggregation, the fluff/broadcast, and the tx getting picked up in the next block. We'd need to think through the timing here more.

So if I understand correctly, block_accepted will fire roughly once a minute?

Yes. This is simply a callback/hook in our code that fires every time our local node sees a new block.

Would it make sense for nodes to locally "blind" their actions with delays? (I.e. fire once a new block is propagated and after a random delay of 0-x seconds)

I think we lose a lot of the simplicity here if we then re-introduce a timer/delay. I'm not sure the benefits would outweigh the complexity.

antiochp commented 5 years ago

Moving to basing the timer on block intervals, how would transactions not already included in blocks be handled (somewhat echoing @lehnberg's first question)?

The "stem phase" is distinct from the regular tx broadcast/propagation phase. Miners will only select txs from the regular txpool to include in blocks. Txs only enter the regular txpool once they fluff, either via a node in fluff epoch or explicitly from a wallet sending a tx with fluff=true.

These proposed changes would only affect the stempool and not the regular txpool.
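As a rough illustration of that distinction (hypothetical types, not the actual grin pool API), fluffing is what moves a tx out of the stempool and into the pool miners draw from:

```rust
struct Tx;

struct Pools {
    stempool: Vec<Tx>, // txs still in the Dandelion stem phase
    txpool: Vec<Tx>,   // txs visible to miners for block inclusion
}

impl Pools {
    /// Fluff: broadcast a stem tx and move it into the regular txpool.
    fn fluff(&mut self, idx: usize) {
        let tx = self.stempool.remove(idx);
        // broadcast to peers here in a real implementation
        self.txpool.push(tx);
    }
}
```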

Aggregation every 60s vs every 30s (configurable) as we have today would potentially delay txs from entering the regular txpool, which would then delay them being selected for inclusion in the next block. But it's not entirely obvious how much delay would actually be introduced, as it depends on several moving parts, including how often miners refresh/reselect txs for the next candidate block.

Edit: for example - it doesn't really matter how quickly you get a tx propagated and into the txpool of every node if miners only select txs once per block period. Miners see a new block, select txs from the txpool, build a candidate block, then proceed to attempt to mine it for a period of time. If your tx is broadcast after that (even a few ms later) then it will not get picked up until the following block anyway.

lehnberg commented 5 years ago

I think we lose a lot of the simplicity here if we then re-introduce a timer/delay. I'm not sure the benefits would outweigh the complexity.

Yeah, I think you might be right. What we're talking about here is firing a node's dandelion_monitor every time it sees a new block. It's suggested that we get some degree of randomisation as a result here, as nodes on the network will see new blocks at different times. What would the interval of this be? I.e. what could be expected to be the usual interval between the first node seeing a block and the last node?

I'm trying to think of what the implications would be if (A) this interval was very short (i.e. all nodes firing dandelion_monitor roughly at the same time) and what it would be if (B) it was a large interval (i.e. nodes firing dandelion_monitor at fairly different moments of time).

As per the original D++ PR:

Dandelion monitor runs every 10s, if current epoch is "fluff" -

  • If epoch expired, aggregate and broadcast all stem txs
  • If any stem txs older than 30s (aggregation_secs), aggregate and fluff all stem txs
  • Otherwise wait, allowing stem txs opportunity to aggregate
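For reference, a minimal sketch of that per-tick logic (illustrative names and a hard-coded 30s aggregation window; not the actual grin implementation):

```rust
const AGGREGATION_SECS: u64 = 30;

enum Epoch {
    Stem,
    Fluff,
}

fn dandelion_tick(epoch: Epoch, epoch_expired: bool, oldest_stem_tx_age_secs: u64) {
    if let Epoch::Fluff = epoch {
        if epoch_expired || oldest_stem_tx_age_secs > AGGREGATION_SECS {
            // epoch over, or oldest stem tx has waited long enough:
            // aggregate everything in the stempool and broadcast (fluff) it
            aggregate_and_fluff_all();
        }
        // otherwise wait, giving stem txs more opportunity to aggregate
    }
}

fn aggregate_and_fluff_all() {
    // placeholder: aggregate all stem txs and broadcast the result
}
```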

If (A), it would mean that fluffing would occur almost all at once across the network. If (B), it would be more sporadically. What impact would this have? Is it likely for there ever to be collisions, where multiple nodes fluff the same tx (aggregated with others or not) in the same block interval? Are there any other problems this could lead to?

I believe the proposed approach is favourable from the point of view of the fail-safe mechanism (embargo timer), i.e. keeping track of a transaction and making sure it gets broadcast in case a malicious node censors it. In the paper, timers are kept by each of the nodes and padded with some randomness (to avoid the timer that fires first always being that of the originating node). With the proposed approach, the node that first realizes a transaction is missing is the node that fires dandelion_monitor first, which is the node that sees a new block first. That is completely orthogonal to where in the network the transaction in question originated. Surely this must be desirable, especially because it's so much simpler. Right?
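For illustration, a hedged sketch of that fail-safe behaviour (illustrative types and an arbitrary embargo value, not grin's actual code): each node holds on to stem txs and fluffs any that have not been seen on the network before the embargo expires.

```rust
use std::time::{Duration, Instant};

const EMBARGO: Duration = Duration::from_secs(180); // arbitrary example value

struct StemTx {
    first_seen: Instant,
    seen_fluffed: bool,
}

fn check_embargo(stem_txs: &mut [StemTx]) {
    for tx in stem_txs.iter_mut() {
        if !tx.seen_fluffed && tx.first_seen.elapsed() > EMBARGO {
            // embargo expired without the tx appearing on the network:
            // broadcast (fluff) it ourselves
            fluff(tx);
        }
    }
}

fn fluff(_tx: &mut StemTx) {
    // placeholder: broadcast the transaction to peers
}
```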

And depending on whether it's (A) or (B) above, other nodes would discover the same thing either fairly all at once, or in a staggered fashion. But I suppose only the few nodes on the network that acted as stem nodes will know that the transaction in question has been censored. So it would be important for this information to reach them and that they can act on this.

What happens when a node realises the embargo timer has expired? Does it fluff only that transaction or does it aggregate and fluff all the transactions it's got in its pool? The quote above suggests the latter. Similarly to what I was asking above, what would happen if there are multiple different versions of a transaction being fluffed on the network from different nodes at once?

sesam commented 5 years ago

embargo timer expires

Should be per tx, right? So then grin fluffs that tx, because the stemming process failed to get the tx mined.

what would happen if there are multiple different versions of a transaction being fluffed on the network from different nodes at once?

A Grin node that had already locally stemmed txs a, f and g, on seeing a "stemmed tx" made public containing kernels ab, cde and g, would (at least previously) immediately fluff each of a, b, f and g separately, and probably also cde. So transactions get de-stemmed if some Dandelion node leaks, or if, say, a wallet sends its transaction to more than one server.

Correct me if I'm wrong :)

Regarding the simplification proposal in this issue, IMHO it distorts the evenness of E(time from creation to fluffing), making it more bursty instead of evenly spread. Not important while Grin has few transactions (0.05 tx/s on average) and perhaps already has a bursty distribution of tx creation times at the source.

antiochp commented 5 years ago

Related conversation on forum - https://www.grin-forum.org/t/objective-dandelion/4678