ifdefelse / ProgPOW

A Programmatic Proof-of-Work for Ethash. Forked from https://github.com/ethereum-mining/ethminer
GNU General Public License v3.0
257 stars 84 forks source link

Birthday attack on 64-bit seed #54

Open solardiz opened 4 years ago

solardiz commented 4 years ago

64-bit seed did look like providing only a low safety margin to me during my ProgPoW review last year, and I was going to revisit this and share some thoughts with the community, but in the end I ran out of time and I felt like ProgPoW was non-final anyway (to my liking, at least) yet further tweaks were not encouraged. Now reminded and inspired by @kik's #51 and by this community's prompt response to it and willingness to tweak ProgPoW to fix it, I present another related yet very different attack:

While mining ProgPoW with a large cluster, maintain a cluster-wide cache of mappings from 64-bit seed to 256-bit mix digest (immediate result of the memory-hard computation). This cache can be emptied on every new period (10 blocks) and filled during the period, maybe for up to a pre-defined maximum size (as memory permits) such as 2^32 entries (128 GiB).

Once the cache fill is above a threshold, each cluster node can reasonably start to utilize its attached Keccak ASICs to search the nonce space until a previously cached seed is seen. For example, with a cache fill of 2^32 entries, it'd take around 2^32 Keccak computations until a cache hit. With a large enough cache and with enough Keccak ASICs working in parallel, this might be cheaper than doing a mix digest computation for a previously unseen seed (although the node's GPUs would also continue working on new seeds in parallel).

Now, what cache size would make this attack worthwhile? We'd need to match (and then exceed) a GPU's hash rate with our rate of finding previously cached seeds. With a cache of 2^32, and thus needing to do 2^32 Keccak computations, to match a GPU's e.g. 2^24 (16.8M) hashes/s we need to perform 2^56 Keccak computations per second. Can an ASIC with enough Keccak cores (perhaps across many chips) to accomplish this potentially consume less power than a GPU does? Probably not.

Can a cache much larger than 2^32 reasonably be maintained? Probably yes, distributed across the cluster nodes' RAM. Then fewer Keccak cores would be needed, and their energy efficiency vs. GPUs would be better.

Can a cluster node's Keccak ASICs quickly determine if a seed is (likely) cached? Probably yes, with a probabilistic data structure such as a Bloom filter in RAM closely attached to the ASICs. (They would not need to wait for this check result, but would proceed to test more nonces. There would need to also be locally stored queues of seed values to check.) This RAM could be many times smaller than the cache itself (perhaps 10 to 20 bits per seed, not 256), but it would nevertheless be the limiting factor on the total size of the cache.

Can the Bloom filter RAM have enough throughput to accommodate the many candidate seeds coming out of Keccak every second? That's probably the worst bottleneck. I guess it'd be tricky ASIC design with Keccak+SRAM cores (distributed RAM), NOC, and inter-chip mesh to implement that.

Overall, this doesn't look practical yet. But if we want to have a better safety margin and greater confidence with respect to attacks like this, we need to move to larger seeds.

My bigger concern isn't this attack per-se, but rather that this line of thought could become the missing piece of the puzzle in making some other yet-undiscovered attack practical.

olalawal commented 4 years ago

this is all well and good but i guarantee that other algorithms like even dagger hashimoto have not been looked at from a technical standpoint to this rediclous level, next we will be looking at exploits that require a quantum computer? All of these so called exploits are not breaking in any way the just allow asics the ability to generate solutions at a higher than expected rate of efficiency? This branch has been out here for almost a year and now all of a sudden all this armchair analysts come out the wood work with unpractical edge cases. Um ok.

Sent from my iPhone

On Mar 15, 2020, at 3:21 PM, Solar Designer notifications@github.com wrote:

 64-bit seed did look like providing only a low safety margin to me during my ProgPoW review last year, and I was going to revisit this and share some thoughts with the community, but in the end I ran out of time and I felt like ProgPoW was non-final anyway (to my liking, at least) yet further tweaks were not encouraged. Now reminded and inspired by @kik's #51 and by this community's prompt response to it and willingness to tweak ProgPoW to fix it, I present another related yet very different attack:

While mining ProgPoW with a large cluster, maintain a cluster-wide cache of mappings from 64-bit seed to 256-bit mix digest (immediate result of the memory-hard computation). This cache can be emptied on every new period (10 blocks) and filled during the period, maybe for up to a pre-defined maximum size (as memory permits) such as 2^32 entries (128 GiB).

Once the cache fill is above a threshold, each cluster node can reasonably start to utilize its attached Keccak ASICs to search the nonce space until a previously cached seed is seen. For example, with a cache fill of 2^32 entries, it'd take around 2^32 Keccak computations until a cache hit. With a large enough cache and with enough Keccak ASICs working in parallel, this might be cheaper than doing a mix digest computation for a previously unseen seed (although the node's GPUs would also continue working on new seeds in parallel).

Now, what cache size would make this attack worthwhile? We'd need to match (and then exceed) a GPU's hash rate with our rate of finding previously cached seeds. With a cache of 2^32, and thus needing to do 2^32 Keccak computations, to match a GPU's e.g. 2^24 (16.8M) hashes/s we need to perform 2^56 Keccak computations per second. Can an ASIC with enough Keccak cores (perhaps across many chips) to accomplish this potentially consume less power than a GPU does? Probably not.

Can a cache much larger than 2^32 reasonably be maintained? Probably yes, distributed across the cluster nodes' RAM. Then fewer Keccak cores would be needed, and their energy efficiency vs. GPUs would be better.

Can a cluster node's Keccak ASICs quickly determine if a seed is (likely) cached? Probably yes, with a probabilistic data structure such as a Bloom filter in RAM closely attached to the ASICs. (They would not need to wait for this check result, but would proceed to test more nonces. There would need to also be locally stored queues of seed values to check.) This RAM could be many times smaller than the cache itself (perhaps 10 to 20 bits per seed, not 256), but it would nevertheless be the limiting factor on the total size of the cache.

Can the Bloom filter RAM have enough throughput to accommodate the many candidate seeds coming out of Keccak every second? That's probably the worst bottleneck. I guess it'd be tricky ASIC design with Keccak+SRAM cores (distributed RAM), NOC, and inter-chip mesh to implement that.

Overall, this doesn't look practical yet. But if we want to have a better safety margin and greater confidence with respect to attacks like this, we need to move to larger seeds.

My bigger concern isn't this attack per-se, but rather that this line of thought could become the missing piece of the puzzle in making some other yet-undiscovered attack practical.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or unsubscribe.

solardiz commented 4 years ago

@olalawal It is unhealthy to the project if/when we have to self-censor our sharing of potential issues out of concern that these would be looked at from a political ([inadvertently] helping one of the camps - for or against ProgPoW) rather than a technical perspective ("strengthening" ProgPoW, as @OhGodAGirl correctly put it in a tweet).

Can we please stick to the tech and discuss it openly? With self-censorship, we'd end up doing the tech worse and then playing catch-up to address issues that we could have discussed and dealt with (or made informed decisions not to) in time.

One of the reasons I said nothing about the 64-bit seed providing only a low safety margin last year was out of concern that my saying so would be misinterpreted as arguing against ProgPoW. I now realize I should have said this anyway - it could have saved us from @kik's exploit and needing to address it now.

solardiz commented 4 years ago

One way to look at the attack I described is that it doesn't let attackers eliminate memory-hard computation (unlike #51), but it gives miners the (unintended) flexibility to partially replace the intended parallel memory-hard computation with different yet also parallel memory-hard computation (lookups against the Bloom filters or such). This makes it harder to analyze expected amortized costs of ProgPoW computation on future hardware, and it potentially gives large clusters an advantage (have the choice to do things differently) over smaller individual miners.

That said, with the numbers I came up with so far (in my original comment), this alternative choice looks less optimal than computing things in the straightforward manner. So at this point the attack is academic.

solardiz commented 4 years ago

Here's an extension of the attack:

During initial cache filling, also use the Keccak ASICs to search nonces such that e.g. upper 20 bits of seed would be all 0's.

Then during the attack I initially described we can filter out most non-cached seeds by requiring the same bits to be all 0's, which is an extremely cheap check without any memory hardness. With the example of requiring 20 bits to be all 0's, we'd have reduced the frequency of Bloom filter lookups by a factor of over 1 million.

This arbitrarily reduces the Bloom filter RAM throughput issue, thereby removing what I described as "probably the worst bottleneck" in my first initial description of the attack.

With fewer maybe-non-0 bits left, this also reduces reasonable cache size or/and increases cache hit rate.

I think this extension is enough to make the attack practical, needing to be addressed now.

solardiz commented 4 years ago

[This comment previously included corrections to the initial revision of my comment above, but I've since edited the comment above to reflect my latest understanding. I am not deleting this comment in case someone wants to see the full edits history.]