(documentation) Robustness against hardware acceleration #5

Open · carbeer opened this issue 3 years ago

carbeer commented 3 years ago

I was wondering whether there is already some documentation on how the VDF's robustness against hardware-accelerated computation is achieved. I stumbled across it in the Medium article and I'd be curious to learn more about it.
What order of magnitude of speedup do you consider possible for ASICs? I'm asking because modular multiplication can be accelerated quite a bit using ASICs (cf. Simon Peffers' talk), so I was wondering how this relates to the VeeDo protocol.

Yael-Starkware commented 3 years ago

Hello @carbeer and thank you for your question.

The delay function f() used by our VDF involves an inherently sequential computation that requires many 64-bit (or 128-bit) arithmetic operations. 64-bit multipliers are built into the hardware of most CPUs, and these CPUs run at high clock speeds. GPUs, on the other hand, are limited to 32-bit multipliers and run at far lower clock speeds. To the best of our estimates, a GPU will be roughly 100x slower than a CPU at computing the delay function, and the parallelism offered by a GPU will not speed this up. FPGAs are even worse in this respect: compared to GPUs they run at lower speeds, and they are less abundant and costlier to deploy. Using this speed discrepancy between CPUs and GPUs/FPGAs, VeeDo may incorporate an “expiration time” mechanism that renders all GPUs/FPGAs ineffective for computing even a single instance of f(), while allowing even a 4-year-old standard CPU to easily compute the delay function.
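To make the sequentiality point concrete, here is a minimal toy sketch in Rust. This is not VeeDo's actual f(); the modulus, round constant, and update rule are hypothetical illustration values. It shows the two properties discussed above: each step consumes the previous state, so extra parallel cores cannot shorten the latency of one instance, and each step is dominated by a 64-bit multiplication with a 128-bit intermediate, exactly the operation CPUs handle natively.

```rust
// A minimal toy sketch, NOT VeeDo's actual delay function: the modulus,
// round constant, and update rule are hypothetical illustration values.
const MODULUS: u64 = 0xffff_ffff_0000_0001; // 2^64 - 2^32 + 1, a 64-bit prime
const ROUND_CONSTANT: u64 = 42; // hypothetical constant

// One 64-bit modular multiplication, using a 128-bit intermediate product;
// this is the native-word operation a CPU executes in a few cycles.
fn mul_mod(a: u64, b: u64) -> u64 {
    ((a as u128 * b as u128) % (MODULUS as u128)) as u64
}

// Inherently sequential: step i+1 needs the output of step i, so the
// parallelism of a GPU (or a multi-pipeline ASIC) cannot shorten the
// latency of a single instance.
fn delay(x: u64, num_steps: u64) -> u64 {
    let mut state = x % MODULUS;
    for _ in 0..num_steps {
        // No overflow: mul_mod(..) < MODULUS, and MODULUS + 42 < 2^64.
        state = (mul_mod(state, state) + ROUND_CONSTANT) % MODULUS;
    }
    state
}

fn main() {
    let (x, t) = (123_456_789_u64, 10_000_000_u64);
    println!("instance pair (x, y) = ({}, {})", x, delay(x, t));
}
```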

Regarding ASICs: since 64-bit multipliers are abundantly available on CPUs, the initial bar for an ASIC producer is quite high, as competing with CPU speed requires an expensive high-frequency ASIC. An ASIC producer might therefore consider a different attack vector, one that does not increase speed but instead packs many parallel pipelines onto a single chip, each pipeline computing a separate instance of f(). Such an ASIC would generate many instance pairs (x,y) simultaneously. However, a simple mitigation to this attack vector is to tweak the parameters of the delay function f(). Such changes have minimal effect on the delay time experienced by average users computing f() on a CPU, because CPUs are efficient general-purpose processors, but an ASIC optimized for the old parameters becomes obsolete (if it cannot adjust to the new ones).
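A hypothetical sketch of that mitigation (again in Rust; the struct, field names, and values are made up for illustration, not VeeDo's actual code) treats the parameters of f() as plain data, so a CPU picks up a parameter update for free while a chip with the old constants hardwired into its pipelines must be redesigned:

```rust
// Hypothetical illustration of the parameter-tweaking mitigation, not
// VeeDo's actual code: the delay function takes its parameters as data.
#[derive(Clone, Copy)]
struct DelayParams {
    modulus: u64,        // 64-bit prime defining the field
    round_constant: u64, // per-round additive constant
    num_steps: u64,      // tuned so a standard CPU takes the target delay
}

fn delay(x: u64, p: DelayParams) -> u64 {
    let mut state = x % p.modulus;
    for _ in 0..p.num_steps {
        let sq = (state as u128 * state as u128) % (p.modulus as u128);
        state = ((sq + p.round_constant as u128) % (p.modulus as u128)) as u64;
    }
    state
}

fn main() {
    let v1 = DelayParams {
        modulus: 0xffff_ffff_0000_0001, // hypothetical example values
        round_constant: 42,
        num_steps: 1 << 22,
    };
    // A parameter update is just new input data for a general-purpose CPU,
    // but it obsoletes an ASIC whose pipelines hardwire the old constants.
    let v2 = DelayParams { round_constant: 43, ..v1 };
    println!("{} {}", delay(7, v1), delay(7, v2));
}
```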

carbeer commented 3 years ago

Hello @Yael-Starkware, thank you very much for your detailed and insightful answer! Your explanation of how the 64-bit word size rules out GPUs and FPGAs makes total sense, and I see that the techniques from the talk I mentioned cannot be applied.

Regarding ASICs: what would such instance pairs (x,y) look like or, to be more precise, how could they help in computing the delay function more quickly?
If I understand correctly, the main idea to defeat these attacks is to adjust the parameters of the delay function. Is this adjustment something that will happen as a change of specs whenever it is deemed necessary, or do you have some mechanism in mind that could incorporate these adjustments into the protocol itself?

Yael-Starkware commented 3 years ago

Hi @carbeer, The instance pairs that I described are the input to the VDF (x) and output of the VDF (y). Actually the ASIC attack that I described - parallel computation of separate f() instances, is not relevant when using the VDF for randomness or time-lock, since for these use cases, the delay of each execution separately is crucial and not the delay of several different instances. For other use cases, such as PoW, it is also important to take into account this attack. To answer your question, a mechanism for adjusting the parameters can be a part of the protocol, but we haven't defined it yet.