ifdefelse / ProgPOW

A Programmatic Proof-of-Work for Ethash. Forked from https://github.com/ethereum-mining/ethminer
GNU General Public License v3.0

ASIC Resistance #9

Closed timolson closed 5 years ago

timolson commented 5 years ago

While a custom ASIC to implement this algorithm is still possible, the efficiency gains available are minimal. The majority of a commodity GPU is required to support the above elements. The only optimizations available are: (1) remove the graphics pipeline (displays, geometry engines, texturing, etc.), and (2) remove floating point math.

These would result in minimal, roughly 1.1-1.2x, efficiency gains. This is much less than the 2x for Ethash or 50x for Cryptonight.

I must disagree with these estimates on the ASIC efficiency gains and say they seem too low.

For one thing, isn't floating point a huge part of GPUs? If I understand the history correctly, integer math was emulated by floating point units for quite some time!

But let's ignore floats. There's much more to chip design than shrinking die area, which is actually a very modest cost. Far more important is the power per hash. GPUs are designed to optimize framerate and don't care about power, as long as they can dissipate the heat. Even if an ASIC had the exact same logic requirements and die area as a GPU, significant efficiency can be gained during the physical design process by choosing low power over speed. An ASIC might be slower per silicon area, but it will be better hash-per-watt. The extra capital cost for silicon area is small compared to the operational power efficiency advantage.

It's similar to ARM vs Intel in the CPU world. If you want the absolute fastest chips, you buy Intel, but all cell phones run ARMs. GPUs are Intel and ASICs are ARM.

To be fair, I think ProgPoW is a good try at device-binding, but ultimately GPUs are ill-suited for mining, and the more you utilize the GPU, the greater the gap with ASICs. There's really nothing software writers can do about that, except to minimize the usage of GPU cores by saturating the bandwidth to commodity DRAMs.

Fuller writeups here: My first article is wrong in assuming KISS99 is a major factor. The second article emphasizes the power aspect. https://medium.com/@timolsoncrypto/progpow-is-less-asic-resistant-than-ethash-6efd61d17cfa https://medium.com/@timolsoncrypto/progpow-part-2-still-worse-than-ethash-2b31c5a260d2

Thanks for listening. We can just insta-close the issue if you want. Just thought I should record these comments somewhere in the project.

greerso commented 5 years ago

Hey Tim,

I have been following ProgPoW fairly closely and have seen similar comments before. To save some time rehashing the same conversations, I have posted below a conversation between David Vorick and Mr Def of IfDefElse that took place in a public Telegram channel on September 20th, and a summary of a segment of the Ethereum Foundation Core Dev Meeting 48 from October 12th: https://www.reddit.com/r/ethereum/comments/9jq75n/progpow_algorithm_change_covered_in_todays_eth/

Telegram Conversation

David Vorick, [Sep 20, 2018 at 10:44:52 AM]: ...a GPU designer has to think about a lot of different use cases. Between AI, graphics, and other general purpose GPU things, you have to make tradeoff choices for all of your instructions between performance and power. You also have to assume that many of your customers will have irregular usage

Someone making an ASIC for ProgPoW doesn't have to consider all of that

you know that there is just one usage pattern, one use case, and you can build it around class B datacenters and cheap electricity

I am guessing there are all sorts of corners in the architecture where you can pull off cheats and optimizations that a GPU designer wouldn't be able to pull off

I have talked to other GPU designers who have looked at ProgPoW, and they agreed ... a 3x speedup seemed very much in reach, if not a full 10x

Two other super important optimization points are yields and memory capability. A progpow asic has much more generous yield tolerance than a gpu, which means you can drop certain design redundancies and repairs, and also means your successful die cost is going to be lower (because you throw away fewer bad chips per good chip).

For memory, access is everything. Memory today is a massive balancing act between throughput, latency, pipelining, sequential, and random access. You have to find something that works well for all use cases in a GPU. But in progpow, you know what the usage is for all of those. Instead of doing a careful balancing act, you know exactly what sort of tradeoffs are optimal

Kristy-Leigh Minehan, [Sep 20, 2018 at 1:07:58 PM (9/20/18, 1:10:18 PM)]: Respectfully, I disagree on the ProgPoW stuff. We’ve gone over this before in private with some folks, you and I, and many other silicon experts with years more experience than you and I have pointed out the cost savings that you could gain, and where the increase could be, but it’s nowhere near 3x, especially with die shrinkage.

Again, if 3-10x speedup is possible in the same silicon, you need to be competing in the inference chip market.

Ifdefelse, [Sep 20, 2018 at 1:19:10 PM]: GPU chip yields are in the 99% range today for products as big as 1070Ti... And even doing dedicated, repeated work isn't easy to beat GPUs... Otherwise Bitmain's SC1/3 wouldn't be losing to Pascal GPUs on Bitmain's own Tensority algo. We're not even going to start with Turing...

That said, savings can be had. It's mostly in area and not in power. Unfortunately, if you're buying second hand, that silicon area savings is negligible in the product price stack

David Vorick, [Sep 20, 2018 at 1:25:44 PM]: 1070ti is a degraded part though, is it not?

also, is the 99% number a published number?

really, the best way to settle this would be to make a progpow ASIC

Ifdefelse, [Sep 20, 2018 at 1:27:53 PM]: Why would silicon vendors publish their secret sauce. :-) And yes, reduced chip by 5% but that goes to your point, GPUs are already very yield tolerant with a tiny reduction. Especially for smaller, IO BW bound products, silicon costs are only a small portion of product costs. If you use $15 G6 memory as an example, 8 chips is WAY more expensive than 16nm dies at complexity of the Polaris/Pascal generation. So, sure, go save a few dollars off of 200 - 300

David Vorick, [Sep 20, 2018 at 1:29:03 PM]: well, if silicon vendors aren't publishing their secret sauce, I'm calling bullshit on your claims of 99%

Ifdefelse, [Sep 20, 2018 at 1:30:02 PM]: If you only knew. Hahahah. For those few dollars try to save on the die you lose the scale of a 10B/yr industry (discrete GPUs) with associated volume discounts, distribution efficiency, and you get to pay for your own engineers and mask set and schedule risk

Ethereum Foundation Meeting:

Alexey Akhunov "I think that we need a bit more exposition about why...we kind of believe that this...from the description of the algorithm that it's supposed to be doing what it's doing, making it harder to implement ASICS and I get the general idea. But I do believe that if people do really know what the reasons (are) that they can actually explain it in some simple way. Maybe not in a very simple way but at the moment I feel like a lot of people including me --either I'm really kind of dumb or-- I don't really know what (it's) doing. I'm just trusting that someone else who is cleverer than me understands this and I don't. I also got (the idea) that some people are talking to each other or (having conversations) that they cannot disclose and it just doesn't really have a good feeling. So maybe somebody can write down...some exposition about why exactly the current technology of ASICs will not be able to do ProgPOW efficiently in a more detailed way so that people can apply critical thinking rather than just trusting that somebody...says that they have experience and (saying) "okay that will be fine."

Mr. Def "Hey Alexey I think that's totally fair. It would be helpful to get specific questions on areas where you want more information and I think we have been very bad about handling the Ethereum Magicians discussion and so we will improve that in the future and be a little bit more responsive. In terms of why this algorithm is ASIC resistant: I think that we should all start from the point that...the algorithm's goal is not exactly to be ASIC resistant...we started (with) this effort from the perspective that GPUs are ASICs and we're actually designing from a perspective not to be ASIC resistant but actually be friendly or to be very much tied to a single type of ASIC which is a GPU.

And so that's the perspective that we started with and so in optimizing for a specific type of hardware the goal is to maximally utilize all the functions of that hardware---a large register space (that's expensive) and of course not to forget the starting point of why Ethash is strong which is it's still memory bound. So the algorithm starts from a place where it's memory bound and it's still going to be predominantly memory bound. In addition, it also has to use the additional register space that GPUs are able to provide and are needed for additional math calculations. And, on top of that, adds the programmability aspect or the programmatic aspect (where) the exact series of math operations that you're running is changing in every epoch or, actually, as proposed with the stratum implementations, would change every 25 or 50 blocks or something like that --to change even faster.

Now, when you do something like that the problem with implementing an ASIC for something like that or a different ASIC or a more custom ASIC is you would have to design the ASIC to either be flexible enough to capture all the possible variations or evolutions of the algorithm or you'd actually have an ASIC that pre-designs for every variation or every math ordering in the evolving algorithm. So, if you pre-design for every possible variation, well, your ASIC just explodes. You're just burning silicon area that's mostly unused. If you try to design for the programmability and the register file size that you would need then you basically have something that is a very big ASIC that is also applicable to many other general math problems. Which is fine because if you're gonna design a general math processor, I think that's the goal of this project. I think having more general math processors in the world is a good thing and having these more flexible computation units is a good thing at least until we have POS. So leveraging off the existing install-base of more general math units was the goal of the project.

So we're basically trying to force a custom design to be not-that-custom because you have to be flexible to varying and changing math at a very rapid pace and you have enough variation that you can't pre-design for all of it and you have to pay additional silicon to be able to even execute the math.

If you have specific implementation questions in terms of why ASICs can't keep up with it or can't design for these math variations, we can certainly do a deep dive on this and I think for our responses it would be best to put it on some public forum like Ethereum Magicians so that once you ask a question everyone can see the response and we can just point people to that forum if other people have similar questions"
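To make the "changing math" point concrete, here is a minimal Python sketch of a program sequence that is regenerated every fixed number of blocks. It is an illustration only, not the ProgPoW implementation: the op set, PERIOD, NUM_REGS, PROGRAM_LEN and the function names are all assumptions chosen for the example, and Python's random module stands in for whatever generator a real design would use.

```python
import random

MASK32 = 0xFFFFFFFF

# Hypothetical op set for illustration; the real ProgPoW ops differ in detail.
OPS = {
    "add": lambda a, b: (a + b) & MASK32,
    "mul": lambda a, b: (a * b) & MASK32,
    "xor": lambda a, b: a ^ b,
    "and": lambda a, b: a & b,
    "or": lambda a, b: a | b,
    "min": min,
}

PERIOD = 50        # assumed: number of blocks that share one program
NUM_REGS = 32      # assumed: size of the register file
PROGRAM_LEN = 16   # assumed: math ops per program


def program_for_block(block_number):
    """Derive the op sequence for a block; it changes once per PERIOD."""
    rng = random.Random(block_number // PERIOD)
    names = sorted(OPS)
    return [
        (rng.choice(names),          # which op to run
         rng.randrange(NUM_REGS),    # destination register
         rng.randrange(NUM_REGS),    # first source register
         rng.randrange(NUM_REGS))    # second source register
        for _ in range(PROGRAM_LEN)
    ]


def run_program(program, regs):
    """Execute the op sequence on a copy of the register file."""
    regs = list(regs)
    for op, dst, a, b in program:
        regs[dst] = OPS[op](regs[a], regs[b])
    return regs
```

Hardware that runs this efficiently needs either a fixed circuit for every possible sequence or a programmable datapath with a full-size register file, which is the trade-off described in the transcript.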

Economics and can you build an ASIC for ProgPOW

Timestamped Link: http://www.youtube.com/watch?v=z2mefVnZHpw&t=79m43s

Hudson Jameson "I know... you all are doing a Medium post that might answer some more of these questions and make sure that people understand why it's certain types of ASIC resistant"

Mr. Def "Right. To be clear on another point, we tried to make the algorithm as optimized as we could for the GPU but it is true that it is not the most optimized piece of hardware simply because things like GPUs have floating point paths that's not really appropriate for cryptography but that's only a small part of the silicon that's unused. There's other parts of the silicon including display outputs and things like that that, of course, are also unused.

In working and having the GPU-makers assess and review this algorithm the conclusion was that it's roughly 20% of the (GPU) area that would be unused (with ProgPOW) and it would not be a 20% power penalty but simply a 20% area penalty. Or, basically, an area savings that you could have if you stripped out all of the unnecessary bits of the GPU.

And then we also asked them to do an economic analysis of what that savings would be in terms of having an ASIC be more economically efficient (by) saving that silicon area. Online, you can look at die-area estimates and how much it would cost and if you look at GPUs that are most popular in the mining world today --I guess that's the 480/580 and the 106-- then it's roughly $50-$60 for a piece of silicon and you save roughly 20% of that which is ~$10 and (then) the total manufacturing cost of the board, that's roughly $200, (so) you're really saving an insignificant amount of the total cost of the board.

So, yes, you can have a more custom hardware design for ProgPOW than GPUs and save some silicon-area but economically speaking it's not a significant impact to the economics where it would cause someone to go do a custom design especially given the amount of volume that GPU manufacturers have access to versus someone who would be doing custom design. The economic structure of doing an ASIC just would not be worth it.

There's also been other comments that we've seen where GPUs are moving further away from doing simple math and that might be true but at least in this generation, until we get to PoS, I think (progPOW) is a reasonable interim (solution) until PoS comes in."
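The die-cost arithmetic in the summary above can be reproduced in a few lines of Python. The inputs are the rough figures quoted in the call ($50-$60 die, about 20% unused area, about $200 board); the function name is made up for the example, and treating die cost as proportional to area is a simplifying assumption.

```python
def asic_area_saving(die_cost, unused_fraction, board_cost):
    """Return (dollars saved, share of board cost) from dropping unused area.

    Assumes die cost scales linearly with area, which ignores yield effects.
    """
    saved = die_cost * unused_fraction
    return saved, saved / board_cost


for die_cost in (50.0, 60.0):
    saved, share = asic_area_saving(die_cost, 0.20, 200.0)
    print(f"die ${die_cost:.0f}: saves ${saved:.2f}, {share:.1%} of board cost")
```

With those inputs the saving is $10-$12, or about 5-6% of the board cost. Note that this covers only silicon area; it says nothing about the power argument raised at the top of the thread.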

Providing proof and benchmarks

Timestamped Link: http://www.youtube.com/watch?v=z2mefVnZHpw&t=83m40s

Alexey Akhunov "Now I kind of understand that you're doing two things. You're optimizing for the GPU and you're doing some things that are harder for the ASICs. So what I would like--when you said you talked to the GPU manufacturer and asked them to do this or that-- is this information available...or were these just some chats you had with some people?"

Mr. Def "We reached out to some connections that we had. I don't think this information is public information however they advised that there are some very good reverse engineering analyses--already existing technical analyses--of this generation of silicon. Let me go and try to dig that up and see if I can point those out. I think, in general, I would expect that GPU manufacturers would not be that excited about doing detailed area analyses because they have competitive concerns about doing exact breakdowns which is why we ended up with a hand-wavey rough estimate."

Alexey Akhunov "What I would suggest if it's possible. I've done some GPU programming myself years ago, I know when you run some algorithms you can actually profile it and it shows you how much of the bandwidth you've consumed and how much of the register file you've consumed and how much of these operations and those operations--it would be nice if you could run that (so that we can have) some data to demonstrate that this algorithm is actually utilizing these resources in a GPU. Like let's say "it's utilizing 90% of bandwidth". Is it possible?"

Mr. Def "Yes. It's possible. I think that's a wonderful suggestion. Let me get on that and we'll have someone put that together."
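A rough version of the "utilizing 90% of bandwidth" check can be done on paper before reaching for a profiler. The sketch below uses Ethash because its access pattern is simple: each hash performs 64 DAG reads of 128 bytes. The hashrate and peak bandwidth in the usage line are illustrative assumptions, not measurements, and the function name is made up for the example. For ProgPoW the bytes read per hash depend on its tuning parameters, so pass that value in explicitly.

```python
ETHASH_ACCESSES = 64        # DAG reads per hash
ETHASH_ACCESS_BYTES = 128   # bytes fetched per DAG read


def bandwidth_utilization(hashrate_hs, peak_bytes_per_s,
                          bytes_per_hash=ETHASH_ACCESSES * ETHASH_ACCESS_BYTES):
    """Fraction of peak memory bandwidth implied by a given hashrate."""
    return hashrate_hs * bytes_per_hash / peak_bytes_per_s


# Assumed example: 30 MH/s on a card with 256 GB/s peak memory bandwidth.
util = bandwidth_utilization(30e6, 256e9)
print(f"implied memory bandwidth utilization: {util:.0%}")
```

A profiler reports the achieved figure directly; this estimate is only a sanity check on whether a claimed hashrate is plausible for a memory-bound algorithm.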

Conversations with GPU manufacturers and confirming Mr. Def's assertions

Timestamped Link: http://www.youtube.com/watch?v=z2mefVnZHpw&t=86m38s

Lane Rettig "I know that you said that you (Hudson) and Pawel have been in touch with some GPU manufacturers, did I understand that correctly?"

Hudson Jameson "Yes. So right now we're keeping these conversations private because we want to respect the privacy of the manufacturers we're talking to but yes."

Lane Rettig "I was just wondering...if this has been part of that conversation already but just getting some confirmation on the ideas that mrdef has shared with us here would be helpful."

Hudson Jameson "Absolutely. That's exactly why we're talking with them so that we can come on one of the next calls and say "we've confirmed what they're saying with the manufacturers"."

timolson commented 5 years ago

Thanks for the paste. I know I'm late to the game. Was keeping quiet about ASIC miners until very recently.

If you try to design for the programmability and the register file size that you would need then you basically have something that is a very big ASIC that is also applicable to many other general math problems... So we're basically trying to force a custom design to be not-that-custom because you have to be flexible to varying and changing math at a very rapid pace and you have enough variation that you can't pre-design for all of it and you have to pay additional silicon to be able to even execute the math.

I don't agree with all that. We've been able to do clever things with other PoW's like RandomJS that purport to require general processing.

timolson commented 5 years ago

Let me also admit that I'm not a physical design expert, and I could be full of shit about the amount of power savings being bigger than your claim. But I do think we can do better than you are assuming with the RTL. The algorithm is not changing that much.

timolson commented 5 years ago

I apologize for shooting my mouth off prematurely. Someone on the Monero team asked me about ProgPoW in the context of RandomJS and I didn't study this hard enough before giving an opinion. I deleted both articles until I can discuss a few ideas with 7400 who is an elite chip designer.

ghost commented 5 years ago

Hello @timolson thank you for your opinion about ProgPOW here,

I also disagree that ProgPOW will be a complete solution to make ASICs obsolete forever,

Ethash can be modified a bit to make ASICs obsolete for time being but it can't be forever cause there will be FPGAs and then ASICs for specific mining algorithm

May I ask why you deleted your Medium posts about ProgPOW? I would like to read them but can't access the links 😂

olalawal commented 5 years ago

I think @timolson is missing the point here ... Ethereum will be going to pos probably by 2020. Ethereum used dagger hashimoto rather than another algo like scrypt or sha256 in order to KEEP asics off the network.

Right now there are lots of asics on the network. If ProgPow is implemented it will ABSOLUTELY brick all existing asics, and even if it's not 100 percent or even 80 percent asic resistant as you surmise, by the time new asics are ‘possibly’ developed (in quotes because I am still quite skeptical of your research on the matter)

it will be time for pos anyways which will most likely deter LARGE scale development from the bitmains and Innosilicons.

That's why the focus should be on the implementation of progpow, as it is already proven to work with Ethash on BCI coin.

All this FUD on the relative asic resistance or not of ProgPow is for another long term discussion not even in the scope of this project imo

trustfarm-dev commented 5 years ago

I've finally made the FNV changes in the MIX part. Here's my suggestion, filed as an issue on Ethereum, to change the Keccak algorithm to a newer one and to change the MIX algorithm in Ethash.

I've analysed FNV, and the way Ethash currently uses it is deprecated. So I changed FNV to follow the original offset-based FNV1A.

You can refer to the FNV algorithm test code and results.
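For readers who want to see the difference being discussed, here is a short Python sketch of the two mixing orders. It is not the TEthashV1 code. The multiply-then-XOR form matches the fnv function in the Ethash specification, which works on 32-bit words and does not use an offset basis; the XOR-then-multiply form is FNV-1a, shown both as a word mix and as the standard byte-wise hash seeded with the 32-bit offset basis. The function names are chosen for this example.

```python
FNV_PRIME = 0x01000193          # 32-bit FNV prime
FNV_OFFSET_BASIS = 0x811C9DC5   # 32-bit FNV offset basis
MASK32 = 0xFFFFFFFF


def fnv1_mix(a, b):
    """FNV-1 order: multiply first, then XOR (the order Ethash uses)."""
    return ((a * FNV_PRIME) & MASK32) ^ b


def fnv1a_mix(a, b):
    """FNV-1a order: XOR first, then multiply."""
    return ((a ^ b) * FNV_PRIME) & MASK32


def fnv1a_bytes(data):
    """Standard byte-wise 32-bit FNV-1a, seeded with the offset basis."""
    h = FNV_OFFSET_BASIS
    for byte in data:
        h = fnv1a_mix(h, byte)
    return h


print(hex(fnv1_mix(0, 5)), hex(fnv1a_mix(0, 5)))
```

The usage line shows why the order matters: in the FNV-1 order, an input that is XORed in last passes through unmultiplied, while in the FNV-1a order every input is diffused by the multiply.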

At this point, changing the PoW algorithm to a completely new one needs cryptanalysis and a long verification period. That is too slow to be suitable for preventing ASICs!

TEthashV1 (Trust Ethash Version 1), on the other hand, is mostly based on Ethash, which has been verified over several years; only the MIX FNV part uses a deprecated implementation. If that part is changed, the result is a stronger PoW algorithm based on Ethash.

It clearly makes the current ASIC miners obsolete too.

I am also researching FPGA mining. Recent synthesizers (e.g. Xilinx or Altera) have tremendous optimization options and great synthesis abilities, so it is not possible to prevent ASICs. I also clearly agree with @timolson's hash-per-watt opinion.

For the reasons above, regularly changing small parts of the algorithm may be a better way to resist centralized mining and prevent ASICs.

An ASIC needs at least 6 months for redesign and delivery of consumer products.

Another opinion of mine on Ethash: reduce the 64 rounds of Mix to 32 rounds, and decrease the DAG size or limit the maximum DAG size, while sustaining the current memory access bandwidth.

Then cheaper GPUs, or GPUs from some years ago, would also hash effectively in PoW, and many more people could participate in mining.

The TethashV1 described above is now applied on

Please check it and send us comments. A minor algorithm patch like this buys time for the next 6 months.

timolson commented 5 years ago

@olalawal Hard forking of course breaks ASICs but that is not the point here. You could hard fork to any other PoW, but the question is: does ProgPoW do better than Ethash at making ASIC development economically nonviable? Otherwise you just end up with hard-fork hell like Monero.

@naikmyeong My original posts were misinformed and published without my usual standard of due diligence. I retract them immediately and completely as a matter of honor, with apologies to the ProgPoW team. The conclusions are not necessarily wrong, however...

ProgPoW is clearly expertly written, and it's a really good try at GPU-binding. But just because GPU's are fully utilized doesn't mean there couldn't be a more efficient chip.

I've consulted with my business partner 7400 who's a proper expert, and we do have some interesting ideas for improving on the power claims made by the ProgPoW team in the Readme and quoted chat. However, it's one of those things that we'd have to do quite a bit of implementation to really know the multiples we could get. Maybe ProgPoW is more ASIC-resistant than Ethash, maybe not.

Bottom line: we have some interesting ideas/tricks that may not have been considered, but we can't say for sure if they'd work well without putting a lot of effort into the project. So our position will have to remain undecided on the ASIC-resistance of ProgPoW. The research we'd need to do is not worth our effort unless ProgPoW is going to be part of a coin with big mining revenue.

timolson commented 5 years ago

@olalawal

new asics are ‘possibly’ developed (in quotes because I am still quite skeptical of your research on the matter)

You may use skeptical "scare quotes" after you do a synthesis of our CryptoNight ASIC design that's 5x better than the Bitmain X3. Use your favorite foundry libraries, and let me know the results. After doing that, you will have a right to slam our credentials if you still don't think we know what we're doing.

ifdefelse commented 5 years ago

We appreciate the healthy skepticism. A lot of groups have made a lot of promises in this area that turned out to be unfounded. For those that haven't seen it please read our post here: https://medium.com/@ifdefelse/understanding-progpow-performance-and-tuning-d72713898db3

To address a point from your initial post, I'm not sure why you claim GPUs don't care about power. The entire computer chip industry has been power limited since the Pentium 4 and GeForce FX days. Recent systems like AMD's RX Vega M and Nvidia's Max-Q are all about maximizing GPU performance within limited power budgets. Going back a bit, Nvidia's Maxwell managed to be both faster and lower power than Kepler while using the same fab process.

We do completely agree with this post: https://medium.com/@timolsoncrypto/cryptonight-is-poison-ab598bfe2d2c

ProgPoW is designed to be as straightforward as possible with a near-optimal solution provided from day 1. The basic requirements of the ProgPoW algorithm are:

This directly maps as something that looks a lot like a GPU:

Looking at the profiler data from our post you can see that ProgPoW matches the throughput provided by current GPUs for these key portions. A ProgPoW ASIC would require a similar register file capacity and math/memory throughputs as those in a current GPU. There shouldn't be any fundamental difference in performance or power between an ASIC or a GPU reading data from a register file/memory, or executing in a programmable vector math unit.

You're right that the floating point logic within the GPU is not used, but I don't expect that to be a huge % of the GPU's area. In most modern chips logic that's not actively in use burns minimal power.

You're also right that an ASIC that only focused on ProgPoW could implement a handful of optimizations over a commodity GPU. Fixed function (ASIC) Keccak and Kiss99 implementations would reduce the power of those, but they're <7% of instructions executed. The merge() ops could be implemented as a single CISC instruction instead of the 2 RISC instructions it takes on a GPU, again a marginal power savings.

Using those types of optimizations our expectation is a ProgPoW ASIC could be ~1.2x better perf/watt. If you compare this to Ethash you can get 1.66x better perf/watt today using FPGA offloading. A proper Ethash ASIC should be >2x better perf/watt.
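For readers unfamiliar with the pieces named above, here is a small Python sketch of KISS99 and a merge-style operation. The KISS99 steps follow Marsaglia's published 1999 generator. The merge() variants and their constants are an approximation of the ProgPoW description, written as assumptions for illustration, and should be checked against the repository before relying on them.

```python
MASK32 = 0xFFFFFFFF


class Kiss99:
    """Marsaglia's KISS99: two multiply-with-carry streams, a 3-shift
    xorshift register and a linear congruential generator, combined."""

    def __init__(self, z, w, jsr, jcong):
        self.z, self.w, self.jsr, self.jcong = z, w, jsr, jcong

    def next(self):
        self.z = (36969 * (self.z & 0xFFFF) + (self.z >> 16)) & MASK32
        self.w = (18000 * (self.w & 0xFFFF) + (self.w >> 16)) & MASK32
        mwc = ((self.z << 16) + self.w) & MASK32
        self.jsr ^= (self.jsr << 17) & MASK32
        self.jsr ^= self.jsr >> 13
        self.jsr ^= (self.jsr << 5) & MASK32
        self.jcong = (69069 * self.jcong + 1234567) & MASK32
        return ((mwc ^ self.jcong) + self.jsr) & MASK32


def rotl32(x, n):
    """Rotate a 32-bit value left by n bits, 1 <= n <= 31."""
    return ((x << n) | (x >> (32 - n))) & MASK32


def rotr32(x, n):
    """Rotate a 32-bit value right by n bits, 1 <= n <= 31."""
    return ((x >> n) | (x << (32 - n))) & MASK32


def merge(a, b, r):
    """Fold b into a; r picks one of four variants (constants assumed)."""
    sel = r % 4
    if sel == 0:
        return (a * 33 + b) & MASK32
    if sel == 1:
        return ((a ^ b) * 33) & MASK32
    rot = ((r >> 16) % 31) + 1   # rotation amount in 1..31
    if sel == 2:
        return rotl32(a, rot) ^ b
    return rotr32(a, rot) ^ b
```

Each merge variant is two simple integer operations (multiply and add, XOR and multiply, or rotate and XOR), which is the two-instruction cost the comment above says a fused ASIC instruction could remove.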

shelby3 commented 5 years ago

@olalawal

Ethereum will be going to pos probably by 2020

The date has been extended many times. And PoS simply isn’t secure nor can it be made secure, so my bet is it will never happen. Or it will happen and Ethereum will die due to cartelization. Anyway, I digress, but you started it…

@ifdefelse

We appreciate the healthy skepticism. A lot of groups have made a lot of promises in this area that turned out to be unfounded. For those that haven't seen it please read our post here: https://medium.com/@ifdefelse/understanding-progpow-performance-and-tuning-d72713898db3

Well it seems in that article you admitted a very serious flaw in ProgPow. I wrote on RandomX’s repository today:

The flaw in ProgPow is actually acknowledged by the author:

If at some point in the future it was desired that ProgPoW should target the Turing generation of GPUs, it would be a simple matter of changing a few of the tuning parameters (such as PROGPOW_REGS, PROGPOW_CNT_CACHE and PROGPOW_CNT_MATH). With appropriate tuning, Turing GPUs would maintain the same performance, while the current generation of GPUs would become compute limited, and slow down.

ProgPow can’t be immutable. Without immutability all you have is a centralized shitcoin. But we already knew that about Ethereum. And Monero has also lowered its stature by forking the proof-of-work. And I am really hoping they will fork it to RandomX.

Additionally I am not sure if the flaw I have alleged to have found in RandomX also applies to ProgPow. I need to study ProgPow more first.

Excess complexity is detrimental and the optimal solution should be known and straightforward. Equihash and RandomJS both suffer from this issue.

And I posit you will soon be able to include RandomX in that list of shooting themselves in the foot with excessive complexity on the entropy versus permutations front.

Also I want to ask you, if you had to put a $10 million performance bond on your claim of 1.2X ratio for an ASIC, would you still stick so adamantly to that number? Funny how people become more circumspect when they need to back up their words with their money.