wownero / meta

archived repo, migrated to git.wownero.com
https://git.wownero.com/wownero/meta
1 stars 2 forks source link

Proof-of-work - what it be? (formerly: To merge or not to merge) #3

Closed jwinterm closed 4 years ago

jwinterm commented 6 years ago

That is the question.

There's already been some discussion about switching to a unique proof-of-work, either by modifying cryptonight further or by abandoning it altogether, with the intent to provide some security through obscurity from 51% attacks by large pools or nicehash. See open issue here: https://github.com/wownero/wownero/issues/15

However, another option that may provide superior security is moving to merge mining with Monero. This would mean tracking changes from their PoW algorithm, and things may get a little hairy around when they hardfork, but if a significant portion of the Monero network adopted Wownero merge mining, this would provide network security superior to the approach outlined in the previous paragraph.

This is far from a decided issue, and there are certainly other alternatives - just tracking changes to Monero's PoW without merge mining, or just sticking with cnv7, masternodes operating network, whatever, but this thread is for discussion of particularly merge mining as a possibility.

RocksteadyTC commented 6 years ago

Thanks for the heads up @wowario

We currently have a bounty to get https://github.com/turtlecoin/turtle-pool and https://github.com/turtlecoin/node-cryptonote-util working with merge mining that a few people are poking at. I'll reference this post if we have anything worth reporting. We don't share a POW with anybody, but any developments should easily translate.

zawy12 commented 6 years ago

@hyc via @Gingeropolous

Someone using FPGAs could easily reconfigure if you only use one algorithm per block.

Is it feasible to optimize and re-program an FPGA for @tevador's output in only 2 minutes?

tevador commented 6 years ago

@zawy12 RandomJS changes the program every nonce, not every block. Using an FPGA is out of the question.

zawy12 commented 6 years ago

@tevador What is the variation in solvetime per nonce? Will it be possible for an ASIC to skip nonces that return a POW that seems harder to solve than others? What if there's a certain class of problems an ASIC is 10x faster at doing, so it keeps changing the nonce until it gets that class of problem? If that type of thing is possible, then changing only per block might be better.

tevador commented 6 years ago

@zawy12 The final code generator settings are still being optimized. But we are aiming for the worst runtime out of 1000 programs to be at most 5-10 times the average runtime. This should limit nonce-skipping optimizations, because it still costs some time to generate and analyze a program and if you throw away the result, your effective hashrate is decreased.

For example, if generating a program takes 0.5 ms and running it takes 7.5 ms on average and you skip 9 our of 10 nonces, you would need to reduce the average runtime under 3 ms to actually gain an advantage (10*0.5+3 = 8 ms) compared to someone who runs all programs (0.5+7.5 = 8 ms).

Also guessing if a program will be fast or slow to run based on its source code is not an easy task. Perhaps someone could use a neural network for that, but I don't think you could actually gain a significant advantage.

binaryFate commented 6 years ago
tevador commented 6 years ago

Some candidates take a very very long time, like > 300s or even sometimes > 1000s (while most others take < 20s). I suggest to stop them before doing the 1000 repetitions when the average is already known to be so bad, and just extrapolate the overall results (or just return a very-bad-on-purpose value)? Would save a lot of time and thus we could increase budget and run the whole tuning much faster. Can you include this sort of check in your program, so if a certain time X has already passed and only N iterations have been done, we know we can give up on this candidate and do not need to do 1000 iterations?

OK. I will add a time limit for objective function evaluation and then we can run the optimization with higher budget.

What about the difference of measurements on different machines? What are we optimizing for exactly? If there is a "target" time included in the fitness function, we will get a completely different result based on the CPU we are using. The tuning process will simply optimize for the particular machine it is running on. What do you think?

You have to benchmark your machine first and then adjust the runtime target as I wrote above. Run ./Tevador.RandomJS.Test.exe --count 10000 --threads 32 and observe the average runtime value. Mine was 14 ms. Default target is 10 ms. Adjust using command line option --runtimeTarget 0.008 (example for 8 ms target if your machine is 25% faster).

I witnessed quite some variations across "instances" for the same candidate

During my tests it seemed pretty consistent. Hopefully irace can handle it.

It seems you are measuring system time, not user (process) time. For instance after hibernating then waking up computer I get:

I use the process.hrtime function, which should measure CPU time correctly. The elapsed time you are showing is not from my program (perhaps it's the output from irace?).

Some parameters are defined as categorical ("c" in parameter file) while they should be ordinal ("o").

OK. I didn't know about that. I will change it to "o".

"Consider a candidate X who got a certain fitness, say 20, on a certain instance. And another candidate Y who got almost the same as X, specifically 20.00001. Do you think that (i) the tuning process should consider that X is clearly unambiguously better than Y on that instance, or (ii) X and Y performed almost similarly on that instance and we should not draw much conclusion from this run due to the very small difference of absolute values?". If you answer (i), you probably want the F-test. If you answer (ii), you probably want the t-test.

Definitely t-test.

binaryFate commented 6 years ago

You have to benchmark your machine first and then adjust the runtime target as I wrote above. Run ./Tevador.RandomJS.Test.exe --count 10000 --threads 32 and observe the average runtime value. Mine was 14 ms. Default target is 10 ms. Adjust using command line option --runtimeTarget 0.008 (example for 8 ms target if your machine is 25% faster).

Ok, I missed that. Makes sense as an approach. What about the 32 threads though? For instance on my laptop I have 4 cores. Does the result depends on the number of threads? (Ideally I suppose we don't want it to depend on that).

I use the process.hrtime function, which should measure CPU time correctly. The elapsed time you are showing is not from my program (perhaps it's the output from irace?).

Yes the "elapsed time" output is from irace, this run gave a very very bad fitness (2147483647, which is 2^31-1) so I have a doubt if you are using the elapsed time somehow.

I am not familiar with nodejs, but from the doc of process.hrtime it measures real time. Real time is not what you want, as if your process sleeps for any amount of time, or the system is busy doing other things, you will get wrong results.

tevador commented 6 years ago

What about the 32 threads though? For instance on my laptop I have 4 cores. Does the result depends on the number of threads? (Ideally I suppose we don't want it to depend on that).

Adjust the number of threads according to the number of cores of your CPU (I assumed you were using the 32 core machine you told me about). By the way, laptops are not the best machines for this task because the mobile CPUs usually have a high turbo frequency, which means the CPU speed is not constant and it adds variation for the runtimes. I used a machine with disabled turbo.

Yes the "elapsed time" output is from irace, this run gave a very very bad fitness (2147483647, which is 2^31-1) so I have a doubt if you are using the elapsed time somehow.

I am not familiar with nodejs, but from the doc of process.hrtime it measures real time. Real time is not what you want, as if your process sleeps for any amount of time, or the system is busy doing other things, you will get wrong results.

2147483647 is output only if one of the test programs crashes. In this case, the runtime is not used in the calculation at all.

I'm not aware of any better method of measuring runtime in nodejs. I don't think it's a problem unless you hibernate the computer in the middle of the optimization or run some other demanding task at the same time.

binaryFate commented 6 years ago

Adjust the number of threads according to the number of cores of your CPU (I assumed you were using the 32 core machine you told me about). By the way, laptops are not the best machines for this task because the mobile CPUs usually have a high turbo frequency, which means the CPU speed is not constant and it adds variation for the runtimes. I used a machine with disabled turbo.

I was just playing around to find a suitable setup. We're getting there :)

2147483647 is output only if one of the test programs crashes. In this case, the runtime is not used in the calculation at all.

Ok

I'm not aware of any better method of measuring runtime in nodejs. I don't think it's a problem unless you hibernate the computer in the middle of the optimization or run some other demanding task at the same time.

It's really not ideal especially with a high-level interpreted language, but after some googling there does not seem to be alternatives, indeed. If anyone knows how to measure process time (not real time) with nodejs please chime in.

@tevador Let me know when you'll have updated the sources to kick out bad candidates faster as discussed (maybe also update the scenario file to use the t-test so the configuration files reflect our discussions). I think the setup will be ok then and we can launch a full tuning.

tevador commented 6 years ago

@binaryFate I have pushed the changes:

We will optimize for a 3.3 GHz Intel Sandy Bridge CPU. You have to run the benchamark:

  1. Make sure to use the latest ProgramOptions.xml
  2. Run node sandbox.js --complexity
  3. Run ./Tevador.RandomJS.Test.exe --threads N --evalTest where N is the number of threads you will use for optimization
  4. Find "Runtime Avg". On the benchmark machine it's approximately 0.00105, so it almost matches the target of 10 ms. Adjust your target with the --runtimeTarget option.
  5. You can also adjust the --timeout option, for example to 4 times the total runtime of the benchmark (on the benchmark machine it was Completed in 29.8903988 seconds with 6 cores).
tevador commented 6 years ago

I have pushed some changes including about 30% speed improvement for objective evaluation and a new configurable parameter for the generator.

binaryFate commented 6 years ago

@tevador thanks for the updates!

Should I launch a larger optimization on big machine now or are you working on something at the moment?

tevador commented 6 years ago

@binaryFate You can try to run a larger optimization with a budget of ~5000 (shouldn't take more than 1 day with 32 cores). Let's see what the results will be.

The newly added parameter FunctionValueOfOverride is meant to decrease the amount of dead code in the program (functions that are never called), but it slows the program down, so I'm expecting the optimizer will set it to false. If that happens, we can run a second optimization with this parameter fixed at true and compare the results. The objective function doesn't account for dead code, unfortunately.

I'm not making any changes in RandomJS at the moment. I'm trying to fix bugs in the XS engine so we can use it instead of the V8.

hyc commented 6 years ago

I'm trying to fix bugs in the XS engine so we can use it instead of the V8.

This worries me. Another good reason to stick to ECMA5 spec instead of ECMA6.

tevador commented 6 years ago

@hyc With ES5, we would lose:

I listed only features used by RandomJS (in descending order how easy they would be to remove). Most of these features contribute to the complexity of the program.

Even if we sacrifice all these features, there is no guarantee ES5 engines will work without issues.

Current state of the XS engine:

The rounding issue and SyntaxError/ReferenceError differences might also affect ES5 engines.

hyc commented 6 years ago

OK. If you believe the differences can be addressed, then go for it.

PS, did you see this compatibility matrix? http://kangax.github.io/compat-table/es6/

tevador commented 6 years ago

@hyc Yes, I was looking for alternative engines in this table. Looks like XS is the only 'small' ES6 engine.

Btw here is my forked XS engine: https://github.com/tevador/moddable I could use some help fixing the numeric issues.

tevador commented 6 years ago

I have fixed the numeric bugs in XS. Apart from the different treatment of SyntaxError and ReferenceError, the engines match for 99.9% of programs. There is still a few minor issues to be resolved.

I'm quite satisfied with the complexity of the programs generated by RandomJS. The amount of recursion, indirection and aliasing is overwhelming when doing manual analysis of program outputs.

Interestingly, I have also rediscovered a bug in V8, when the engine treats invalid regular expression as a runtime error rather than parsing error.

jwinterm commented 6 years ago

What's the benefit to using XS as a reference rather than google's V8 engine? Is it a licensing issue?

Gingeropolous commented 6 years ago

i asked the same elsewhere, according to @hyc , the idea was to find a smaller and simpler engine for the reference implementation @jwinterm . v8 is big and bloated.

tevador commented 6 years ago

@jwinterm XS ~ 87 000 LoC V8 ~ 1 870 000 LoC

tevador commented 6 years ago

I have finally tested the performance of XS. When running in a separate process while comunicating via stdin/stdout, XS is only about 1% slower (almost within margin of error) than the V8 (represented by a minimal wrapper around eval in NodeJS).

The only big difference is memory consumption. When executing 10000 programs in sequence, the XS consumes up to 470 MiB, while the node process stays below ~80 MiB. This may be just garbage collector settings, though.

Maybe someone could squeeze a few additional percent of performance by linking libv8 directly. I think the main reason why there is very little difference between interpreted and compiled engine is that each program is executed just once.

If nobody has any additional comments about the generator, I will start porting it to C++.

tevador commented 6 years ago

Before porting to C++, I decided to make some modifications to the generator.

It's mostly finished, I'm still running tests. These are the major changes:

Can someone review the new documentation? https://github.com/tevador/RandomJS/blob/dev/doc/generator.md

tevador commented 6 years ago

One more note: the change to prevent Eval optimization causes the output of XS and V8 to differ in ~35% of cases due to differences when SyntaxError and ReferenceError are emitted.

This means that web mining in browser will be still possible, but it will produce significant amount of invalid hashes. Using the V8 for serious mining will require some modifications of the engine, which might not be easy task considering the size of the engine's codebase.

Whether it is a good or bad thing is up for discussion.

tevador commented 6 years ago

@moneromooo-monero @jwinterm @hyc @Gingeropolous @SChernykh Can you please review and comment on the latest changes (my two comments above)?

SChernykh commented 6 years ago

@tevador How big are the differences between SyntaxError and ReferenceError in XS and V8? Can they be fixed by a regexp replacement or some more sophisticated but fast code?

tevador commented 6 years ago

I think all the differences can be attributed to non-adherence to the specs by one or the other engine.

Most of the differences are these two categories:

1)

eval('/|+/+a/=ee')
XS: SyntaxError: invalid regular expression
V8: ReferenceError: Invalid left-hand side in assignment

Here the regular expression is invalid (the | operator needs a left side). XS correctly throws a SyntaxError, but V8 skips the RegExp validation and throws on the assignment instead. This is a violation of the specs by the V8. I found this bug on the chromium bug tracker and they are keeping this "feature" for legacy reasons.

2)

eval('++1a=+2e|1')
XS: ReferenceError: no reference
V8: SyntaxError: Invalid or unexpected token

Here the XS thinks we are incrementing the literal 1, so it throws a ReferenceError. The V8 considers the token 1a as invalid literal and throws a SyntaxError. I think the XS is wrong in this case as the specs explicitly forbid that an identifier follows a numeric literal without a space: http://www.ecma-international.org/ecma-262/6.0/#sec-literals-numeric-literals

Not sure how easy it would be to fix these bugs.

SChernykh commented 6 years ago

I meant that maybe a simple set of replacement rules can reduce error rate from 35% to almost 0%.

tevador commented 6 years ago

How would such rules work exactly? If you mean replacing incorrect error by the correct one, then you need to parse the expression correctly, which means fixing the parser bugs. I don't think there is a shortcut available.

SChernykh commented 6 years ago

This means that web mining in browser will be still possible, but it will produce significant amount of invalid hashes. Using the V8 for serious mining will require some modifications of the engine, which might not be easy task considering the size of the engine's codebase.

I don't think that fixing V8 would be that hard, especially if they keep some bugs "for compatibility". Do all straight-forward fixes, add some replacement rules for remaining differences from XS, and error rate will be much less than 35%.

tevador commented 6 years ago

Apart from non-adherence to specs, there is some gray area, apparently.

For example:

eval('++-b');
XS: SyntaxError: missing expression
V8: ReferenceError: Invalid left-hand side expression in prefix operation
SpiderMonkey (Firefox): SyntaxError: expected expression, got '-'

Here the V8 seems to group the - and the b as one expression, while the other engines find two prefix operators invalid in sequence without parentheses.

add some replacement rules for remaining differences from XS

This may be easier said than done. I think it would require extensive changes in the parser to detect errors in the same way as the XS. Another option would be to have a separate XS-like parser just for eval to throw early errors and if the expression is validated, it could be handed over to the V8 parser.

In any case, I don't see any issue with making the XS engine part of the PoW specification, but there should be consensus about that.

SChernykh commented 6 years ago

I'm all for taking XS as a reference, V8 is too monstrous for this.

Another option would be to have a separate XS-like parser just for eval to throw early errors and if the expression is validated, it could be handed over to the V8 parser.

Yes, that's the way real miners will choose if V8 is significantly faster.

jwinterm commented 6 years ago

@tevador briefly looks good. I'm traveling this week so I'll try to spend some time at airport tomorrow looking over it more thoroughly and treating latest version in Windows laptop.

tevador commented 6 years ago

@jwinterm OK. The latest code is in the dev branch.

If you want to inspect the generated Javascript, you should use something like http://jsbeautifier.org/ because the generated code has no line breaks or indentation.

JohannesLau commented 6 years ago

I definitely think switching to RandomJS in the next hardfork is a great idea, it is probably the best algo for asic resistance at the moment, and wownero would be a perfect testing platform for it before potential gets implemented in monero

tevador commented 6 years ago

Status update: I have ported the JS generator into C++.

What is left to do is some optimization, implementation of IPC between the generator and the XS engine (already finished on the XS side) and then the cryptography, which should be pretty straightforward (standard Blake2b hash).

tevador commented 6 years ago

I need some feedback: https://github.com/tevador/RandomJS/issues/5

SChernykh commented 6 years ago

@tevador Just out of curiosity - when is it going live on Wownero network? If everything goes smooth?

tevador commented 6 years ago

If Wownero has a testnet, it could be deployed there very soon (I think next week). All that is needed is to write a replacement cn_slow_hash function that will use randomjs instead of cryptonight.

Before we go live on mainnet, someone needs to review the implementation (not just compile and test, but actually review the whole codebase).

Also currently there is a bug in the XS engine which causes about 1 in every 5000 programs to throw an error. I wrote a bug report in the upstream repository, but it hasn't been fixed yet. We will probably have to fix it ourselves.

Anyways, randomjs can handle JS engine crash and will restart it and reject the block, so it's not major issue.

binaryFate commented 6 years ago

I can't run any sucessful run with the latest version (C#), all of them fail with:

  at Tevador.RandomJS.ProgramFactory.GenBlock (Tevador.RandomJS.Statements.Block block, Tevador.RandomJS.Interval statementsInterval, Int32 maxDepth, StatementType list) <0x407c3a90 + 0x0003b> in <filename unknown>:0 
  at Tevador.RandomJS.ProgramFactory.GenFunctionBody (Tevador.RandomJS.Expressions.FunctionExpression parent) <0x407c2190 + 0x0024b> in <filename unknown>:0 
  at Tevador.RandomJS.ProgramFactory.GenFunctionExpression (IScope scope) <0x407bdb00 + 0x00183> in <filename unknown>:0 
  at Tevador.RandomJS.ProgramFactory.GenExpression (IScope scope, Int32 maxDepth, ExpressionType list) <0x407bae20 + 0x002df> in <filename unknown>:0 
  at Tevador.RandomJS.ProgramFactory.GenVariable (IScope scope, Boolean isParameter, Boolean isLoopCounter, Boolean isConstant, Boolean initialize) <0x407b86c0 + 0x001b7> in <filename unknown>:0 
  at Tevador.RandomJS.ProgramFactory.GenProgram (System.Byte[] seed) <0x407b5e70 + 0x0014b> in <filename unknown>:0 
  at Tevador.RandomJS.Test.ParallelRunner._run (CancellationToken token) <0x407abc90 + 0x0010f> in <filename unknown>:0 
  at Tevador.RandomJS.Test.ParallelRunner+<Run>c__AnonStorey0.<>m__0 () <0x407abb40 + 0x00027> in <filename unknown>:0 
  at System.Threading.Tasks.Task.InnerInvoke () <0x7fd17f023790 + 0x0004f> in <filename unknown>:0 
  at System.Threading.Tasks.Task.Execute () <0x7fd17f023050 + 0x00055> in <filename unknown>:0 
  --- End of inner exception stack trace ---
---> (Inner Exception #0) System.NullReferenceException: Object reference not set to an instance of an object
  at Tevador.RandomJS.ProgramFactory.GenBlock (Tevador.RandomJS.Statements.Block block, Tevador.RandomJS.Interval statementsInterval, Int32 maxDepth, StatementType list) <0x407c3a90 + 0x0003b> in <filename unknown>:0 
  at Tevador.RandomJS.ProgramFactory.GenFunctionBody (Tevador.RandomJS.Expressions.FunctionExpression parent) <0x407c2190 + 0x0024b> in <filename unknown>:0 
  at Tevador.RandomJS.ProgramFactory.GenFunctionExpression (IScope scope) <0x407bdb00 + 0x00183> in <filename unknown>:0 
  at Tevador.RandomJS.ProgramFactory.GenExpression (IScope scope, Int32 maxDepth, ExpressionType list) <0x407bae20 + 0x002df> in <filename unknown>:0 
  at Tevador.RandomJS.ProgramFactory.GenVariable (IScope scope, Boolean isParameter, Boolean isLoopCounter, Boolean isConstant, Boolean initialize) <0x407b86c0 + 0x001b7> in <filename unknown>:0 
  at Tevador.RandomJS.ProgramFactory.GenProgram (System.Byte[] seed) <0x407b5e70 + 0x0014b> in <filename unknown>:0 
  at Tevador.RandomJS.Test.ParallelRunner._run (CancellationToken token) <0x407abc90 + 0x0010f> in <filename unknown>:0 
  at Tevador.RandomJS.Test.ParallelRunner+<Run>c__AnonStorey0.<>m__0 () <0x407abb40 + 0x00027> in <filename unknown>:0 
  at System.Threading.Tasks.Task.InnerInvoke () <0x7fd17f023790 + 0x0004f> in <filename unknown>:0 
  at System.Threading.Tasks.Task.Execute () <0x7fd17f023050 + 0x00055> in <filename unknown>:0 <---

Completed: 0.004
2147483647

All results therefore are always 2147483647

@tevador ^ sorry to post debugging stuff here, but I don't find you on IRC and don't know another way to get in touch

tevador commented 6 years ago

@binaryFate That's because the newly introduced options were missing in the list of irace parameters. I have pushed an updated (but untested) version of the irace configuration files if you want to try it.

Keep in mind that the C# generator with NodeJS might give slightly different results than the C++ generator and the XS engine.

The manually optimized options I'm using in the C++ version seem to match the runtime targets (some CPUs are processing even more than 100 programs per second per core).

binaryFate commented 6 years ago

So is the C++ version usable for optimization? I am not sure because of the few bugs/open things you mentioned like IPC and 1-out-of-5000 error thrown

tevador commented 6 years ago

The C++ generator needs to be recompiled every time ProgramOptions are changed, so it's unusuable for optimization. It's intended for production use only.

I think the 1-out-of-5000 error could be prevented by removing the = character from EvalChars. This would prevent assignment in eval, but we would need to find a suitable replacement character to keep the rate of other-than-syntax errors about the same (not sure if this is possible).

If this is done, then you can optimize with the C# generator + the XS engine. The program generation takes only about 6% of the time of one hash, so generator performance doesn't make a big difference anyways.

jwinterm commented 6 years ago

Lengthy discussion between gmaxwell, andytoshi, and @hyc with respect to the merit of this approach: https://pastebin.com/spug4a8x

tevador commented 5 years ago

Since RandomJS has become obsolete, is there interest to implement RandomX for Wownero?

SChernykh commented 5 years ago

wowario (WOW)11/12/2018 Yeah... Long term, we will eventually fork to something like RandomX

Wowario in Discord on November 12th.

Edit: there is also https://funding.wownero.com/proposal/33

jwinterm commented 5 years ago

@tevador I think so, possibly as soon as whenever you feel like it's ready. Personally I am in favor of another route for Monero, but I think a CPU only algorithm is probably the best route for a small coin like Wownero, and if it keeps us in sync with Monero going forward (for possible merge mining), even better imo.

How close go you reckon RandomX is to being deployable at this point?

SChernykh commented 5 years ago

It's not CPU-only, GPU miners will be available too.

tevador commented 5 years ago

@jwinterm It's 99% finalized, just one little tweak is missing, which I will try to implement this weekend. Afterwards, the plan is to:

  1. Publish a complete documentation (hopefully this weekend)
  2. Daemon integration
  3. Pool software
  4. Mining software

For pools, they will have to be slightly modified to also send the dataset seed, which is tied to the blockchain. The CPU miner for x86 is already finished and @SChernykh volunteered to prepare a GPU miner (CUDA). If someone wants to help with the pool implementation, it could speed things up.