vkomenda closed this 5 years ago.
Maybe you could double-check that cargo bench in ethcore/evm returns better overall timings in commit 967b29e72f6a88ab309d913b8d73b8730ee78562 compared to the latest commit 0d3693aea3d0621722d01900a6894032167457f1.
At least two things remain to be done in this PR:
- gasometer.rs, to have gas measured in Integer instead of U256 so that we save time on type conversion;
- ethcore/vm/src/ext.rs.

I think these two are related; it may not be possible to do them separately. Also, the external interface is tied to other components, which may need to be updated as well.
The effect of the above should be that U256 is completely gone from ethcore/evm/src/interpreter/mod.rs.
I'll try to fully resync POA Sokol with the new binary and see whether any errors come up.
POA Core and xDai resynced without errors. In Sokol there was an error on block https://blockscout.com/poa/sokol/blocks/5021292/transactions (there's a single tx in that block).
2019-05-28 13:15:06 UTC Stage 5 block verification failed for #5021292 (0xff1a…258d)
Error: Error(Block(InvalidStateRoot(Mismatch { expected: 0x100be2e8b4d067add33a208b619801c7ba9a6427927612324214a15489b0cc3d, found: 0xf29cd92f062278f2149e8bba4e683b271f4bc73bfba3280ef9a84288a5b60613 })), State { next_error: None, backtrace: InternalBacktrace { backtrace: None } })
2019-05-28 13:15:06 UTC
Bad block detected: Error(Block(InvalidStateRoot(Mismatch { expected: 0x100be2e8b4d067add33a208b619801c7ba9a6427927612324214a15489b0cc3d, found: 0xf29cd92f062278f2149e8bba4e683b271f4bc73bfba3280ef9a84288a5b60613 })), State { next_error: None, backtrace: InternalBacktrace { backtrace: None } })
RLP: f902eef9023da014542f792942c71956a9ff20500eeefc8b5dcacc541fa111882a09e22bf9e65ca01dcc4de8dec75d7aab85b567b6ccd41ad312451b948a7413f0a142fd40d4934794d764da536506bd1af6e494a8cdd37959a7387acba0100be2e8b4d067add33a208b619801c7ba9a6427927612324214a15489b0cc3da029ee61e69790b8e34a4b40c00033d6fc7130b6349f2da66f1f79c091885e147fa027ad54967532e5e4c70407115a045b79491e1a1ddfe6c796182324cbc271a90fbfffffffffffffffffffffffffffffffe834c9e6c837a120082d4d7845bc0afac96d583010b088650617269747986312e32372e32826c69841259bcbcb841480895df40eaba0592bed5005f1645310036bde02aa07125c69ca83f6da68ff6189a2e04cdbe1fe089b390a99c0ccf35ae42fb027f4ef29749c26c0824d20bb501f8abf8a944843b9aca0082d620944f9895cfe49b8e648f8bae68519f059dd997a28280b844497d7551000000000000000000000000201406e518abe5e351a8717dd0f419616ca02fbc000000000000000000000000000000000000000000000000000000000000006481bda0151207ea78ce245710f085fa42ecfdc8000ae023f2e05c0598e23cc6bebdd20fa02dd54b69140cac8edd589764a1701410a014c30aa589a5cbae046e0eac7c111cc0
Header: Header { parent_hash: 0x14542f792942c71956a9ff20500eeefc8b5dcacc541fa111882a09e22bf9e65c, timestamp: 1539354540, number: 5021292, author: 0xd764da536506bd1af6e494a8cdd37959a7387acb, transactions_root: 0x29ee61e69790b8e34a4b40c00033d6fc7130b6349f2da66f1f79c091885e147f, uncles_hash: 0x1dcc4de8dec75d7aab85b567b6ccd41ad312451b948a7413f0a142fd40d49347, extra_data: [213, 131, 1, 11, 8, 134, 80, 97, 114, 105, 116, 121, 134, 49, 46, 50, 55, 46, 50, 130, 108, 105], state_root: 0x100be2e8b4d067add33a208b619801c7ba9a6427927612324214a15489b0cc3d, receipts_root: 0x27ad54967532e5e4c70407115a045b79491e1a1ddfe6c796182324cbc271a90f, log_bloom: 0xgas_used: 54487, gas_limit: 8000000, difficulty: 340282366920938463463374607431768211454, seal: [[132, 18, 89, 188, 188], [184, 65, 72, 8, 149, 223, 64, 234, 186, 5, 146, 190, 213, 0, 95, 22, 69, 49, 0, 54, 189, 224, 42, 160, 113, 37, 198, 156, 168, 63, 109, 166, 143, 246, 24, 154, 46, 4, 205, 190, 31, 224, 137, 179, 144, 169, 156, 12, 207, 53, 174, 66, 251, 2, 127, 78, 242, 151, 73, 194, 108, 8, 36, 210, 11, 181, 1]], hash: Some(0xff1acc51dab93d33969dc3160c777f6645f997a649b265046830282b4bb0258d) }
Uncles:
Transactions:[Tx 0] UnverifiedTransaction { unsigned: Transaction { nonce: 68, gas_price: 1000000000, gas: 54816, action: Call(0x4f9895cfe49b8e648f8bae68519f059dd997a282), value: 0, data: [73, 125, 117, 81, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 32, 20, 6, 229, 24, 171, 229, 227, 81, 168, 113, 125, 208, 244, 25, 97, 108, 160, 47, 188, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 100] }, v: 189, r: 9530427700988829602190235950132050505565850277403879439041744793130193768975, s: 20730937074926652319264621547119453737374562879435505647005372538798273728796, hash: 0x5121958fca2cc47adf8ce92b7d3646a9062c6c4b7903c1a38c77a7d4ed5a6dac }
Not sure whether it's a real issue or just a bad block; I'll resync one more time.
So is this the block on which the resync failed? https://blockscout.com/poa/sokol/blocks/5021292/transactions
Yes, it output the error message and stopped syncing.
@phahulin This version is only for POSDAO (not for POA networks) because it is based on the aura-pos branch, so it is not compatible with POA Core, POA Sokol, Kovan, and xDai. Am I right, @vkomenda?
@varasev Ah, yes, you're right! @vkomenda, could you please create a branch with the changes of this PR applied on top of upstream stable? Otherwise we won't have any running network to test them on.
> could you please create a branch with changes of this PR applied on top of upstream stable?
I think testing these changes makes no sense because they slow the node down: https://github.com/poanetwork/parity-ethereum/pull/137#pullrequestreview-239926576
We should test them once we see speed improvements compared with the upstream stable branch.
Ok, please ping me when it's ready.
I refactored the gasometer. Benchmarks report a performance drop of 2-2.5 times with respect to the current aura-pos. This is partly because I had to move the Integer to U256 conversion into the step function of the EVM interpreter, where it reports the remaining gas. Removing that and similar conversions requires refactoring the VM submodule of ethcore. I'm not sure whether that would bring performance back up by more than 2.5 times.
As far as I understand, the main reason for the slowness is the U256 to Integer and Integer to U256 conversions, right?
What if we just do the same for the MUL opcode as for MULMOD in the vk-mulmod branch? I mean, instead of
```rust
instructions::MUL => {
    let a = self.stack.pop_back();
    let b = self.stack.pop_back();
    // multiplication modulo 2^256; `.0` drops the overflow flag
    self.stack.push(a.overflowing_mul(b).0);
},
```
try
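```rust
// Needs `use rug::{integer::Order, Integer};`
instructions::MUL => {
    let a = self.stack.pop_back();
    let b = self.stack.pop_back();
    // U256 holds four u64 limbs, least significant first
    let a0 = Integer::from_digits(&a.0, Order::Lsf);
    let b0 = Integer::from_digits(&b.0, Order::Lsf);
    // the exact product can be up to 512 bits wide
    let r = a0 * b0;
    let bytes = r.to_digits::<u8>(Order::Lsf);
    // keep only the low 32 bytes so the result wraps modulo 2^256 like
    // overflowing_mul (from_little_endian panics on more than 32 bytes)
    let low = &bytes[..bytes.len().min(32)];
    self.stack.push(U256::from_little_endian(low));
},
```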
Did you try doing this in the vk-mulmod branch and running benchmarks for the MUL operation with different sizes of the operands a and b? What if it makes sense for big operands?
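EVM MUL is defined modulo 2^256, so routing it through an arbitrary-precision integer only matches overflowing_mul if the exact product is truncated back to its low 256 bits before it is pushed. A dependency-free, scaled-down sketch of that equivalence, assuming 64-bit words with u128 standing in for the big integer (all function names here are hypothetical, not from the codebase):

```rust
/// Native path: wrapping multiplication, the 64-bit analogue of
/// `U256::overflowing_mul(..).0`.
pub fn mul_native(a: u64, b: u64) -> u64 {
    a.wrapping_mul(b)
}

/// Wide path: exact product in a wider type, then keep the low 64 bits.
pub fn mul_via_wide(a: u64, b: u64) -> u64 {
    let full = a as u128 * b as u128; // exact, up to 128 bits
    full as u64 // `as` keeps only the low 64 bits
}

/// Same truncation done on little-endian bytes, mirroring a
/// big integer -> bytes -> `from_little_endian` round trip.
pub fn mul_via_le_bytes(a: u64, b: u64) -> u64 {
    let bytes = (a as u128 * b as u128).to_le_bytes(); // 16 bytes, low first
    let mut low = [0u8; 8];
    low.copy_from_slice(&bytes[..8]); // drop the high half
    u64::from_le_bytes(low)
}
```

Running the same comparison over random 256-bit operands against a rug- or num_bigint-backed handler could serve as a cheap correctness check alongside the benchmarks.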
Another thought: what if we keep rug for MULMOD and ADDMOD (because rug works faster than U256 for these opcodes), but try the num_bigint crate for the other math opcodes, as the Parity team did for MULMOD here: https://github.com/paritytech/parity-ethereum/pull/10642/files#diff-f24406c9371a7a621c14519c77ce4bdd
For all these attempts we need to run benchmarks to see whether they make sense and for which operand sizes.
> As far as I understand, the main reason for slowness is U256 to Integer and Integer to U256 conversion operations, right?
Yes. Where the conversions happen matters: if any appear in the interpreter, it runs slower than it would without them. Having conversions inside the handlers of simple arithmetic opcodes (as opposed to complex ones like MULMOD) is enough to make those opcodes slower than the old U256 version. Currently there are no conversions inside opcode handlers, but there are some where the interpreter interacts with the VM, and that interaction happens every time the interpreter steps through an opcode. So, essentially, we still do have conversions at every opcode. Once the VM is refactored, we should see fewer conversions, and performance should then improve compared to aura-pos and the upstream version.
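A simplified, hypothetical model of that point: if the interpreter keeps gas in one representation but the VM-facing interface expects a 256-bit word, reporting after every step costs one conversion per opcode, whereas keeping gas native and converting once at the end costs one in total. Here u128 stands in for the internal gas type and [u64; 4] for U256's little-endian limbs; none of these names come from ethcore.

```rust
/// U256-like word: four u64 limbs, least significant first.
pub type Word = [u64; 4];

/// Converts the internal gas value (u128 stand-in) into a 256-bit word.
pub fn to_word(gas: u128) -> Word {
    [gas as u64, (gas >> 64) as u64, 0, 0]
}

/// Reports remaining gas to the "VM" after every step:
/// one conversion per opcode, one reported word per step.
pub fn run_reporting_each_step(initial: u128, costs: &[u128]) -> Vec<Word> {
    let mut gas = initial;
    let mut reports = Vec::with_capacity(costs.len());
    for &cost in costs {
        // saturating for brevity; a real interpreter would stop with OutOfGas
        gas = gas.saturating_sub(cost);
        reports.push(to_word(gas));
    }
    reports
}

/// Keeps gas native for the whole run and converts once at the end.
pub fn run_reporting_at_end(initial: u128, costs: &[u128]) -> Word {
    let gas = costs.iter().fold(initial, |g, &cost| g.saturating_sub(cost));
    to_word(gas)
}
```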
I converted the VM in 967b29e72f6a88ab309d913b8d73b8730ee78562 and performance regressed even more: it is now 2.1-3.1 times slower than aura-pos. Using Integer everywhere turns out to be a naive solution. The upstream version has a better-integrated solution with no dependency on rug, with only MULMOD and ADDMOD being up to 2 times slower in uncommon cases. If we don't need further optimisation, I vote to adopt the upstream solution. Otherwise, we need u64-valued counterparts of some methods that are now Integer-valued. We should also have an EVM interpreter instance that performs internal gas computations using u64 instead of big integers. Instead of adding more instances, we could try defining and using a type that wraps either an Integer or a u64 and is copied when the wrapped value is a u64.
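One way the wrapper type described above could look, as a sketch only: gas stays on a u64 fast path, widens on overflow, and narrows back once it fits again. u128 stands in for rug::Integer so the example has no dependencies; the Gas type and its methods are hypothetical.

```rust
/// Hypothetical gas value: a cheap u64 fast path with a wide fallback.
/// `u128` stands in for `rug::Integer` (assumption, to stay dependency-free).
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum Gas {
    Small(u64),
    Big(u128),
}

impl Gas {
    /// View either variant as the wide type.
    fn widen(&self) -> u128 {
        match self {
            Gas::Small(v) => *v as u128,
            Gas::Big(v) => *v,
        }
    }

    /// Narrow back to the fast path whenever the value fits in u64.
    fn from_wide(v: u128) -> Gas {
        if v <= u64::MAX as u128 {
            Gas::Small(v as u64)
        } else {
            Gas::Big(v)
        }
    }

    /// Addition that stays on u64 unless it overflows. `None` can only come
    /// from the u128 stand-in overflowing, which a real big integer would not do.
    pub fn checked_add(&self, other: &Gas) -> Option<Gas> {
        match (self, other) {
            (Gas::Small(a), Gas::Small(b)) => Some(match a.checked_add(*b) {
                Some(sum) => Gas::Small(sum),
                None => Gas::Big(*a as u128 + *b as u128),
            }),
            _ => self.widen().checked_add(other.widen()).map(Gas::from_wide),
        }
    }

    /// Deducts a cost; `None` means out of gas.
    pub fn checked_sub(&self, cost: &Gas) -> Option<Gas> {
        match (self, cost) {
            (Gas::Small(a), Gas::Small(b)) => a.checked_sub(*b).map(Gas::Small),
            _ => self.widen().checked_sub(cost.widen()).map(Gas::from_wide),
        }
    }

    /// The value as u64, if it fits.
    pub fn as_u64(&self) -> Option<u64> {
        let v = self.widen();
        if v <= u64::MAX as u128 { Some(v as u64) } else { None }
    }
}
```

With a real big integer the Big variant would own heap memory, so the enum could only be Clone rather than Copy; the point of the design is that the common Small path never allocates or converts.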
> The upstream version has a better integrated solution with no dependency on rug and only MULMOD and ADDMOD being up to 2 times slower in uncommon cases.
I'd propose keeping our rug approach for MULMOD/ADDMOD, but trying the upstream's num_bigint solution for the other math opcodes, as I suggested in https://github.com/poanetwork/parity-ethereum/pull/137#issuecomment-497756931. Let's first do it for one opcode (e.g., MUL) and compare its performance with U256, and then try it for the rest of the opcodes.
Also, I'd propose benchmarking this one, https://github.com/poanetwork/parity-ethereum/pull/137#issuecomment-497754187, to see whether it is really slower than the standard U256.
> I'd propose to leave our approach for MULMOD/ADDMOD with the rug, but try to use the upstream's num_bigint solution with other math opcodes as I suggested in the #137 (comment)
I suggested not using rug because depending on it causes problems with the Windows build. MULMOD and ADDMOD are only 1.2 times slower with num_bigint than with rug in the case of small moduli. With large moduli that ratio increases to 2, which is still better than the factor of 10 with the old version.
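For reference, ADDMOD and MULMOD compute a + b and a * b without the usual 2^256 wraparound before reducing modulo N, and return 0 when N is 0; that wide intermediate (up to 512 bits for MULMOD) is why these two opcodes benefit from rug or num_bigint in the first place. A scaled-down sketch on 64-bit words, with u128 as the wide intermediate (an illustration under that assumption, not the 256-bit code):

```rust
/// 64-bit analogue of EVM ADDMOD: (a + b) mod n without wraparound; 0 if n == 0.
pub fn addmod(a: u64, b: u64, n: u64) -> u64 {
    if n == 0 {
        return 0;
    }
    ((a as u128 + b as u128) % n as u128) as u64
}

/// 64-bit analogue of EVM MULMOD: (a * b) mod n without wraparound; 0 if n == 0.
pub fn mulmod(a: u64, b: u64, n: u64) -> u64 {
    if n == 0 {
        return 0;
    }
    // the exact product needs twice the word width before reduction
    ((a as u128 * b as u128) % n as u128) as u64
}
```

Reducing a wrapped product instead (u64::MAX.wrapping_mul(u64::MAX) % 10 gives 1) disagrees with mulmod(u64::MAX, u64::MAX, 10), which gives 5, so the wide step cannot be skipped.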
> Also, I'd propose to benchmark this one #137 (comment) and see if it is really slower than the standard U256.
Isn't this what I did in earlier commits? I used to convert from U256 inside opcode handlers, and that was slower than the upstream version.
Ok, it would be interesting to compare the num_bigint approach for all the math opcodes with an old upstream version.
Also, the solution of improving ethereum-types, as @afck suggested, could be implemented. I mean, we should try and benchmark different scenarios and choose the fastest one.
> Isn't this what I did in earlier commits? I used to convert from U256 inside opcode handlers. That was slower than the upstream version.
Yeah, I just didn't see the benchmark results for that comparison (or don't remember where they are in this repo). Maybe I'll try to compare that myself when I have time.
> I just didn't see the benchmark results for that comparison (or don't remember where they are in this repo)
I think you did benchmark that commit, 967b29e72f6a88ab309d913b8d73b8730ee78562. I used a function, apply_rug_truncated_2, which did the conversion and the computation.
FYI, the division algorithm in uint was improved (https://github.com/paritytech/parity-common/pull/126) and integrated into parity-ethereum 2.6 (now in beta).
@ordian Thank you for the info!
Closing in favour of #179.
I guess it's too early for me to test this, am I right? I think it would be worth running the benchmarks after the optimization is complete. Or is there something else I can help with on this PR?