near / nearcore

Reference client for NEAR Protocol
https://near.org

Possible performance regression in contract execution #4030

Open birchmd opened 3 years ago

birchmd commented 3 years ago

Our nightly load testing on mocknet has been consistently failing for a long time. Many of these failures have been related to problems compiling contracts; however, the last several runs have failed because multiple blocks take more than 1s to process (the test fails in this case because you cannot sustain a 1s block time if blocks take more than 1s to process).

The blocks which take too long to process only occur during the phase where multiple transaction types (including contract calls) are being sent to the network; the phase where all transactions are simple transfers still works well. So it might be the case that there is a performance regression in the execution of contracts. The other transaction types sent during this phase are staking and account creation, so it could also be that one of those has seen a regression instead (or perhaps in addition).

References:

Note: The mocknet test does not change the binary the cluster runs, so checking out a specific commit and submitting a Nayduck run will not make the cluster run on that commit. If you are attempting to locate the regression via a git bisect, you will need to manually deploy the binary under test (using functions from the mocknet python file) and restart the cluster.

matklad commented 3 years ago

I am somewhat surprised that we know that the overall operation is slow, but can't say which specific part is slow. Submitted #4033 to get logs with "this operation took n ms" lines, at least for the contract runtime.
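
For readers unfamiliar with how those timing lines are produced: the "close time.busy=… time.idle=…" lines quoted below are the kind emitted by the tracing / tracing-subscriber crates when a span closes. A minimal, hypothetical sketch of that instrumentation pattern (the real spans added by #4033/#4038 wrap functions inside near-vm-runner):

```rust
use tracing::info_span;
use tracing_subscriber::fmt::format::FmtSpan;

fn main() {
    // Print a line whenever a span closes, including time.busy/time.idle,
    // the same format that appears in the logs quoted later in this thread.
    tracing_subscriber::fmt()
        .with_span_events(FmtSpan::CLOSE)
        .init();

    // Hypothetical stand-in for the contract-runtime entry point.
    let _span = info_span!("run_vm").entered();
    // ... compile / instantiate / call the contract here ...
}
```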

birchmd commented 3 years ago

Having some more detailed timing information could be helpful!

For now I am just trying to find a commit where the test passes so that we have a baseline to compare master with. Then timing data could be compared between the two runs.

bowenwang1996 commented 3 years ago

@matklad it appears that the regression is not in the execution per se, but somewhere else in between:

Mar 04 22:44:56.572 DEBUG runtime: Calling the contract at account load_testing_2055    
Mar 04 22:44:56.580  INFO run_vm_profiled:run_wasmer:run_wasmer/call: near_vm_runner::wasmer_runner: close time.busy=4.37ms time.idle=3.46µs
Mar 04 22:44:56.582  INFO run_vm_profiled:run_wasmer: near_vm_runner::wasmer_runner: close time.busy=8.91ms time.idle=2.92µs
Mar 04 22:44:56.582  INFO run_vm_profiled: near_vm_runner::runner: close time.busy=9.02ms time.idle=2.60µs
Mar 04 22:44:56.583 DEBUG runtime: Done.    
Mar 04 22:45:00.307 DEBUG runtime: Calling the contract at account load_testing_2421    
Mar 04 22:45:00.319  INFO run_vm_profiled:run_wasmer:run_wasmer/call: near_vm_runner::wasmer_runner: close time.busy=4.70ms time.idle=5.53µs
Mar 04 22:45:00.320  INFO run_vm_profiled:run_wasmer: near_vm_runner::wasmer_runner: close time.busy=8.69ms time.idle=2.57µs
Mar 04 22:45:00.320  INFO run_vm_profiled: near_vm_runner::runner: close time.busy=8.79ms time.idle=4.14µs
Mar 04 22:45:00.321 DEBUG runtime: Done.    

Notice here that between "Done" and the next contract call almost 4s elapse.

olonho commented 3 years ago

Could we profile it beyond the contract runtime?

matklad commented 3 years ago

https://github.com/near/nearcore/pull/4038 adds some high level tracing for Runtime.

bowenwang1996 commented 3 years ago

@olonho @matklad I looked at the logs and the contract execution part doesn't look right to me

Mar 06 02:52:24.370  INFO Runtime::apply:Runtime::process_receipt:Runtime::run_vm_profiled:run_wasmer:run_wasmer/call: near_vm_runner::wasmer_runner: close time.busy=4.59ms time.idle=7.19µs
Mar 06 02:52:24.373  INFO Runtime::apply:Runtime::process_receipt:Runtime::run_vm_profiled:run_wasmer: near_vm_runner::wasmer_runner: close time.busy=11.8ms time.idle=4.18µs
Mar 06 02:52:24.373  INFO Runtime::apply:Runtime::process_receipt:Runtime::run_vm_profiled: near_vm_runner::runner: close time.busy=11.9ms time.idle=2.99µs

This seems to imply that the actual execution takes less than 5ms but then the preparation phase (compilation + loading) takes more than 7ms, which doesn't make sense given that the contract that is called is always the same.

matklad commented 3 years ago

@bowenwang1996 this matches my expectations. We have in-memory cache disabled, so every execution incurs deserialization cost.
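
To make the caching point concrete, here is a minimal, hypothetical sketch of an in-memory compiled-module cache keyed by code hash. It is not nearcore's actual implementation; it only illustrates why running with the cache disabled (e.g. the no_cache feature mentioned later) means every call pays the few-millisecond deserialize_wasmer cost shown further down:

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// Hypothetical stand-ins for a code hash and a compiled wasmer artifact.
type CodeHash = [u8; 32];
struct CompiledModule(Vec<u8>);

struct ModuleCache {
    modules: Mutex<HashMap<CodeHash, Arc<CompiledModule>>>,
}

impl ModuleCache {
    fn get_or_compile(
        &self,
        hash: CodeHash,
        compile: impl FnOnce() -> CompiledModule,
    ) -> Arc<CompiledModule> {
        let mut modules = self.modules.lock().unwrap();
        modules
            .entry(hash)
            // Without this in-memory entry, every execution has to
            // deserialize (or recompile) the module before calling it.
            .or_insert_with(|| Arc::new(compile()))
            .clone()
    }
}
```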

You can check this locally by running

$ cargo test --release --package near-vm-runner-standalone --bin near-vm-runner-standalone --all-features -- script::vm_script_smoke_test --exact --nocapture

The result I get is

      21.48µs run_wasmer/call
    22.25ms run_wasmer
  22.26ms run_vm

      4.38µs run_wasmer/call
    4.62ms run_wasmer
  4.63ms run_vm

      4.14µs run_wasmer/call
    4.37ms run_wasmer
  4.38ms run_vm

      13.37µs run_wasmer/call
    4.19ms run_wasmer
  4.20ms run_vm

If I add a couple more tracing spans (I thought I had added them already? Must've accidentally deleted them when rebasing over the wasmer 1.0 work: https://github.com/near/nearcore/pull/4060), we see that it's deserialization:

      21.68ms compile_and_serialize_wasmer
      18.02µs run_wasmer/call
    22.34ms run_wasmer
  22.35ms run_vm

      3.92ms deserialize_wasmer
      4.74µs run_wasmer/call
    4.59ms run_wasmer
  4.60ms run_vm

      3.76ms deserialize_wasmer
      3.81µs run_wasmer/call
    4.44ms run_wasmer
  4.44ms run_vm

      3.54ms deserialize_wasmer
      7.28µs run_wasmer/call
    4.21ms run_wasmer
  4.21ms run_vm

bowenwang1996 commented 3 years ago

@matklad I see. I thought we had #3865. Did something change there?

matklad commented 3 years ago

This is mysterious. I managed to write a local pytest (code) which sends the do_work transaction (the same which is used by load testing) to the node, and here are the logs I get:

Mar 10 15:59:49.890 DEBUG runtime: Calling the contract at account test0    
Mar 10 15:59:49.896  INFO run_vm:run_wasmer:run_wasmer/call: near_vm_runner::wasmer_runner: close time.busy=4.75ms time.idle=600ns
Mar 10 15:59:49.896  INFO run_vm:run_wasmer: near_vm_runner::wasmer_runner: close time.busy=5.15ms time.idle=370ns
Mar 10 15:59:49.896  INFO run_vm: near_vm_runner::runner: close time.busy=5.16ms time.idle=610ns
Mar 10 15:59:49.896 DEBUG runtime: Done.    
Mar 10 15:59:49.896 DEBUG runtime: Calling the contract at account test0    
Mar 10 15:59:49.901  INFO run_vm:run_wasmer:run_wasmer/call: near_vm_runner::wasmer_runner: close time.busy=4.77ms time.idle=580ns
Mar 10 15:59:49.901  INFO run_vm:run_wasmer: near_vm_runner::wasmer_runner: close time.busy=5.16ms time.idle=370ns
Mar 10 15:59:49.901  INFO run_vm: near_vm_runner::runner: close time.busy=5.17ms time.idle=520ns
Mar 10 15:59:49.901 DEBUG runtime: Done.    
Mar 10 15:59:49.901 DEBUG runtime: Calling the contract at account test0    
Mar 10 15:59:49.907  INFO run_vm:run_wasmer:run_wasmer/call: near_vm_runner::wasmer_runner: close time.busy=5.52ms time.idle=630ns
Mar 10 15:59:49.907  INFO run_vm:run_wasmer: near_vm_runner::wasmer_runner: close time.busy=5.92ms time.idle=330ns
Mar 10 15:59:49.907  INFO run_vm: near_vm_runner::runner: close time.busy=5.93ms time.idle=510ns
Mar 10 15:59:49.907 DEBUG runtime: Done.    

So, the time indeed goes fully to contract execution. I then re-ran the same test with the no_cache feature enabled, and got the following result:

Mar 10 16:09:28.913 DEBUG runtime: Calling the contract at account test0    
Mar 10 16:09:28.921  INFO run_vm:run_wasmer:run_wasmer/call: near_vm_runner::wasmer_runner: close time.busy=5.49ms time.idle=830ns
Mar 10 16:09:28.922  INFO run_vm:run_wasmer: near_vm_runner::wasmer_runner: close time.busy=8.58ms time.idle=490ns
Mar 10 16:09:28.922  INFO run_vm: near_vm_runner::runner: close time.busy=8.59ms time.idle=590ns
Mar 10 16:09:28.922 DEBUG runtime: Done.    
Mar 10 16:09:28.922 DEBUG runtime: Calling the contract at account test0    
Mar 10 16:09:28.930  INFO run_vm:run_wasmer:run_wasmer/call: near_vm_runner::wasmer_runner: close time.busy=5.50ms time.idle=720ns
Mar 10 16:09:28.930  INFO run_vm:run_wasmer: near_vm_runner::wasmer_runner: close time.busy=8.36ms time.idle=400ns
Mar 10 16:09:28.930  INFO run_vm: near_vm_runner::runner: close time.busy=8.37ms time.idle=430ns
Mar 10 16:09:28.930 DEBUG runtime: Done.    
Mar 10 16:09:28.930 DEBUG runtime: Calling the contract at account test0    
Mar 10 16:09:28.938  INFO run_vm:run_wasmer:run_wasmer/call: near_vm_runner::wasmer_runner: close time.busy=5.49ms time.idle=790ns
Mar 10 16:09:28.938  INFO run_vm:run_wasmer: near_vm_runner::wasmer_runner: close time.busy=8.20ms time.idle=440ns
Mar 10 16:09:28.938  INFO run_vm: near_vm_runner::runner: close time.busy=8.21ms time.idle=460ns
Mar 10 16:09:28.938 DEBUG runtime: Done.    

Even without caches, the preparation phase is faster than execution for me locally.

How do I find out which exact node configuration is run on the mocknet? We have a bunch of feature flags and build options, and, since I can reproduce the workload locally but not the timings, I suspect I might just be running different code?

matklad commented 3 years ago

Hm, that's interesting! The above timings are with lto disabled. If I patiently wait until the single core links neard with full lto, I observe the following timings (with the no_cache feature enabled):

Mar 10 16:21:56.483  INFO run_vm:run_wasmer:run_wasmer/call: near_vm_runner::wasmer_runner: close time.busy=1.46ms time.idle=430ns
Mar 10 16:21:56.483  INFO run_vm:run_wasmer: near_vm_runner::wasmer_runner: close time.busy=4.00ms time.idle=270ns
Mar 10 16:21:56.483  INFO run_vm: near_vm_runner::runner: close time.busy=4.01ms time.idle=280ns

It seems that, with lto, execution does become relatively faster than serialization. So it might be that we somehow disable the memory cache in this test?

Where's the code/config which describes how we produce a build for the test in question?
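
For context on the lto point: in a Cargo workspace, full LTO is typically enabled through the release profile in the top-level Cargo.toml. A hedged sketch of what that setting looks like; whether the neard build used for this test actually sets it is exactly the open question here:

```toml
# Hypothetical profile tweak, not necessarily what nearcore's workspace uses.
[profile.release]
lto = true   # full ("fat") link-time optimization across the dependency graph
```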

birchmd commented 3 years ago

@matklad Deploying to mocknet is still a relatively manual process unfortunately (this will be addressed soon). I can tell you, for the run Bowen posted logs from, the binary was compiled from https://github.com/near/nearcore/commit/d5e0e0b4fd019814e860ffbaa466385ac14a698b using the command cargo build -p neard --release (without any changes to Cargo.toml or any extra environment variables).

birchmd commented 3 years ago

I am happy to run a test using a binary from any commit, with any features enabled. Just let me know!

matklad commented 3 years ago

@birchmd let's run https://github.com/near/nearcore/commit/abf71bb6102f12332b51586850432912b5eada90, it adds more logging to runtime to figure out where those non-call milliseconds went.

bowenwang1996 commented 3 years ago

So it might be that we somehow disable the memory cache in this test?

Since we don't use any special feature flags when compiling neard, that doesn't seem very likely.

bowenwang1996 commented 3 years ago

@matklad with abf71bb6102f12332b51586850432912b5eada90 we get the following:

Mar 10 17:39:44.378 DEBUG Runtime::apply:Runtime::process_receipt: runtime: Calling the contract at account load_testing_3828    
Mar 10 17:39:44.378  INFO Runtime::apply:Runtime::process_receipt:run_vm:run_wasmer:compile_module_cached_wasmer: near_vm_runner::cache::wasmer0_cache: close time.busy=6.00µs time.idle=1.51µs
Mar 10 17:39:44.378  INFO Runtime::apply:Runtime::process_receipt:run_vm:run_wasmer:run_wasmer/memory: near_vm_runner::wasmer_runner: close time.busy=22.8µs time.idle=963ns
Mar 10 17:39:44.381  INFO Runtime::apply:Runtime::process_receipt:run_vm:run_wasmer:run_wasmer/instantiate: near_vm_runner::wasmer_runner: close time.busy=2.79ms time.idle=2.92µs
Mar 10 17:39:44.385  INFO Runtime::apply:Runtime::process_receipt:run_vm:run_wasmer:run_wasmer/call: near_vm_runner::wasmer_runner: close time.busy=3.47ms time.idle=2.75µs
Mar 10 17:39:44.386  INFO Runtime::apply:Runtime::process_receipt:run_vm:run_wasmer: near_vm_runner::wasmer_runner: close time.busy=8.06ms time.idle=1.53µs
Mar 10 17:39:44.386  INFO Runtime::apply:Runtime::process_receipt:run_vm: near_vm_runner::runner: close time.busy=8.11ms time.idle=2.51µs
Mar 10 17:39:44.387 DEBUG Runtime::apply:Runtime::process_receipt: runtime: Done.    
Mar 10 17:39:44.387  INFO Runtime::apply:Runtime::process_receipt: node_runtime: close time.busy=11.4ms time.idle=2.19µs

It seems that the cache works fine and there is something other than instantiate and call that takes 2-3ms.

matklad commented 3 years ago

:thinking: well, let's log absolutely everything then; let's try https://github.com/near/nearcore/commit/044419a307e107f6f3836e5a22df7f6f54369dc4

Can I get the full logs for that one? I'd love to look around, in case anything else jumps out at me as suspicious.

Local output for me looks like this.

Mar 12 18:37:18.940 DEBUG Runtime::apply:Runtime::process_receipt: runtime: Calling the contract at account test0    
Mar 12 18:37:18.940  INFO Runtime::apply:Runtime::process_receipt:run_vm:run_wasmer:run_wasmer/compile_module_cached_wasmer:compile_module_cached_wasmer: near_vm_runner::cache::wasmer0_cache: close time.busy=1.32µs time.idle=380ns
Mar 12 18:37:18.940  INFO Runtime::apply:Runtime::process_receipt:run_vm:run_wasmer:run_wasmer/compile_module_cached_wasmer: near_vm_runner::wasmer_runner: close time.busy=7.56µs time.idle=300ns
Mar 12 18:37:18.940  INFO Runtime::apply:Runtime::process_receipt:run_vm:run_wasmer:run_wasmer/memory: near_vm_runner::wasmer_runner: close time.busy=2.70µs time.idle=280ns
Mar 12 18:37:18.940  INFO Runtime::apply:Runtime::process_receipt:run_vm:run_wasmer:run_wasmer/VMLogic::new_with_protocol_version: near_vm_runner::wasmer_runner: close time.busy=410ns time.idle=270ns
Mar 12 18:37:18.940  INFO Runtime::apply:Runtime::process_receipt:run_vm:run_wasmer:run_wasmer/logic.add_contract_compile_fee: near_vm_runner::wasmer_runner: close time.busy=250ns time.idle=250ns
Mar 12 18:37:18.940  INFO Runtime::apply:Runtime::process_receipt:run_vm:run_wasmer:run_wasmer/imports::build_wasmer: near_vm_runner::wasmer_runner: close time.busy=8.12µs time.idle=310ns
Mar 12 18:37:18.940  INFO Runtime::apply:Runtime::process_receipt:run_vm:run_wasmer:run_wasmer/check_method: near_vm_runner::wasmer_runner: close time.busy=300ns time.idle=310ns
Mar 12 18:37:18.940  INFO Runtime::apply:Runtime::process_receipt:run_vm:run_wasmer:run_wasmer/instantiate: near_vm_runner::wasmer_runner: close time.busy=258µs time.idle=380ns
Mar 12 18:37:18.946  INFO Runtime::apply:Runtime::process_receipt:run_vm:run_wasmer:run_wasmer/call: near_vm_runner::wasmer_runner: close time.busy=5.60ms time.idle=440ns
Mar 12 18:37:18.946  INFO Runtime::apply:Runtime::process_receipt:run_vm:run_wasmer:run_wasmer/drop_vm: near_vm_runner::wasmer_runner: close time.busy=127µs time.idle=1.34µs
Mar 12 18:37:18.946  INFO Runtime::apply:Runtime::process_receipt:run_vm:run_wasmer:run_wasmer/drop_rest: near_vm_runner::wasmer_runner: close time.busy=190ns time.idle=300ns
Mar 12 18:37:18.946  INFO Runtime::apply:Runtime::process_receipt:run_vm:run_wasmer: near_vm_runner::wasmer_runner: close time.busy=6.05ms time.idle=280ns
Mar 12 18:37:18.946  INFO Runtime::apply:Runtime::process_receipt:run_vm: near_vm_runner::runner: close time.busy=6.06ms time.idle=370ns
Mar 12 18:37:18.946 DEBUG Runtime::apply:Runtime::process_receipt: runtime: Done.    
Mar 12 18:37:18.946  INFO Runtime::apply:Runtime::process_receipt: node_runtime: close time.busy=6.11ms time.idle=320ns

What also strikes me as strange is that I get a much bigger difference between instantiate and call...

bowenwang1996 commented 3 years ago

@matklad I observed the following:

Mar 12 16:58:42.551 DEBUG Runtime::apply:Runtime::process_receipt: runtime: Calling the contract at account load_testing_4386    
Mar 12 16:58:42.555  INFO Runtime::apply:Runtime::process_receipt:run_vm:run_wasmer:run_wasmer/compile_module_cached_wasmer:compile_module_cached_wasmer: near_vm_runner::cache::wasmer0_cache: close time.busy=7.68µs time.idle=2.77µs
Mar 12 16:58:42.555  INFO Runtime::apply:Runtime::process_receipt:run_vm:run_wasmer:run_wasmer/compile_module_cached_wasmer: near_vm_runner::wasmer_runner: close time.busy=91.9µs time.idle=1.48µs
Mar 12 16:58:42.555  INFO Runtime::apply:Runtime::process_receipt:run_vm:run_wasmer:run_wasmer/memory: near_vm_runner::wasmer_runner: close time.busy=31.4µs time.idle=1.24µs
Mar 12 16:58:42.555  INFO Runtime::apply:Runtime::process_receipt:run_vm:run_wasmer:run_wasmer/VMLogic::new_with_protocol_version: near_vm_runner::wasmer_runner: close time.busy=1.18µs time.idle=1.16µs
Mar 12 16:58:42.555  INFO Runtime::apply:Runtime::process_receipt:run_vm:run_wasmer:run_wasmer/logic.add_contract_compile_fee: near_vm_runner::wasmer_runner: close time.busy=478ns time.idle=957ns
Mar 12 16:58:42.555  INFO Runtime::apply:Runtime::process_receipt:run_vm:run_wasmer:run_wasmer/imports::build_wasmer: near_vm_runner::wasmer_runner: close time.busy=60.7µs time.idle=1.21µs
Mar 12 16:58:42.555  INFO Runtime::apply:Runtime::process_receipt:run_vm:run_wasmer:run_wasmer/check_method: near_vm_runner::wasmer_runner: close time.busy=5.33µs time.idle=1.14µs
Mar 12 16:58:42.559  INFO Runtime::apply:Runtime::process_receipt:run_vm:run_wasmer:run_wasmer/instantiate: near_vm_runner::wasmer_runner: close time.busy=4.31ms time.idle=3.80µs
Mar 12 16:58:42.563  INFO Runtime::apply:Runtime::process_receipt:run_vm:run_wasmer:run_wasmer/call: near_vm_runner::wasmer_runner: close time.busy=3.93ms time.idle=2.97µs
Mar 12 16:58:42.566  INFO Runtime::apply:Runtime::process_receipt:run_vm:run_wasmer:run_wasmer/drop_vm: near_vm_runner::wasmer_runner: close time.busy=2.55ms time.idle=3.54µs
Mar 12 16:58:42.566  INFO Runtime::apply:Runtime::process_receipt:run_vm:run_wasmer:run_wasmer/drop_rest: near_vm_runner::wasmer_runner: close time.busy=465ns time.idle=2.15µs
Mar 12 16:58:42.566  INFO Runtime::apply:Runtime::process_receipt:run_vm:run_wasmer: near_vm_runner::wasmer_runner: close time.busy=11.3ms time.idle=1.98µs
Mar 12 16:58:42.566  INFO Runtime::apply:Runtime::process_receipt:run_vm: near_vm_runner::runner: close time.busy=11.3ms time.idle=4.03µs
Mar 12 16:58:42.567 DEBUG Runtime::apply:Runtime::process_receipt: runtime: Done.    
Mar 12 16:58:42.567  INFO Runtime::apply:Runtime::process_receipt: node_runtime: close time.busy=16.3ms time.idle=1.62µs

There are several things that feel suspicious:

matklad commented 3 years ago

:thinking: no idea what the "something after Done" might be. VM instantiation/teardown seems to account for the discrepancy we are seeing between my local runs and nayduck. olonho's and my hypothesis is that this has to do with nayduck's hypervisor: wasmer stresses the virtual memory subsystem of the machine, and that is noticeably more complicated in virtualized environments. Next steps here are:

bowenwang1996 commented 3 years ago

add more spans outside of the runtime

Do you mean outside of contract runtime? The following line is still logged within runtime

node_runtime: close time.busy=16.3ms time.idle=1.62µs

matklad commented 3 years ago

Do you mean outside of contract runtime? The following line is still logged within runtime

Outside of contract runtime (the run function) but inside the node runtime (the process_receipt function). Do we have a better term for contract runtime which doesn't include "runtime"? I am always confusing the two :(

bowenwang1996 commented 3 years ago

Do we have a better term for contract runtime which doesn't include "runtime"?

Not sure. Feel free to come up with some better name :)

matklad commented 3 years ago

run the test locally in a VM, to check the above hypothesis.

Ran it locally via qemu, didn't reproduce; will try VirtualBox.

bowenwang1996 commented 3 years ago

@matklad I kind of figured out what happened, although some parts still feel confusing to me. When we investigated the memory usage of neard, we found that the rocksdb cache size was too large and therefore reduced the per-column cache size from 170MB to 32MB; reverting that change (https://github.com/near/nearcore/commit/cc0118a157556dcdc827174e6ca14a0d69b6df7f) made the test pass. I believe the overall reason is that with a 170MB cache size we are actually able to cache everything in memory (i.e., almost no read in this test touches disk), so most operations are fast. However, one thing that I don't quite understand is that this also makes run_wasmer much faster:

Mar 17 19:58:56.746 DEBUG Runtime::apply:Runtime::process_receipt: runtime: Calling the contract at account load_testing_3014    
Mar 17 19:58:56.746  INFO Runtime::apply:Runtime::process_receipt:run_vm:run_wasmer:run_wasmer/compile_module_cached_wasmer:compile_module_cached_wasmer: near_vm_runner::cache::wasmer0_cache: close time.busy=5.33µs time.idle=766ns
Mar 17 19:58:56.746  INFO Runtime::apply:Runtime::process_receipt:run_vm:run_wasmer:run_wasmer/compile_module_cached_wasmer: near_vm_runner::wasmer_runner: close time.busy=27.3µs time.idle=533ns
Mar 17 19:58:56.746  INFO Runtime::apply:Runtime::process_receipt:run_vm:run_wasmer:run_wasmer/memory: near_vm_runner::wasmer_runner: close time.busy=11.8µs time.idle=553ns
Mar 17 19:58:56.746  INFO Runtime::apply:Runtime::process_receipt:run_vm:run_wasmer:run_wasmer/VMLogic::new_with_protocol_version: near_vm_runner::wasmer_runner: close time.busy=650ns time.idle=498ns
Mar 17 19:58:56.746  INFO Runtime::apply:Runtime::process_receipt:run_vm:run_wasmer:run_wasmer/logic.add_contract_compile_fee: near_vm_runner::wasmer_runner: close time.busy=416ns time.idle=508ns
Mar 17 19:58:56.746  INFO Runtime::apply:Runtime::process_receipt:run_vm:run_wasmer:run_wasmer/imports::build_wasmer: near_vm_runner::wasmer_runner: close time.busy=24.1µs time.idle=502ns
Mar 17 19:58:56.746  INFO Runtime::apply:Runtime::process_receipt:run_vm:run_wasmer:run_wasmer/check_method: near_vm_runner::wasmer_runner: close time.busy=1.03µs time.idle=528ns
Mar 17 19:58:56.748  INFO Runtime::apply:Runtime::process_receipt:run_vm:run_wasmer:run_wasmer/instantiate: near_vm_runner::wasmer_runner: close time.busy=1.86ms time.idle=860ns
Mar 17 19:58:56.752  INFO Runtime::apply:Runtime::process_receipt:run_vm:run_wasmer:run_wasmer/call: near_vm_runner::wasmer_runner: close time.busy=3.86ms time.idle=1.20µs
Mar 17 19:58:56.753  INFO Runtime::apply:Runtime::process_receipt:run_vm:run_wasmer:run_wasmer/drop_vm: near_vm_runner::wasmer_runner: close time.busy=1.19ms time.idle=1.30µs
Mar 17 19:58:56.754  INFO Runtime::apply:Runtime::process_receipt:run_vm:run_wasmer:run_wasmer/drop_rest: near_vm_runner::wasmer_runner: close time.busy=386ns time.idle=936ns
Mar 17 19:58:56.754  INFO Runtime::apply:Runtime::process_receipt:run_vm:run_wasmer: near_vm_runner::wasmer_runner: close time.busy=7.14ms time.idle=699ns
Mar 17 19:58:56.754  INFO Runtime::apply:Runtime::process_receipt:run_vm: near_vm_runner::runner: close time.busy=7.15ms time.idle=979ns
Mar 17 19:58:56.754 DEBUG Runtime::apply:Runtime::process_receipt: runtime: Done.    
Mar 17 19:58:56.754  INFO Runtime::apply:Runtime::process_receipt: node_runtime: close time.busy=8.11ms time.idle=1.23µs

Both drop_vm and instantiate become faster and I am not sure why that is the case.
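
Since the fix revolves around the RocksDB block cache size, here is a hedged sketch of how a per-column-family block cache is sized with the rust-rocksdb crate (a recent crate API is assumed, and the helper name is illustrative; nearcore's actual store configuration may differ):

```rust
use rocksdb::{BlockBasedOptions, Cache, Options};

// Illustrative helper: build per-column-family options with an LRU block
// cache of the given size. The change discussed above reduced this from
// 170MB to 32MB per column, which pushed more reads back to disk.
fn column_options(block_cache_bytes: usize) -> Options {
    let cache = Cache::new_lru_cache(block_cache_bytes);

    let mut block_opts = BlockBasedOptions::default();
    block_opts.set_block_cache(&cache);

    let mut opts = Options::default();
    opts.set_block_based_table_factory(&block_opts);
    opts
}
```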

matklad commented 3 years ago

Tried running on a gcloud e2-standard-4 instance, and I still see a performance profile close to the one on my local machine, and different from the one from nayduck.

logs

```
Mar 19 16:22:27.230 DEBUG Runtime::apply:Runtime::process_receipt: runtime: Calling the contract at account test0
Mar 19 16:22:27.230  INFO Runtime::apply:Runtime::process_receipt:run_vm:run_wasmer:run_wasmer/compile_module_cached_wasmer:compile_module_cached_wasmer: near_vm_runner::cache::wasmer0_cache: close time.busy=6.77µs time.idle=908ns
Mar 19 16:22:27.230  INFO Runtime::apply:Runtime::process_receipt:run_vm:run_wasmer:run_wasmer/compile_module_cached_wasmer: near_vm_runner::wasmer_runner: close time.busy=26.2µs time.idle=816ns
Mar 19 16:22:27.230  INFO Runtime::apply:Runtime::process_receipt:run_vm:run_wasmer:run_wasmer/memory: near_vm_runner::wasmer_runner: close time.busy=13.1µs time.idle=937ns
Mar 19 16:22:27.230  INFO Runtime::apply:Runtime::process_receipt:run_vm:run_wasmer:run_wasmer/VMLogic::new_with_protocol_version: near_vm_runner::wasmer_runner: close time.busy=2.42µs time.idle=659ns
Mar 19 16:22:27.230  INFO Runtime::apply:Runtime::process_receipt:run_vm:run_wasmer:run_wasmer/logic.add_contract_compile_fee: near_vm_runner::wasmer_runner: close time.busy=743ns time.idle=619ns
Mar 19 16:22:27.230  INFO Runtime::apply:Runtime::process_receipt:run_vm:run_wasmer:run_wasmer/imports::build_wasmer: near_vm_runner::wasmer_runner: close time.busy=24.2µs time.idle=676ns
Mar 19 16:22:27.230  INFO Runtime::apply:Runtime::process_receipt:run_vm:run_wasmer:run_wasmer/check_method: near_vm_runner::wasmer_runner: close time.busy=1.35µs time.idle=676ns
Mar 19 16:22:27.231  INFO Runtime::apply:Runtime::process_receipt:run_vm:run_wasmer:run_wasmer/instantiate: near_vm_runner::wasmer_runner: close time.busy=1.16ms time.idle=1.71µs
Mar 19 16:22:27.242  INFO Runtime::apply:Runtime::process_receipt:run_vm:run_wasmer:run_wasmer/call: near_vm_runner::wasmer_runner: close time.busy=10.4ms time.idle=2.36µs
Mar 19 16:22:27.242  INFO Runtime::apply:Runtime::process_receipt:run_vm:run_wasmer:run_wasmer/drop_vm: near_vm_runner::wasmer_runner: close time.busy=485µs time.idle=2.39µs
Mar 19 16:22:27.242  INFO Runtime::apply:Runtime::process_receipt:run_vm:run_wasmer:run_wasmer/drop_rest: near_vm_runner::wasmer_runner: close time.busy=562ns time.idle=1.18µs
Mar 19 16:22:27.242  INFO Runtime::apply:Runtime::process_receipt:run_vm:run_wasmer: near_vm_runner::wasmer_runner: close time.busy=12.3ms time.idle=796ns
Mar 19 16:22:27.242  INFO Runtime::apply:Runtime::process_receipt:run_vm: near_vm_runner::runner: close time.busy=12.3ms time.idle=1.16µs
Mar 19 16:22:27.242 DEBUG Runtime::apply:Runtime::process_receipt: runtime: Done.
Mar 19 16:22:27.242  INFO Runtime::apply:Runtime::process_receipt: node_runtime: close time.busy=12.5ms time.idle=1.05µs
```

This makes me think that maybe the difference is not in the machine, but in the way I simulate the workload.

bowenwang1996 commented 3 years ago

@matklad I see. Likely you need to have the same setup as the mocknet loadtest. There the state is much larger and transaction load is much higher, which could contribute to the difference in performance.

bowenwang1996 commented 3 years ago

Discussed this issue with @matklad today and we plan to proceed as follows:

cc @posvyatokum

posvyatokum commented 3 years ago

@matklad I've added a Rust load test in this branch: https://github.com/near/nearcore/compare/local-load-test. I run it from the nearcore dir and write stderr to log.txt (at the end of the test this file is analysed).

I've tried to keep it as close to the mocknet load test as possible, and in this form the test does find blocks that take a long time to produce, but I also suggest investigating another configuration:

  1. Comment out the first 10 minutes of transfers (for speed reasons)
  2. In the random-transactions part, send only bubble sort function calls
  3. Add an assert on block production time in the loop that sends random transactions (again, to save time). In this case the test will almost always fail on the block at the same height (which is ~1550), which is very suspicious.

I will investigate the strange consequences of a nonce bug that I had in this test. Basically, I sent transactions with repeated nonces and this somehow led to the runtime::apply function taking almost 1s to run (instead of 50ms).

janewang commented 3 years ago

@matklad We reviewed this issue in the team meeting today and would like to assign it to you for the next step. Please check out the work from @posvyatokum and reach out to the NodeX team (@bowenwang1996, @posvyatokum and @janewang) if you have any follow-up questions! :)

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity in the last 2 months. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

bowenwang1996 commented 3 years ago

@matklad @posvyatokum could you remind me whether we made more progress on this issue?

matklad commented 3 years ago

No, no progress on my part :(

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity in the last 2 months. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity in the last 2 months. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.