qdrvm / kagome

Kagome - C++20 implementation of Polkadot Host
https://kagome.readthedocs.io
Apache License 2.0
154 stars 33 forks source link

[Bug]: Slow PVF execution in separate process #2123

Open kamilsa opened 3 weeks ago

kamilsa commented 3 weeks ago

Bug Summary

Issue occurs when --parachain-single-process flag is not provided

Bug Description

It is observed that when --parachain-single-process flag is not provided PVF execution time is drastically slower.

Steps to Reproduce

During the execution of zombienet/polkadot/functional/0001-parachains-pvf.zndsl test with --parachain-single-process flag added there were the following execution times:

24.06.13 09:46:42.464610  main_runner      Info      ParachainProcessorImpl  Parachain validation lasted for 337 ms
24.06.13 09:46:48.435971  main_runner      Info      ParachainProcessorImpl  Parachain validation lasted for 273 ms
24.06.13 09:46:55.022460  main_runner      Info      ParachainProcessorImpl  Parachain validation lasted for 294 ms
24.06.13 09:46:55.213458  main_runner      Info      ParachainProcessorImpl  Parachain validation lasted for 484 ms
24.06.13 09:47:00.392891  main_runner      Info      ParachainProcessorImpl  Parachain validation lasted for 247 ms
24.06.13 09:47:06.451197  main_runner      Info      ParachainProcessorImpl  Parachain validation lasted for 307 ms
24.06.13 09:47:12.705043  main_runner      Info      ParachainProcessorImpl  Parachain validation lasted for 191 ms
24.06.13 09:47:18.417648  main_runner      Info      ParachainProcessorImpl  Parachain validation lasted for 245 ms
24.06.13 09:47:18.556004  main_runner      Info      ParachainProcessorImpl  Parachain validation lasted for 136 ms
24.06.13 09:47:24.388348  main_runner      Info      ParachainProcessorImpl  Parachain validation lasted for 240 ms
24.06.13 09:47:30.315966  main_runner      Info      ParachainProcessorImpl  Parachain validation lasted for 170 ms
24.06.13 09:47:36.770884  main_runner      Info      ParachainProcessorImpl  Parachain validation lasted for 163 ms
24.06.13 09:47:37.136053  main_runner      Info      ParachainProcessorImpl  Parachain validation lasted for 518 ms
24.06.13 09:47:42.344387  main_runner      Info      ParachainProcessorImpl  Parachain validation lasted for 213 ms
24.06.13 09:47:48.371995  main_runner      Info      ParachainProcessorImpl  Parachain validation lasted for 272 ms
24.06.13 09:47:54.618380  main_runner      Info      ParachainProcessorImpl  Parachain validation lasted for 476 ms
24.06.13 09:48:00.793122  main_runner      Info      ParachainProcessorImpl  Parachain validation lasted for 543 ms
24.06.13 09:48:06.826347  main_runner      Info      ParachainProcessorImpl  Parachain validation lasted for 536 ms
24.06.13 09:48:12.726246  main_runner      Info      ParachainProcessorImpl  Parachain validation lasted for 600 ms
24.06.13 09:48:18.800589  main_runner      Info      ParachainProcessorImpl  Parachain validation lasted for 483 ms
24.06.13 09:48:25.003346  main_runner      Info      ParachainProcessorImpl  Parachain validation lasted for 219 ms
24.06.13 09:48:30.351778  main_runner      Info      ParachainProcessorImpl  Parachain validation lasted for 154 ms
24.06.13 09:48:30.518747  main_runner      Info      ParachainProcessorImpl  Parachain validation lasted for 320 ms
24.06.13 09:48:36.903490  main_runner      Info      ParachainProcessorImpl  Parachain validation lasted for 711 ms
24.06.13 09:48:36.914686  main_runner      Info      ParachainProcessorImpl  Parachain validation lasted for 722 ms
24.06.13 09:48:42.652574  main_runner      Info      ParachainProcessorImpl  Parachain validation lasted for 524 ms
24.06.13 09:48:48.459002  main_runner      Info      ParachainProcessorImpl  Parachain validation lasted for 244 ms
24.06.13 09:48:48.635134  main_runner      Info      ParachainProcessorImpl  Parachain validation lasted for 418 ms
24.06.13 09:48:54.394874  main_runner      Info      ParachainProcessorImpl  Parachain validation lasted for 275 ms
24.06.13 09:49:00.660355  main_runner      Info      ParachainProcessorImpl  Parachain validation lasted for 558 ms
24.06.13 09:49:06.572422  main_runner      Info      ParachainProcessorImpl  Parachain validation lasted for 439 ms
24.06.13 09:49:12.412997  main_runner      Info      ParachainProcessorImpl  Parachain validation lasted for 189 ms
24.06.13 09:49:18.581708  main_runner      Info      ParachainProcessorImpl  Parachain validation lasted for 154 ms
24.06.13 09:49:24.323560  main_runner      Info      ParachainProcessorImpl  Parachain validation lasted for 154 ms
24.06.13 09:49:30.595910  main_runner      Info      ParachainProcessorImpl  Parachain validation lasted for 474 ms
24.06.13 09:49:36.444391  main_runner      Info      ParachainProcessorImpl  Parachain validation lasted for 227 ms
24.06.13 09:49:42.928065  main_runner      Info      ParachainProcessorImpl  Parachain validation lasted for 323 ms
24.06.13 09:49:48.442094  main_runner      Info      ParachainProcessorImpl  Parachain validation lasted for 212 ms
24.06.13 09:49:48.569911  main_runner      Info      ParachainProcessorImpl  Parachain validation lasted for 339 ms
24.06.13 09:49:54.500129  main_runner      Info      ParachainProcessorImpl  Parachain validation lasted for 359 ms

When flag was added execution times were:

24.06.13 09:38:18.232196  main_runner      Info      ParachainProcessorImpl  Parachain validation lasted for 88 ms
24.06.13 09:38:24.200098  main_runner      Info      ParachainProcessorImpl  Parachain validation lasted for 77 ms
24.06.13 09:38:30.220802  main_runner      Info      ParachainProcessorImpl  Parachain validation lasted for 72 ms
24.06.13 09:38:36.369870  main_runner      Info      ParachainProcessorImpl  Parachain validation lasted for 189 ms
24.06.13 09:38:42.479482  main_runner      Info      ParachainProcessorImpl  Parachain validation lasted for 174 ms
24.06.13 09:38:48.217579  main_runner      Info      ParachainProcessorImpl  Parachain validation lasted for 78 ms
24.06.13 09:38:54.638906  main_runner      Info      ParachainProcessorImpl  Parachain validation lasted for 262 ms
24.06.13 09:38:54.712570  main_runner      Info      ParachainProcessorImpl  Parachain validation lasted for 259 ms
24.06.13 09:38:54.785425  main_runner      Info      ParachainProcessorImpl  Parachain validation lasted for 240 ms
24.06.13 09:39:00.226141  main_runner      Info      ParachainProcessorImpl  Parachain validation lasted for 79 ms
24.06.13 09:39:06.467185  main_runner      Info      ParachainProcessorImpl  Parachain validation lasted for 118 ms
24.06.13 09:39:06.539927  main_runner      Info      ParachainProcessorImpl  Parachain validation lasted for 164 ms
24.06.13 09:39:12.342827  main_runner      Info      ParachainProcessorImpl  Parachain validation lasted for 121 ms
24.06.13 09:39:18.243696  main_runner      Info      ParachainProcessorImpl  Parachain validation lasted for 113 ms
24.06.13 09:39:24.117754  main_runner      Info      ParachainProcessorImpl  Parachain validation lasted for 12 ms
24.06.13 09:39:24.132090  main_runner      Info      ParachainProcessorImpl  Parachain validation lasted for 23 ms
24.06.13 09:39:30.197435  main_runner      Info      ParachainProcessorImpl  Parachain validation lasted for 49 ms
24.06.13 09:39:36.134300  main_runner      Info      ParachainProcessorImpl  Parachain validation lasted for 17 ms
24.06.13 09:39:42.243617  main_runner      Info      ParachainProcessorImpl  Parachain validation lasted for 30 ms
24.06.13 09:39:42.256829  main_runner      Info      ParachainProcessorImpl  Parachain validation lasted for 43 ms
24.06.13 09:39:42.280160  main_runner      Info      ParachainProcessorImpl  Parachain validation lasted for 67 ms
24.06.13 09:39:48.557186  main_runner      Info      ParachainProcessorImpl  Parachain validation lasted for 283 ms
24.06.13 09:39:48.567710  main_runner      Info      ParachainProcessorImpl  Parachain validation lasted for 292 ms
24.06.13 09:39:48.577632  main_runner      Info      ParachainProcessorImpl  Parachain validation lasted for 300 ms
24.06.13 09:39:54.082519  main_runner      Info      ParachainProcessorImpl  Parachain validation lasted for 20 ms

Effects of the Bug

Slow PVF execution times

Expected Behavior

No response

System Information

Tested on Apple M1 Pro, 16GB RAM

Additional Context

No response

turuslan commented 2 weeks ago
Apple M1, 16GB type file size exec time
Debug 495MB 200ms
RelWithDebInfo 185MB 20ms

exec on debug is slow (maybe due to executable size or global initializers). fork is fast (0.5ms). node process may fork+exec initial worker process, which will fork itself to "work" and stop these processes on timeout.

kagome
└ initial worker (slow start, starts and stops child workers)
  ├ worker 1 (fast start, does "work", stopped by parent on timeout)
  └ worker 2
turuslan commented 2 weeks ago

release polkadot-*-worker exe is 15MB (release wasmtime+cranelift is 5MB). cpp wasmedge (with static llvm) is 120MB debug and 100MB release.