paritytech / polkadot-sdk

The Parity Polkadot Blockchain SDK
https://polkadot.network/

Offload PVF code decompression to a separate task on the blocking pool #5071

Closed · s0me0ne-unkn0wn closed this 4 weeks ago

s0me0ne-unkn0wn commented 1 month ago

In #3122, we moved the whole candidate validation subsystem to the blocking pool, as it performs PVF code decompression, which is a blocking task, albeit a small one. That wasn't a perfect solution, but given the problem space as it looked at that point, it was acceptable. Now, with agile coretime and #5012 around the corner, we must be ready to prepare many more PVFs than before, and blocking candidate validation on decompression becomes much more concerning.

We should offload that work to a separate task so the candidate validation subsystem can await it asynchronously.
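
A minimal sketch of what I have in mind, with plain `zstd` decompression standing in for the bomb-protected variant the subsystem really uses (`sp_maybe_compressed_blob::decompress`); names here are illustrative:

```rust
use tokio::task;

// Stand-in for the real code-bomb-protected decompression; plain zstd
// keeps the sketch self-contained.
fn decompress(blob: &[u8]) -> std::io::Result<Vec<u8>> {
    zstd::decode_all(blob)
}

// Offload the CPU-bound decompression to tokio's blocking pool so the
// subsystem's main loop can await it instead of stalling on it.
async fn decompress_offloaded(compressed: Vec<u8>) -> std::io::Result<Vec<u8>> {
    task::spawn_blocking(move || decompress(&compressed))
        .await
        .expect("the decompression task is not expected to panic")
}
```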

s0me0ne-unkn0wn commented 1 month ago

@sandreim @alexggh @alindima I need a bit of advice here.

The idiomatic approach to running a lot of CPU-bound tasks is to do it on rayon threads (even the tokio docs recommend rayon for that case). But considering how small each task is and that we're unlikely to spawn hundreds of them at the same time, I'm more inclined to just use `tokio::task::spawn_blocking` for them. WDYT, do you have any insights?

bkchr commented 1 month ago

We have the `Spawner` that provides a `spawn_blocking`. I don't see any reason why you should use rayon here.

sandreim commented 1 month ago

As @bkchr says, just spawn a blocking task. I did the same in availability recovery; you can use it as an example.
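
Roughly this pattern (a hedged sketch assuming a `SpawnNamed`-style spawner, as in `sp_core::traits`, whose `spawn_blocking` takes a boxed future; the task names and error handling are illustrative):

```rust
use futures::channel::oneshot;
use sp_core::traits::SpawnNamed;

// Run the decompression on the blocking pool and hand the result back over
// a oneshot channel, so the calling subsystem can simply await it.
async fn decompress_via_spawner(
    spawner: &impl SpawnNamed,
    compressed: Vec<u8>,
) -> std::io::Result<Vec<u8>> {
    let (tx, rx) = oneshot::channel();
    spawner.spawn_blocking(
        "pvf-decompress",             // task name (illustrative)
        Some("candidate-validation"), // task group (illustrative)
        Box::pin(async move {
            // The CPU-bound part runs on the blocking pool.
            let _ = tx.send(zstd::decode_all(&compressed[..]));
        }),
    );
    rx.await.unwrap_or_else(|_canceled| {
        Err(std::io::Error::new(
            std::io::ErrorKind::Other,
            "decompression task dropped",
        ))
    })
}
```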

s0me0ne-unkn0wn commented 1 month ago

Okay, I went through the code once again, and it doesn't make much sense to me 🫠 The candidate validation subsystem decompresses the PVF and the PoV just to observe their sizes in metrics. It doesn't use them in any other way and passes the decompressed blobs to the PVF host, which passes them to workers that are synchronous by nature and a totally natural place to do the decompression job.

To my mind, it should be refactored. The PVF decompression should be performed by the preparation worker, and the PoV decompression should go to the execution worker. The PVF host should return the observed decompressed sizes to the candidate validation subsystem (at the end of the day, it doesn't matter whether we observe the sizes before the validation or after it).

As I see it, the only drawback of this approach is that decompression times will be included in the preparation and execution timeouts. But they should be negligible anyway, shouldn't they? Do we maybe have benchmarks of how fast zstd decompression is on the reference hardware?

Any objections to this approach?
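
A rough sketch of the prepare-side half of what I mean, with all names hypothetical and plain `zstd` standing in for the bomb-protected decompression we actually use:

```rust
// Hypothetical outcome type: the worker reports the decompressed size back,
// so the subsystem can still record its metric, just after the fact.
struct PrepareOutcome {
    artifact: Vec<u8>,        // compiled artifact bytes (stand-in)
    decompressed_size: usize, // observed by the subsystem for metrics
}

// Stand-in for the actual wasm compilation step performed by the worker.
fn compile(code: &[u8]) -> std::io::Result<Vec<u8>> {
    Ok(code.to_vec())
}

// The preparation worker decompresses the PVF itself; the cost now counts
// against the preparation timeout, which should be negligible for zstd.
fn prepare_in_worker(compressed_pvf: &[u8]) -> std::io::Result<PrepareOutcome> {
    let code = zstd::decode_all(compressed_pvf)?;
    let decompressed_size = code.len();
    let artifact = compile(&code)?;
    Ok(PrepareOutcome { artifact, decompressed_size })
}
```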

sandreim commented 1 month ago

> Okay, I went through the code once again, and it doesn't make much sense to me 🫠 The candidate validation subsystem decompresses the PVF and the PoV just to observe their sizes in metrics. It doesn't use them in any other way and passes the decompressed blobs to the PVF host, which passes them to workers that are synchronous by nature and a totally natural place to do the decompression job.

Not sure why you think it is not used; the code shows that the decompressed blob is being passed in `ValidationParams`: https://github.com/paritytech/polkadot-sdk/blob/f13ed8de69bcfcccecf208211998b8af2ef882a2/polkadot/node/core/candidate-validation/src/lib.rs#L655

s0me0ne-unkn0wn commented 1 month ago

> the code shows that the decompressed blob is being passed in `ValidationParams`

Exactly, but the `ValidationParams` themselves are not used for anything other than being passed to the PVF execution worker to provide the PVF with its parameters. Instead, we could pass the persisted validation data and the compressed PoV there and let the execution worker decompress the PoV and construct `ValidationParams` itself.

My point is that everything currently done with the compressed data in the candidate validation subsystem could equally be done in the PVF preparation/execution workers, and nothing would change. The candidate validation subsystem doesn't need that data decompressed for itself; it decompresses it for the PVF host.
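
On the execute side, a sketch of what I mean, loosely modeled on the real `ValidationParams` fields (plain byte vectors stand in for the actual `HeadData`/`BlockData`/`Hash` types, so treat this as illustrative):

```rust
// Loosely modeled on ValidationParams; plain types stand in for the real ones.
struct ValidationParams {
    parent_head: Vec<u8>,
    block_data: Vec<u8>, // the decompressed PoV
    relay_parent_number: u32,
    relay_parent_storage_root: [u8; 32],
}

// The execution worker receives the persisted validation data pieces plus the
// still-compressed PoV and builds ValidationParams after decompressing.
fn build_params_in_worker(
    compressed_pov: &[u8],
    parent_head: Vec<u8>,
    relay_parent_number: u32,
    relay_parent_storage_root: [u8; 32],
) -> std::io::Result<ValidationParams> {
    let block_data = zstd::decode_all(compressed_pov)?;
    Ok(ValidationParams {
        parent_head,
        block_data,
        relay_parent_number,
        relay_parent_storage_root,
    })
}
```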

sandreim commented 1 month ago

I see, this is a nice micro-optimization. Its impact depends on the number of pending validation requests in the queue and on how long decompression takes. I think the next step should be to measure whether decompression significantly stalls the candidate validation loop.
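
Something as small as this would already give a ballpark figure (plain `zstd` as a stand-in, with `blob` being a representative compressed PVF or PoV):

```rust
use std::time::{Duration, Instant};

// Time a single decompression in isolation to see for how long the
// candidate validation loop would otherwise be blocked.
fn measure_decompression(blob: &[u8]) -> std::io::Result<Duration> {
    let start = Instant::now();
    let decompressed = zstd::decode_all(blob)?;
    let elapsed = start.elapsed();
    println!("decompressed {} bytes in {elapsed:?}", decompressed.len());
    Ok(elapsed)
}
```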

AndreiEres commented 1 month ago

After we merged https://github.com/paritytech/polkadot-sdk/pull/4791, we decompress the same PVF one more time when preparing the node for the active set, which looks suboptimal. This could be solved with a slight refactoring of the message arguments. But does that mean we no longer need the subsystem on the blocking pool?

s0me0ne-unkn0wn commented 1 month ago

I'm currently working on a PR that moves all the decompression to PVF host workers, stay tuned :)