Performance of `ServiceResources` in request handling

sosthene-nitrokey commented 10 months ago

When an app has backends configured that deal with core requests, they will "see" each core API request.

But most backends get all the "resources" from ServiceResources, but loading the keystore necessitates forking the RNG (because the keystore needs an RNG for KeyId generation. The same is true for the cerstore and the counterstore.

However many requests will never touch any of these stores and those that do will probably only ever touch one.

Since generating RNG is expensive (on embedded harware), if each intermediary backend that sees the requests forks the RNG that way it might add up to a significant overhead. For example in opcard, which has the RSA backend enabled, each requests to the core API ends up forking the RNG at least 5 times:

And then some requests fork it once more through self.rng.

This seems unnecessary and could be one of the reason of the slowness of opcard.

I see two way to reduce the number of forking of the RNG:

Avoid using self.rng() (I'm guilty of this in backends) and instead use the already-forked RNG from the keystore like the crypto mechanisms do
Use lazily intialize the keystore, certstore and counterstore only for requests that need it.

Ideally we could also totally avoid forking the RNG for requests that only need to read data from the keystore and don't need to generate KeyIds.

sosthene-nitrokey commented 10 months ago

We could also store the stores initialized stores in the ClientContext, including the forked RNG. This would also make the accessible to other backends without having to copy the path again.

robin-nitrokey commented 10 months ago

Use lazily intialize the keystore, certstore and counterstore only for requests that need it.

Generally, I’m not a big fan of the huge reply_to function. It also introduces a significant stack burden. I would split it into one function per (non-trivial) API call. Then we could also move the store instantiation to those functions that really need it.

This would also make the accessible to other backends without having to copy the path again.

Do we even need owned stores? Couldn’t we just use references?

sosthene-nitrokey commented 10 months ago

In the backends I always take a mutable reference to the stores.

Implementing lazy initialization in the se050 backend and trussed saved 30ms of opgpcard status, only a 2% improvement. I expected more.

Not sure whether it's worth it.

It also introduces a significant stack burden

I don't understand why that is the case. Should the compiler be able to reuse the same stack space across incompatible branches?

sosthene-nitrokey commented 10 months ago

This would also make the accessible to other backends without having to copy the path again.

Do we even need owned stores? Couldn’t we just use references?

I am not sure what you mean by that. The stores are already &mut everywhere in trussed.

sosthene-nitrokey commented 10 months ago

Example implementation for lazily creating the keystore: https://github.com/trussed-dev/trussed-rsa-backend/commit/3fbd2be87bffea4927bfafee7c1f0a80e7d904b2 and https://github.com/trussed-dev/trussed/commit/99c2700223fb626295d955e3d282182af010b06e

robin-nitrokey commented 10 months ago

I don't understand why that is the case. Should the compiler be able to reuse the same stack space across incompatible branches?

You would think so, but apparently it is not the case. On the lpc55, the stack size is ca. 20 kB.

I am not sure what you mean by that. The stores are already &mut everywhere in trussed.

I mean to replace ClientKeystore { path: PathBuf } with ClientKeystore<'a> { path: &'a Path }, in case you worry about the path copies.

sosthene-nitrokey commented 10 months ago

I believe that would work, but I'm not sure it's the most costly part.

It would also be incompatible with the idea to store them in the ClientContex.

trussed-dev / trussed

Performance of `ServiceResources` in request handling #136