entropy: Clarify API contract

pfalcon commented 6 years ago

Currently, entropy subsystem interface description is pretty bare: https://github.com/zephyrproject-rtos/zephyr/blob/master/include/entropy.h#L45

Following issues are worth being specified explicitly:

Many hardware "true RNG" generators have limited bandwidth, so it should be explicitly specified that calling entropy_get_entropy() may block, in the sense that calling thread may be put inti sleep, and another thread rescheduled instead.
It's worth to clarify re-entrancy of this API, specifically whether it's safe to call entropy_get_entropy() from multiple threads as is, or not, i.e. only one thread can call it, and multiple threads should use external mutex for synchronization.

Note that to guarantee how-quality entropy generation, we probably need to require thread-safety support, because otherwise it would be too easy to get low-quality entropy, which is a security issue.

pfalcon commented 6 years ago

These concerns come from usage of entropy source in mbedTLS: https://github.com/zephyrproject-rtos/zephyr/issues/6132#issuecomment-365353241

Related, but different matter with entropy drivers happened to be discussed at the same time, and tracked as https://github.com/zephyrproject-rtos/zephyr/issues/6184

carlescufi commented 6 years ago

This should now be clearer with the introduction of entropy_get_entropy_isr() in #6294

carlescufi commented 6 years ago

By the way, I think the reentrancy/concurrency should be definitely addressed at a global level for all driver APIs. We need a generic "contract" for driver APIs which, IMHO, should include support for multiple threads accessing concurrently.

pfalcon commented 6 years ago

We need a generic "contract" for driver APIs which, IMHO, should include support for multiple threads accessing concurrently.

It would be easy to agree, but that would mean to literally wrap every driver-exported function into a mutex. That adds noticeable overhead, and may be not very efficient overall. E.g. consider a usecase I have with mbedTLS: its own subsystems don't use entropy source directly. Instead, they use crypto-secure PRNG, and that's being re-seed from time to time from entropy source. PRNG itself is not re-enterable, and that's what needs to be protected with mutex. But then, from mbedTLS PoV, another mutex on entropy is pure overhead.

That's very partial PoV of course, because completely different libs/subsystems may access entropy besides mbedTLS. And entropy is very peculiar service, which not being used correctly, can lead to security issues. So, it's a good candidate for going "correctness in all cases over optimality" approach.

But I'm not sure this approach is universal. Maybe in other cases it's better to let apps decide themselves what to do (even if they can mis-decide to shoot themselves in the feet).

It's a complex matter with different tradeoffs, and I'd suggest we approach it on a case by cases basis, and based on accumulated experience with previous cases.

carlescufi commented 6 years ago

A good example of this contract being unclear: https://github.com/zephyrproject-rtos/zephyr/blob/09ce2e218fb34ec0ad182e3d378614257fdb22e6/subsys/random/rand32_entropy_device.c#L34

carlescufi commented 6 years ago

CC @lpereira @tbursztyka @anangl

lpereira commented 6 years ago

Yeah, the entropy API needs to be revisited.

In particular, I would prefer that entropy_get_entropy() had a timeout parameter like the rest of Zephyr APIs that can block. It would make it possible to use it from ISRs as well, without introducing yet another function to the API (e.g. by passing K_NO_WAIT as the timeout and handling the error).

pfalcon commented 6 years ago

In particular, I would prefer that entropy_get_entropy() had a timeout parameter like the rest of Zephyr APIs that can block.

And here we go (what I wrote in my previous comment). Well, it's OK to do that for entropy_get_entropy() - it's a special, security-related API. But affixing all driver API calls with timeout parameters, because potentially almost anything can block (as we know from Linux, re: e.g. "GPIOs which can sleep") is unlikely a good solution for Zephyr ;-).

carlescufi commented 6 years ago

@andrewboie @lpereira @pfalcon Since this is an issue for booting up the system when stack randomization is enabled, the proposed API would be:

entropy_get_entropy_isr(struct device *device, u8_t *buf, u16_t len, int flags)

Where flags could specify either NONBLOCK, BUSYWAIT

andrewboie commented 6 years ago

@carlescufi works for me. We can then update z_early_boot_rand32_get() to use this API instead. However if BUSYWAIT depends on external interrupts to get more entropy there might be an issue, IRQs are all masked until we get to the POST_KERNEL phase.

andrewboie commented 5 years ago

@carlescufi is this still something that needs enhancement? at least with respect to memory protection?

kartben commented 10 months ago

@carlescufi @ceolin is this ticket still relevant?

ceolin commented 2 months ago

@kartben I think it is. To be honest that I hadn't seen it before.

While the situation now is much better, re-entrancy is a valid problem. I have worked it out in the random subsystem but I noticed that there are a few places using entropy drivers directly. I will review these use cases but in my mind, everyone should be using the random subsystem and not these drivers.

zephyrproject-rtos / zephyr

entropy: Clarify API contract #6187