matklad / once_cell

Rust library for single assignment cells and lazy statics without macros
Apache License 2.0
1.87k stars 109 forks source link

Documentation is misleading w.r.t. guarantees provided regarding side effects #83

Closed briansmith closed 4 years ago

briansmith commented 4 years ago

The documentation says that the "Implementation is based on lazy_static and lazy_cell crates and std::sync::Once. In some sense, once_cell just streamlines and unifies those APIs To implement a sync flavor of OnceCell, this crates uses either a custom re-implementation of std::sync::Once or parking_lot::Mutex."

This creates some confusion about how equivalent once_cell::OnceCell is to std::sync::Once. In particular, std::sync::Once guarantees that "When this function returns, it is guaranteed that some initialization has run and completed (it may not be the closure specified). It is also guaranteed that any memory writes performed by the executed closure can be reliably observed by other threads at this point (there is a happens-before relation between the closure and code executing after the return)."

Is this guarantee provided by OnceCell? I don't think you intend it to be provided, but the statements above in the documentation might lead one to think that OnceCell is a reimplementation of std::sync::Once with a different (better) API, when really it implements a similar but more limited (and potentially more efficient) variant.

matklad commented 4 years ago

I’ll clarify this in the docs, but OnceCell does establish happens before relationship: it wouldn’t be sound otherwise

On Thursday, 2 January 2020, Brian Smith notifications@github.com wrote:

The documentation says that the "Implementation is based on lazy_static and lazy_cell crates and std::sync::Once. In some sense, once_cell just streamlines and unifies those APIs To implement a sync flavor of OnceCell, this crates uses either a custom re-implementation of std::sync::Once or parking_lot::Mutex."

This creates some confusion about how equivalent once_cell::OnceCell is to std::sync::Once. In particular, std::sync::Once guarantees that "When this function returns, it is guaranteed that some initialization has run and completed (it may not be the closure specified). It is also guaranteed that any memory writes performed by the executed closure can be reliably observed by other threads at this point (there is a happens-before relation between the closure and code executing after the return)."

Is this guarantee provided by OnceCell? I don't think you intend it to be provided, but the statements above in the documentation might lead one to think that OnceCell is a reimplementation of std::sync::Once with a different (better) API, when really it implements a similar but more limited (and potentially more efficient) variant.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/matklad/once_cell/issues/83?email_source=notifications&email_token=AANB3M7FOVJ7UJLUVJGQU3DQ3ZTGDA5CNFSM4KCHL352YY3PNVWWK3TUL52HS4DFUVEXG43VMWVGG33NNVSW45C7NFSM4IDYTJXQ, or unsubscribe https://github.com/notifications/unsubscribe-auth/AANB3MZUKDX7FTPZGEMESM3Q3ZTGDANCNFSM4KCHL35Q .

matklad commented 4 years ago

Thanks for the report! It was a major oversight in the documentation, as these grantees are really crucial. And of course I am completely wrong claiming that " it wouldn’t be sound otherwise", and you are right that there's a potentially more efficient implementation via consume is possible. This definitely needs to be added to the RFC.

However, for this crate I want to stick to a conservative Acquire choice, for two reasons:

That said, I think that for std we maybe should spec that the ordering is consume: I would feel better about faking consume via relaxed in std, b/c it is bound to the specific compiler and can rely on implementaiton defined behavior.

I also think that we could add a separate ConsumeOnceCell, so that the user can opt-out of the additional guarantee, if they really need the best possible performance.

briansmith commented 4 years ago

I’ll clarify this in the docs, but OnceCell does establish happens before relationship: it wouldn’t be sound otherwise

Just to clarify: The issue isn't about the ordering guarantees for values stored within the OnceCell, but rather the ordering guarantees for otherwise-uncontrolled writes that the closure performs (via unsafe code) to memory outside the OnceCell.

briansmith commented 4 years ago

I think OnceCell should only guarantee that the contents of the cell are synchronized, and we should leave side effect synchronization, when needed, to std::sync::Once and any #![no_std]-compatible replacements for it. In most cases where OnceCell is used, one would want the maximum performance and not have any need for side-effects.

After you ship a release with #85 it will be pretty much impossible to undo that decision without renaming OnceCell and/or the crate.

matklad commented 4 years ago

The problem is that we can't actually take advantage of this weaker guarantee in current Rust, without involving implementation-defined behavior.

Rust does not provide Ordering::Consume (which we need to make the code run faster), and, while one can simulate it in sketchy way with Relaxed and a compiler barrier, this is not guaranteed to work (though it does work in practice). Crossbeam docs sum up the situation nicely:

The exact definition of "depend on" is a bit vague, but it works as you would expect in practice since a lot of software, especially the Linux kernel, rely on this behavior.

https://docs.rs/crossbeam-utils/0.7.0/crossbeam_utils/atomic/trait.AtomicConsume.html#tymethod.load_consume

I don't want to ship a vaguely correct synchronization primitive as a default in a foundation-level library.

We can of course document weaker guarantees but implement stronger ones, in hope that we'll improve implementation once it becomes possible to do in a language-lawer correct way. However, Consume is not on the horizon, it's even put on hold in C++, as it was specked wrong.

A plan outlined in https://github.com/matklad/once_cell/pull/71#issuecomment-570551203 seems better, in practice. We ship a vaguely correct, but more efficient implementation as an opt-in module once_cell::consume::OnceCell. That way, client who care about performance more than about pedantic correctness, could opt-in into extra speed now (and also give valuable feedback about how much it is faster in practice, and if it is correct enough in practice). The suggest default however remains more conservative, so I can sleep well. When/if Consume becomes an officially supported thing, we issue the 2.0 version of the library, which relaxed guarantee.

For standard library, we can't add a separate feature-flaged module or issue 2.0, so we should stick to a more conservative default. At the same time, standard library can rely on implementation-defined behavior (b/c it is paired with the compiler version), so we would actually be able to ship a more performant implementation in std before sorting out consume formally. That is, the risk I see with using crossbeam's hack right now is that a compiler upgrade may, in theory, break the code. And, in theory, people might be compiling once_cell 1.0.1 with rustc 1.666 some time in the future. With std, this is not a problem, because, if compiler gets smarter we can fix std.

briansmith commented 4 years ago

We can of course document weaker guarantees but implement stronger ones, in hope that we'll improve implementation once it becomes possible to do in a language-lawer correct way. However, Consume is not on the horizon, it's even put on hold in C++, as it was specked wrong.

That's my preference. However, I think the choice you've chosen also makes sense.

I agree it is risky, too risky, to implement this with a hack.