openssl / project

Tracking of project-related issues

Infallible Locking / Atomics #103

Open arapov opened 1 year ago

arapov commented 1 year ago

Question: do we want a public API for this? [OTC question]

nhorman commented 5 months ago

@vdukhovni, could you please create a task list and a proposal for how we might implement infallible locking APIs, which we can then debate in a future backlog refinement meeting?

t8m commented 5 months ago

Hugo's presentation: https://docs.google.com/presentation/d/1wwcvQFVDWFC1p9BNYUwXUG1GeB_TCH0KSg18Qs5ljWI/edit

hlandau commented 5 months ago

The above presentation is on locking. There is another presentation on atomics: https://docs.google.com/presentation/d/1vS0ZAMwz0HgMbxTFG8uhZIA9UwoRebOpZ6q430_A5EU/edit#slide=id.p

Note that we now already have infallible locking in the form of CRYPTO_MUTEX internally.

New APIs definitely should not be public.

IMO the next step here is an internal atomics API in a similar vein (probably inline functions): infallible operations that don't rely on having a lock allocated, e.g. `uint64_t atomic_load(const uint64_t *p);`.
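A minimal sketch of what such inline helpers might look like, assuming the GCC/Clang `__atomic` builtins and the 64-bit MSVC `_Interlocked*` intrinsics. The `ossl_atomic_*_u64` names are hypothetical; the bare name `atomic_load` is avoided here because C11 `<stdatomic.h>` already defines a generic `atomic_load`.

```c
#include <stdint.h>

#if defined(_MSC_VER) && !defined(__clang__)
# include <intrin.h>
#endif

/*
 * Hypothetical internal helpers: no lock object to allocate and no error
 * return, so callers have nothing to check.
 * GCC/Clang use the __atomic builtins; MSVC (64-bit targets only in this
 * sketch) uses the _Interlocked* intrinsics, which act as full barriers.
 */
static inline uint64_t ossl_atomic_load_u64(const uint64_t *p)
{
#if defined(__GNUC__) || defined(__clang__)
    return __atomic_load_n(p, __ATOMIC_ACQUIRE);
#elif defined(_MSC_VER) && defined(_WIN64)
    /* Compare-exchange 0 -> 0 leaves the value unchanged but returns it atomically. */
    return (uint64_t)_InterlockedCompareExchange64((volatile __int64 *)p, 0, 0);
#else
# error "no atomics implementation for this compiler/target"
#endif
}

static inline void ossl_atomic_store_u64(uint64_t *p, uint64_t v)
{
#if defined(__GNUC__) || defined(__clang__)
    __atomic_store_n(p, v, __ATOMIC_RELEASE);
#elif defined(_MSC_VER) && defined(_WIN64)
    _InterlockedExchange64((volatile __int64 *)p, (__int64)v);
#else
# error "no atomics implementation for this compiler/target"
#endif
}

/* Adds v to *p and returns the new value. */
static inline uint64_t ossl_atomic_add_u64(uint64_t *p, uint64_t v)
{
#if defined(__GNUC__) || defined(__clang__)
    return __atomic_add_fetch(p, v, __ATOMIC_ACQ_REL);
#elif defined(_MSC_VER) && defined(_WIN64)
    /* _InterlockedExchangeAdd64 returns the old value. */
    return (uint64_t)_InterlockedExchangeAdd64((volatile __int64 *)p, (__int64)v) + v;
#else
# error "no atomics implementation for this compiler/target"
#endif
}
```

The `#error` branch marks where the fallbacks discussed next would plug in.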

The hardest part here is supporting all the compiler-specific atomics APIs, though arguably our platform list is not so long as to make that infeasible. clang/gcc and MSVC code paths would be good enough for now. We could perhaps have a global mutex-based fallback implementation[^1] (for platforms with unknown atomics intrinsics and builds with threading enabled) or a dummy fallback implementation that just does normal loads/stores (for builds with threading disabled).
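The two fallbacks could look roughly like this. This sketch assumes POSIX threads for the threaded case and uses a hypothetical `SKETCH_NO_THREADS` macro to stand in for a threads-disabled build (it is not OpenSSL's actual configuration macro). "Infallible" is taken to mean there is no error return, so a failing `pthread_mutex_lock()` aborts. The mutex fallback is only correct if every access to a given variable goes through these helpers.

```c
#include <stdint.h>
#include <stdlib.h>

#ifndef SKETCH_NO_THREADS            /* hypothetical build flag */
# include <pthread.h>

/* Statically initialised, so there is no allocation that could fail. */
static pthread_mutex_t global_atomic_lock = PTHREAD_MUTEX_INITIALIZER;

static void fallback_lock(void)
{
    if (pthread_mutex_lock(&global_atomic_lock) != 0)
        abort();                     /* no error path is exposed to callers */
}

static void fallback_unlock(void)
{
    if (pthread_mutex_unlock(&global_atomic_lock) != 0)
        abort();
}
#else
/* Threading disabled: the "dummy" fallback is just plain loads/stores. */
static void fallback_lock(void) {}
static void fallback_unlock(void) {}
#endif

static uint64_t fallback_atomic_load_u64(const uint64_t *p)
{
    uint64_t v;

    fallback_lock();
    v = *p;
    fallback_unlock();
    return v;
}

static void fallback_atomic_store_u64(uint64_t *p, uint64_t v)
{
    fallback_lock();
    *p = v;
    fallback_unlock();
}

/* Adds v to *p and returns the new value. */
static uint64_t fallback_atomic_add_u64(uint64_t *p, uint64_t v)
{
    uint64_t r;

    fallback_lock();
    r = (*p += v);
    fallback_unlock();
    return r;
}
```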

Subsequently, the public atomics API could be refactored to use this internal API (and ideally deprecated, if possible), but that doesn't need to happen in the same PR.
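As an illustration of that refactor, here is a hypothetical wrapper with the same parameter shape as the existing public `int CRYPTO_atomic_add(int *val, int amount, int *ret, CRYPTO_RWLOCK *lock)`. The `sketch_*` names and the `SKETCH_RWLOCK` stand-in type are invented so the example compiles on its own, and the internal primitive assumes GCC/Clang builtins. Once the primitive is infallible, the lock argument becomes dead weight and the function can no longer fail, which is part of what would make deprecation natural.

```c
/* Stand-in for OpenSSL's opaque lock type, so the sketch compiles alone. */
typedef struct sketch_rwlock_st SKETCH_RWLOCK;

/* Infallible internal primitive (GCC/Clang builtin): returns the new value. */
static inline int sketch_atomic_add_int(int *val, int amount)
{
    return __atomic_add_fetch(val, amount, __ATOMIC_ACQ_REL);
}

/*
 * Public-style entry point. The lock argument is kept only for API
 * compatibility and is ignored; the call always succeeds (returns 1).
 */
int sketch_public_atomic_add(int *val, int amount, int *ret, SKETCH_RWLOCK *lock)
{
    (void)lock;
    *ret = sketch_atomic_add_int(val, amount);
    return 1;
}
```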

It would be nice to see someone work on this.

@vdukhovni Feel free to ping me on this.

[^1]: I have a vague recollection that a strategy like this may be found as a fallback path in some C++ standard/runtime library implementations, but don't quote me on that. If it becomes a performance problem, the desirable solution is simply to implement atomics on the platform in question. Failing that, you can maintain a set of global mutex buckets and use bits of the atomic variable's address to pseudorandomly select one of the buckets, reducing contention.
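A sketch of the bucketed variant, assuming POSIX threads. The bucket count of 8 and the particular address hash are arbitrary choices for illustration. Because a given address always maps to the same bucket, correctness doesn't depend on the quality of the hash; a poor hash only costs contention when unrelated variables share a bucket.

```c
#include <stdint.h>
#include <stdlib.h>
#include <pthread.h>

#define NUM_BUCKETS 8   /* arbitrary; a power of two so the index can be masked */

/* Statically initialised: no allocation, nothing to fail at startup. */
static pthread_mutex_t buckets[NUM_BUCKETS] = {
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
};

/* Pick a bucket from the variable's address. */
static pthread_mutex_t *bucket_for(const void *p)
{
    uintptr_t a = (uintptr_t)p;

    a >>= 3;        /* low 3 bits are always zero for 8-byte-aligned uint64_t */
    a ^= a >> 7;    /* fold in some higher address bits */
    return &buckets[a & (NUM_BUCKETS - 1)];
}

static uint64_t bucketed_atomic_load_u64(const uint64_t *p)
{
    pthread_mutex_t *m = bucket_for(p);
    uint64_t v;

    if (pthread_mutex_lock(m) != 0)
        abort();
    v = *p;
    if (pthread_mutex_unlock(m) != 0)
        abort();
    return v;
}

/* Adds v to *p and returns the new value. */
static uint64_t bucketed_atomic_add_u64(uint64_t *p, uint64_t v)
{
    pthread_mutex_t *m = bucket_for(p);
    uint64_t r;

    if (pthread_mutex_lock(m) != 0)
        abort();
    r = (*p += v);
    if (pthread_mutex_unlock(m) != 0)
        abort();
    return r;
}
```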