rust-lang / libs-team

The home of the library team
Apache License 2.0

AtomicBool layout changes to become amo biased rather than bool biased #249

Closed SUPERCILEX closed 1 year ago

SUPERCILEX commented 1 year ago

Proposal

Problem statement

See Zulip thread: https://rust-lang.zulipchat.com/#narrow/stream/219381-t-libs/topic/AtomicBool.20useless.20on.20riscv.

Gist: AtomicBool uses a u8 under the hood, but not all hardware supports 8/16-bit atomics even if it supports 32-bit atomics. Change the memory layout of AtomicBool to match the size of the hardware's amo (atomic memory operation) type.

Solution sketch

+ pub type AtomicBoolType = cfg(smallest_type_supported_by_hardware)

- fn from_ptr(ptr: *mut bool) -> &AtomicBool
+ fn from_ptr(ptr: *mut AtomicBoolType) -> &AtomicBool

- fn from_mut(v: &mut bool) -> &mut AtomicBool
+ fn from_mut(v: &mut AtomicBoolType) -> &mut AtomicBool

- fn get_mut_slice(this: &mut [AtomicBool]) -> &mut [bool]
+ fn get_mut_slice(this: &mut [AtomicBool]) -> &mut [AtomicBoolType]

- fn from_mut_slice(v: &mut [bool]) -> &mut [AtomicBool]
+ fn from_mut_slice(v: &mut [AtomicBoolType]) -> &mut [AtomicBool]
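As a rough illustration (not part of the proposal itself), the alias could be selected with per-target cfgs along these lines. The cfg predicates and the choice of u32 for riscv64 are assumptions made for the sketch, not real std internals:

```rust
// Hypothetical sketch: pick the backing type per target. On a target whose
// narrowest amo operates on 32-bit words, use u32; elsewhere keep u8.
// The riscv64 cfg and the widths chosen here are illustrative assumptions.
#[cfg(target_arch = "riscv64")]
pub type AtomicBoolType = u32;
#[cfg(not(target_arch = "riscv64"))]
pub type AtomicBoolType = u8;

fn main() {
    // On most hosts this is 1; on the hypothetical riscv64 branch it would be 4.
    println!("size_of::<AtomicBoolType>() = {}", std::mem::size_of::<AtomicBoolType>());
}
```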

Downsides

People who are relying on AtomicBool being layout-compatible with bool are going to be sad.

Upside

We can be maximally efficient on all hardware while still letting people do weird transmutes since they have AtomicBoolType.

SUPERCILEX commented 1 year ago

I need to think about whether AtomicBoolType should always be a u* or whether it can be a bool on the platforms that support u8 atomics.

BurntSushi commented 1 year ago

I don't think we can/should do this. The docs on AtomicBool are crystal clear:

This type has the same in-memory representation as a bool.

Changing that seems like a breaking change of the worst kind. I realize the actual impact of this will be limited to some rather niche platforms, but we've promised it will have the same layout as bool in very clear terms.
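For reference, the promise quoted above is observable today and is what downstream code relies on. A minimal check of the documented guarantee, using the stable from_ptr API (the layout assertions below are facts from the std docs, not assumptions):

```rust
use std::sync::atomic::{AtomicBool, Ordering};

fn main() {
    // The std docs guarantee AtomicBool has the same in-memory
    // representation as bool: same size, same alignment.
    assert_eq!(std::mem::size_of::<AtomicBool>(), std::mem::size_of::<bool>());
    assert_eq!(std::mem::align_of::<AtomicBool>(), std::mem::align_of::<bool>());

    // Code in the wild depends on that via pointer-reinterpreting APIs:
    let mut flag = false;
    // Safety: `flag` is a valid, exclusively borrowed bool for this scope.
    let atomic = unsafe { AtomicBool::from_ptr(&mut flag) };
    atomic.store(true, Ordering::Relaxed);
    assert!(flag);
}
```

Changing the representation would silently break every such reinterpretation.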

How common of a need is it to be maximally efficient here on platforms where our current AtomicBool is inappropriate? If it's rare, inline assembly seems like an appropriate solution to me. But I'm not 100% certain.

workingjubilee commented 1 year ago

That change was a regression from using a usize, the costs of which were not discussed, and previously it was also a breaking change (de facto if not de jure).

BurntSushi commented 1 year ago

@workingjubilee Is that a rebuttal? An FYI? Can you elaborate please?

workingjubilee commented 1 year ago

Mostly just a remark, I guess?

The flipside of this is that RISCV itself has a de facto (if not de jure) encoding for byte-level atomic instructions, and it's not clear what the performance characteristics of most RISCV architectures with high core counts and full implementation of the atomic instruction set will be, because... most RISCV architectures aren't printed yet.

thomcc commented 1 year ago

Yeah, I think I'm inclined to agree with @BurntSushi -- I don't think we should make a habit of considering concrete promises we make about type layout to be things we can go back on.

the8472 commented 1 year ago

It's not even obvious if that's a win. Making the type larger can blow up the size of structs containing them. It's a size vs. instruction-overhead tradeoff.

SUPERCILEX commented 1 year ago

It's a size vs. instruction-overhead tradeoff.

What? It's the difference between a looping load-reserved/store-conditional sequence and a single amo instruction. I don't have an ARM or RISC-V machine to benchmark on, but I would be very surprised if those had comparable performance under contention.
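A toy harness along these lines could put a number on that claim on real hardware. The thread and iteration counts are arbitrary; note that on x86 both swaps compile to a single lock xchg, so any gap would only show up on targets that lower byte-sized atomics to an LR/SC retry loop:

```rust
use std::sync::atomic::{AtomicBool, AtomicU32, Ordering};
use std::thread;
use std::time::{Duration, Instant};

static B: AtomicBool = AtomicBool::new(false);
static W: AtomicU32 = AtomicU32::new(0);

// Spawn `threads` threads that each run `op` `iters` times, and time the whole
// run. `op` is a capture-free fn pointer so it is trivially Send.
fn hammer(threads: usize, iters: usize, op: fn()) -> Duration {
    let start = Instant::now();
    let handles: Vec<_> = (0..threads)
        .map(|_| thread::spawn(move || for _ in 0..iters { op() }))
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    start.elapsed()
}

fn main() {
    // Byte-sized swap: an LR/SC loop on targets without byte amos.
    let t_bool = hammer(4, 100_000, || { B.swap(true, Ordering::Relaxed); });
    // Word-sized swap: a single amo instruction where available.
    let t_word = hammer(4, 100_000, || { W.swap(1, Ordering::Relaxed); });
    println!("byte-sized: {t_bool:?}, word-sized: {t_word:?}");
}
```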

the8472 commented 1 year ago

That's the overhead part. Smaller structs, more expensive instructions.

SUPERCILEX commented 1 year ago

The comparison you're making is bizarre. In what scenario do people have millions of atomic booleans lying around?

programmerjake commented 1 year ago

how about a lock-free hash table of some sort?

programmerjake commented 1 year ago

or Java-style mutex-per-object where you want them to be 1 byte to save space

SUPERCILEX commented 1 year ago

Both of those would be questionable implementations without some way to wait (consider the case where the OS context switches in the middle of the critical section), which on Linux requires a 32-bit atomic for the futex syscall.

But if people are against changing the layout of AtomicBool, then I can switch this proposal to AtomicFlag which would look just like the AtomicBool I proposed.
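To make the waiting point concrete, here is a minimal sketch of what an AtomicBool-only primitive gives you: a spin lock that can only burn CPU while waiting. A blocking lock would instead park the thread via futex(2), which on Linux operates on a 32-bit value. The SpinLock type and its with method are made up for illustration:

```rust
use std::cell::UnsafeCell;
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;

// A spin lock built on nothing but an AtomicBool. There is no way to block:
// a waiter can only spin, which is why futex-based locks keep their state in
// a 32-bit word instead.
struct SpinLock<T> {
    flag: AtomicBool,
    data: UnsafeCell<T>,
}

// Safety: access to `data` is serialized by `flag`.
unsafe impl<T: Send> Sync for SpinLock<T> {}

impl<T> SpinLock<T> {
    const fn new(v: T) -> Self {
        SpinLock { flag: AtomicBool::new(false), data: UnsafeCell::new(v) }
    }

    fn with<R>(&self, f: impl FnOnce(&mut T) -> R) -> R {
        // Acquire: spin until we flip the flag from false to true.
        while self.flag.swap(true, Ordering::Acquire) {
            std::hint::spin_loop(); // burn CPU; a futex would sleep here
        }
        let r = f(unsafe { &mut *self.data.get() });
        // Release: let the next waiter in.
        self.flag.store(false, Ordering::Release);
        r
    }
}

static COUNTER: SpinLock<u64> = SpinLock::new(0);

fn main() {
    let handles: Vec<_> = (0..4)
        .map(|_| thread::spawn(|| {
            for _ in 0..10_000 {
                COUNTER.with(|c| *c += 1);
            }
        }))
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    assert_eq!(COUNTER.with(|c| *c), 40_000);
}
```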

thomcc commented 1 year ago

Both of those would be questionable implementations without some way to wait (consider the case where the OS context switches in the middle of the critical section), which on Linux requires a 32-bit atomic for the futex syscall.

Are you suggesting changing AtomicBool to be 32 bits everywhere then?

But if people are against changing the layout of AtomicBool, then I can switch this proposal to AtomicFlag which would look just like the AtomicBool I proposed.

I think this would be more reasonable, although I'm not sure it's worth having, given the additional complexity it adds.

SUPERCILEX commented 1 year ago

Are you suggesting changing AtomicBool to be 32 bits everywhere then?

That doesn't sound too crazy to me, but no. I think there are still many use cases where you just want to know if something is active. Then again, AtomicFlag would be simpler if it was always a u32, but it's easy to imagine regretting that later when some architecture decides it only supports 42 bit atomics or something.

I think this would be more reasonable, although I'm not sure it's worth having, given the additional complexity it adds.

Yeah, I think that's probably what this issue should be about. I personally don't think having a memory layout compatible type and an amo compatible type is too much added complexity.

thomcc commented 1 year ago

The fact that we also need to create another type that only allows 0 and 1 for this is a more significant issue in terms of complexity, I think. It will need its own API and conversions, for example.
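For a sense of that surface area, a hypothetical AtomicFlag modeled on C++'s std::atomic_flag might look like the sketch below. The AtomicU32 backing and the method set are assumptions for illustration, not a proposed API; even this tiny type already needs its own constructors, orderings, and conversions:

```rust
use std::sync::atomic::{AtomicU32, Ordering};

// Hypothetical AtomicFlag: only ever 0 or 1, backed by a word-sized atomic
// so every operation can be a single amo instruction. The u32 width is an
// assumption; a real type would pick it per target.
pub struct AtomicFlag(AtomicU32);

impl AtomicFlag {
    pub const fn new(set: bool) -> Self {
        AtomicFlag(AtomicU32::new(set as u32))
    }

    /// Sets the flag, returning whether it was already set.
    pub fn test_and_set(&self, order: Ordering) -> bool {
        self.0.swap(1, order) != 0
    }

    /// Clears the flag.
    pub fn clear(&self, order: Ordering) {
        self.0.store(0, order)
    }

    /// Returns whether the flag is currently set.
    pub fn test(&self, order: Ordering) -> bool {
        self.0.load(order) != 0
    }
}

fn main() {
    let f = AtomicFlag::new(false);
    assert!(!f.test_and_set(Ordering::AcqRel)); // was clear
    assert!(f.test_and_set(Ordering::AcqRel));  // now already set
    f.clear(Ordering::Release);
    assert!(!f.test(Ordering::Acquire));
}
```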

SUPERCILEX commented 1 year ago

Ok, the discussion on Zulip concluded that https://github.com/rust-lang/rust/pull/114034 and https://github.com/llvm/llvm-project/issues/64090 were satisfactory solutions.