probe-rs / rtt-target

Target side implementation of the RTT (Real-Time Transfer) I/O protocol
MIT License

do not rely on `VolatileCell` for memory safety or to prevent reordering of memory operations #4

Open japaric opened 4 years ago

japaric commented 4 years ago

The implementation uses volatile operations to prevent reordering of memory operations, but as used this is wrong for two different reasons:

  1. Volatile operations are not reordered (by the compiler) with respect to other volatile operations, but the compiler is free to move non-volatile operations around them.

For example, RttHeader::init contains this code:

ptr::write_volatile(&mut self.max_down_channels, max_down_channels);
ptr::copy_nonoverlapping(b"SEGG_".as_ptr(), self.id.as_mut_ptr(), 5);
// ..

The second `ptr` call is a non-volatile operation, so the compiler is free to move it before the first, volatile `ptr` call. Whether you observe that reordering today or not is an implementation detail; any LLVM upgrade could cause these two operations to be ordered differently.

The correct abstraction to prevent both compiler-level and architecture-level reordering is an atomic fence: either use `atomic::fence` or use `Atomic*::{load, store, ...}` -- the latter contain implicit atomic fences. How the different `Ordering`s prevent reordering is covered here.

  2. Volatile operations prevent compiler reordering, but the memory operations can still be reordered at the architecture level: the processor may execute instructions out of order, or the memory bus may commit writes in an order that doesn't match the source code; cache implementations may also be relevant here.

Again, the correct way to prevent reordering of memory operations is atomic fences. Using atomic fences will make this crate portable to e.g. (single-core) ARM Cortex-A and more complex architectures.

cc @yatekii

Yatekii commented 4 years ago

Thanks a lot for reporting this too!

I didn't even know that the core will possibly reorder the memory operations! Thanks for telling me.

mvirkkunen commented 4 years ago

I suppose an acceptable course of action would be to get rid of the volatiles and then just pepper `fence(SeqCst)` in some critical spots (e.g. when writing the header, and also when dealing with the buffer data) to enforce correct ordering between the core and the debugger (which is a somewhat unusual situation to be dealing with anyway). I doubt it would have any performance impact in practice either.

Edit:

On second thought, fences don't prevent the compiler from removing operations entirely if it looks like the results are not being used (which, from the compiler's point of view, they aren't -- only the debugger reads them). So all writes still have to be volatile.

I don't suppose there's a volatile "memcpy" anywhere? I'd really rather not implement an efficient memcpy by hand.

mvirkkunen commented 4 years ago

Does this commit look any better? I used atomics for everything that changes on the fly, and volatile writes for all the initialization stuff.