Open nhusung opened 1 year ago
In addition, the loads and stores here (and similarly for cache_get6()
and cache_put6()
) need to be atomic operations:
https://github.com/trolando/sylvan/blob/0278500f05f44d3129cf2f5e60a2f25c255c0204/src/sylvan_cache.c#L182-L183 https://github.com/trolando/sylvan/blob/0278500f05f44d3129cf2f5e60a2f25c255c0204/src/sylvan_cache.c#L209-L212
Section 5.1.2.4 of the C23 language specification draft states that if “there is ambiguity about which side effect to a non-atomic object is visible, then there is a data race and the behavior is undefined.” Assume that thread T1 executes cache_get()
up to the point immediately before the loads cited above (L182). Then thread T2 executes cache_put()
for the same bucket up to the point immediately before the stores cited above (L209). Now, there is no proper happens-before relationship between the cited loads and stores. Therefore, the values visible to T1 are ambiguous and we have undefined behavior. As I read the language specification, it does not matter that this ambiguity is finally “resolved” by checking the status on line 185 in cache_get()
.
Shouldn‘t it be
memory_order_acquire
here:https://github.com/trolando/sylvan/blob/0278500f05f44d3129cf2f5e60a2f25c255c0204/src/sylvan_cache.c#L115 https://github.com/trolando/sylvan/blob/0278500f05f44d3129cf2f5e60a2f25c255c0204/src/sylvan_cache.c#L176
I‘m not an expert about the C/C++ memory model, but as far as I understand it, the compiler or CPU is free to reorder the instructions roughly like this (i.e. move the load instructions of the bucket values before the load of the status):
memory_order_acquire
guarantees that “no reads or writes in the current thread can be reordered before this load,” which would fix the problem. In return one could probably change thememory_order_acquire
of the second atomic load intomemory_order_relaxed
, at least I don’t see any need formemory_order_acquire
here.Also,
atomic_compare_exchange_weak
in theput
functions could probably be replaced byatomic_compare_exchange_weak_explicit
withmemory_order_acq_rel
for the success case andmemory_order_relaxed
for the fail case. This may have no effect on the generated assembly for strongly-ordered architectures such as x86-64 but may have a performance impact on others. Furthermore, it would be great if thecompiler_barrier()
macro and comments could be removed, they are somewhat confusing when reading the code for the first time.