trolando / sylvan

Implementation of multi-core (binary) decision diagrams
Apache License 2.0
65 stars 28 forks source link

Concurrency Correctness (Cache) #42

Open nhusung opened 1 year ago

nhusung commented 1 year ago

Shouldn‘t it be memory_order_acquire here:

https://github.com/trolando/sylvan/blob/0278500f05f44d3129cf2f5e60a2f25c255c0204/src/sylvan_cache.c#L115 https://github.com/trolando/sylvan/blob/0278500f05f44d3129cf2f5e60a2f25c255c0204/src/sylvan_cache.c#L176

I‘m not an expert about the C/C++ memory model, but as far as I understand it, the compiler or CPU is free to reorder the instructions roughly like this (i.e. move the load instructions of the bucket values before the load of the status):

    uint64_t bucket_a = bucket->a, bucket_b = bucket->b, bucket_c = bucket->c;
    const uint32_t s = atomic_load_explicit(s_bucket, memory_order_relaxed);
    // abort if locked or if part of a 2-part cache entry
    if (s & 0xc0000000) return 0;
    // abort if different hash
    if ((s ^ (hash>>32)) & 0x3fff0000) return 0;
    // abort if key different
    if (bucket_a != a || bucket_b != b || bucket_c != c) return 0;
    *res = bucket->res;
    // abort if status field changed after compiler_barrier()
    return atomic_load_explicit(s_bucket, memory_order_acquire) == s ? 1 : 0;

memory_order_acquire guarantees that “no reads or writes in the current thread can be reordered before this load,” which would fix the problem. In return one could probably change the memory_order_acquire of the second atomic load into memory_order_relaxed, at least I don’t see any need for memory_order_acquire here.

Also, atomic_compare_exchange_weak in the put functions could probably be replaced by atomic_compare_exchange_weak_explicit with memory_order_acq_rel for the success case and memory_order_relaxed for the fail case. This may have no effect on the generated assembly for strongly-ordered architectures such as x86-64 but may have a performance impact on others. Furthermore, it would be great if the compiler_barrier() macro and comments could be removed, they are somewhat confusing when reading the code for the first time.

nhusung commented 1 year ago

In addition, the loads and stores here (and similarly for cache_get6() and cache_put6()) need to be atomic operations:

https://github.com/trolando/sylvan/blob/0278500f05f44d3129cf2f5e60a2f25c255c0204/src/sylvan_cache.c#L182-L183 https://github.com/trolando/sylvan/blob/0278500f05f44d3129cf2f5e60a2f25c255c0204/src/sylvan_cache.c#L209-L212

Section 5.1.2.4 of the C23 language specification draft states that if “there is ambiguity about which side effect to a non-atomic object is visible, then there is a data race and the behavior is undefined.” Assume that thread T1 executes cache_get() up to the point immediately before the loads cited above (L182). Then thread T2 executes cache_put() for the same bucket up to the point immediately before the stores cited above (L209). Now, there is no proper happens-before relationship between the cited loads and stores. Therefore, the values visible to T1 are ambiguous and we have undefined behavior. As I read the language specification, it does not matter that this ambiguity is finally “resolved” by checking the status on line 185 in cache_get().