Thread safety - Githubissues

guinn8 commented 2 years ago

Posting for the sake of others information. I attempted to use an array of mutexes to guard segments of a PackedArray to reduce the delay caused by threads waiting single-file to write to the PackedArray. This approach works fine with normal arrays. However with the PackedArray I was getting inconstant results.

This suggests that any invocation of the PackedArray_get / PackedArray_set should be guarded by a mutex, even if the individual index is guarded.

gpakosz commented 2 years ago

Hello @guinn8 indeed PackedArray is not thread safe and you need to protect access from the outside.

I attempted to use an array of mutexes to guard segments of a PackedArray While I find that overkill at first sight, if you really want to go that route you need

to allocate an array of mutexes of length given by PackedArray_bufferSize()

replicate the logic in PackedArray_set to figure out which mutexes to lock/unlock

you will have to lock 1 or 2 mutexes depending on whether the logical element spans 2 buffer cells or not

or you can pessimistically always lock 2 mutexes (but you don't need a if / else which can be mispredicted)

guinn8 commented 2 years ago

Thanks @gpakosz! I got the array of mutexes working and it has massively increased performance.

I did this by allocating an arbitrarily (!= buffer_size) sized array of mutexes in the create function.

  // this will result some locks hanging off the end of the array, not really a big
  // deal that it doesn't perfectly match up as all elements are covered
  a->lock_info.lock_interval = ceil((float)bufferSize / (float)num_locks);
  a->lock_info.locks = calloc(num_locks, sizeof(omp_lock_t));
  for (size_t i = 0; i < num_locks; i++) {
    omp_init_lock(&a->lock_info.locks[i]);
  }

And then defined some functions to lock the mutex corresponding to a index in the underlying buffer (not the PackedArray index).

void PackedArray_lock_offset(PackedArray* a, const uint64_t offset) {
  size_t bufind = ((uint64_t)offset * (uint64_t)a->bitsPerItem) / 32;
  size_t lockind = bufind / a->lock_info.lock_interval;  // using implicit floor
  omp_set_lock(&(a->lock_info.locks[lockind]));
}

void PackedArray_unlock_offset(PackedArray* a, const uint64_t offset) {
  size_t bufind = ((uint64_t)offset * (uint64_t)a->bitsPerItem) / 32;
  size_t lockind = bufind / a->lock_info.lock_interval;  // using implicit floor
  omp_unset_lock(&(a->lock_info.locks[lockind]));
}

To make my life easier I decided to only support 32 divisible by bitsPerItem.

if (bitsPerItem <= bitsAvailable)
  {
    out[0] = (out[0] & ~(mask << startBit)) | (in << startBit);
  }
  else
  {
    PACKEDARRAY_ASSERT(0); // not supporting num_bits that doesnt evenly divide 32
   ...
  }

There is not reason this couldn't work for other bit sizes, I just don't need them so I didn't implement.

I'm happy to push this code somewhere if there is interest :)

gpakosz / PackedArray

Thread safety #8