Open guinn8 opened 2 years ago
Hello @guinn8, indeed PackedArray is not thread safe and you need to protect access from the outside.
> I attempted to use an array of mutexes to guard segments of a PackedArray

While I find that overkill at first sight, if you really want to go that route you need
- to allocate an array of mutexes of length given by PackedArray_bufferSize()
- to replicate the logic in PackedArray_set to figure out which mutexes to lock/unlock
  - you will have to lock 1 or 2 mutexes depending on whether the logical element spans 2 buffer cells or not
  - or you can pessimistically always lock 2 mutexes (but then you don't need an if/else, which can be mispredicted)
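The "1 or 2 mutexes" decision above can be sketched as follows. This is a minimal illustration, not code from PackedArray: the `CellSpan` type and `cells_for_item` helper are hypothetical names, and the sketch assumes 32-bit buffer cells and one mutex per cell.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type, not part of PackedArray. */
typedef struct {
    size_t first; /* cell holding the element's lowest bit */
    size_t last;  /* cell holding the element's highest bit */
} CellSpan;

/* Computes which 32-bit buffer cells the logical element `index` touches.
 * first == last means one mutex is enough; otherwise two are needed. */
static CellSpan cells_for_item(uint64_t index, uint32_t bitsPerItem)
{
    assert(bitsPerItem >= 1 && bitsPerItem <= 32);
    uint64_t firstBit = index * bitsPerItem;
    uint64_t lastBit = firstBit + bitsPerItem - 1;
    CellSpan span = { (size_t)(firstBit / 32), (size_t)(lastBit / 32) };
    return span;
}
```

With 5-bit items, for example, element 6 occupies bits 30 to 34 and therefore touches cells 0 and 1. When two mutexes are taken, acquiring them in ascending cell order on every thread avoids deadlock between neighbouring writers.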
Thanks @gpakosz! I got the array of mutexes working and it has massively increased performance.
I did this by allocating an arbitrarily sized (!= buffer_size) array of mutexes in the create function.
// This will result in some locks hanging off the end of the array. It is not really a big
// deal that it doesn't perfectly match up, as all elements are covered.
a->lock_info.lock_interval = ceil((float)bufferSize / (float)num_locks);
a->lock_info.locks = calloc(num_locks, sizeof(omp_lock_t));
for (size_t i = 0; i < num_locks; i++) {
    omp_init_lock(&a->lock_info.locks[i]);
}
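As a side note on the interval computation: a `float` has a 24-bit significand, so for buffers larger than 2^24 cells the cast can round the size down and the resulting interval may leave the last cell mapped past the end of the lock array. A minimal integer-only sketch of the same idea follows; `lock_interval_for` and `lock_index_for` are hypothetical helper names, not part of the posted code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of buffer cells guarded by each lock,
 * rounded up so that num_locks locks cover the whole buffer. */
static size_t lock_interval_for(size_t bufferSize, size_t num_locks)
{
    assert(num_locks > 0);
    size_t interval = bufferSize / num_locks + (bufferSize % num_locks != 0);
    return interval > 0 ? interval : 1; /* avoid a zero divisor for an empty buffer */
}

/* Maps a buffer cell index to the index of the lock guarding it. */
static size_t lock_index_for(size_t bufind, size_t lock_interval)
{
    assert(lock_interval > 0);
    return bufind / lock_interval;
}
```

For a buffer of 10 cells and 4 locks the interval is 3, so the last cell (index 9) maps to lock 3, which is still inside the lock array.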
And then I defined some functions to lock the mutex corresponding to an index in the underlying buffer (not the PackedArray index).
void PackedArray_lock_offset(PackedArray* a, const uint64_t offset) {
    size_t bufind = ((uint64_t)offset * (uint64_t)a->bitsPerItem) / 32;
    size_t lockind = bufind / a->lock_info.lock_interval; // using implicit floor
    omp_set_lock(&(a->lock_info.locks[lockind]));
}
void PackedArray_unlock_offset(PackedArray* a, const uint64_t offset) {
    size_t bufind = ((uint64_t)offset * (uint64_t)a->bitsPerItem) / 32;
    size_t lockind = bufind / a->lock_info.lock_interval; // using implicit floor
    omp_unset_lock(&(a->lock_info.locks[lockind]));
}
To make my life easier I decided to only support values of bitsPerItem that evenly divide 32.
if (bitsPerItem <= bitsAvailable)
{
    out[0] = (out[0] & ~(mask << startBit)) | (in << startBit);
}
else
{
    PACKEDARRAY_ASSERT(0); // not supporting num_bits that doesn't evenly divide 32
    ...
}
There is no reason this couldn't work for other bit sizes; I just don't need them so I didn't implement it.
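The restriction works because an item can only straddle two 32-bit cells when its width does not evenly divide 32. A small sketch of that reasoning, with hypothetical helper names (`divides_cell_width`, `some_item_straddles`) that are not part of PackedArray:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check: true when bitsPerItem evenly divides the 32-bit cell width. */
static bool divides_cell_width(uint32_t bitsPerItem)
{
    return bitsPerItem >= 1 && bitsPerItem <= 32 && 32 % bitsPerItem == 0;
}

/* Brute-force cross-check: does any item start in one cell and end in the next?
 * The bit layout repeats every 32 items, so checking the first 32 is enough. */
static bool some_item_straddles(uint32_t bitsPerItem)
{
    assert(bitsPerItem >= 1 && bitsPerItem <= 32);
    for (uint64_t index = 0; index < 32; index++) {
        uint64_t firstBit = index * bitsPerItem;
        uint64_t lastBit = firstBit + bitsPerItem - 1;
        if (firstBit / 32 != lastBit / 32) {
            return true;
        }
    }
    return false;
}
```

A check like `divides_cell_width` could be asserted once at creation time, so that the single-cell write path is the only one that can ever be reached.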
I'm happy to push this code somewhere if there is interest :)
Posting for the sake of others' information. I attempted to use an array of mutexes to guard segments of a PackedArray to reduce the delay caused by threads waiting single-file to write to the PackedArray. This approach works fine with normal arrays. However, with the PackedArray I was getting inconsistent results.
This suggests that any invocation of PackedArray_get/PackedArray_set should be guarded by a mutex, even if the individual index is guarded.
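One plausible explanation for the inconsistent results is that neighbouring items share a 32-bit buffer cell, and a write is a read-modify-write of the whole cell. The sketch below replays one unlucky interleaving sequentially, so it is deterministic. The `merged_cell` and `replay_lost_update` helpers are hypothetical and only model the single-cell write expression, assuming 8-bit items.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified single-cell write: returns `cell` with the item at `startBit`
 * replaced by `in`. The caller does the read and the store separately. */
static uint32_t merged_cell(uint32_t cell, uint32_t startBit,
                            uint32_t bitsPerItem, uint32_t in)
{
    assert(bitsPerItem >= 1 && startBit + bitsPerItem <= 32);
    uint32_t mask = (bitsPerItem == 32) ? 0xFFFFFFFFu : ((1u << bitsPerItem) - 1u);
    return (cell & ~(mask << startBit)) | ((in & mask) << startBit);
}

/* Replays two writers that guard different item indices but share one cell. */
static uint32_t replay_lost_update(void)
{
    uint32_t cell = 0;
    uint32_t seenByA = cell; /* thread A reads the cell */
    uint32_t seenByB = cell; /* thread B reads it before A has stored */
    cell = merged_cell(seenByA, 0, 8, 0xAA); /* A stores item 0 */
    cell = merged_cell(seenByB, 8, 8, 0xBB); /* B stores item 1 from its stale copy */
    return cell; /* item 0 has been overwritten with its old value */
}
```

Here both writers hold a lock for their own index, yet item 0 is lost because thread B writes back a stale copy of the shared cell. Locking per buffer cell (or per group of cells) rather than per item index closes that window.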