Currently, when clean runs, it confirms that all applications have been in a safe state before going through and deleting everything in the deletion queue. Our implementation of this is safe, however, it isn't very performant. By incurring a small extra space cost, we could avoid a whole class of possible performance defects. Here's the proposal:
Add a last_operation_started field to the thread_info_list struct. This will be atomically populated each time a thread starts an operation. (side note: we should document the list of unsafe/safe operations somewhere (currently, we set a safe variable in the operation to indicate that)). We'll keep the existing safe variable, this is only an addition.
Add a deletion_timestamp field to the element_queue struct. This will by populated at the end of the del function.
Modify the clean function to calculate the minimum last_operation_started time. Pop elements off the deletion queue (and on to the free list) when deletion time < min_last_operation_started. Skip elements for which that is not the case. At the end, if there are any elements remaining, check the last_operation_started timestamps again and, if the conditional still does not hold, spin on safe.
Notes:
Could we do this by killing safe and instead just using last_op_started and last_op_ended? Then checking something about the ordering?
Also, how can we ensure that the timestamps are reliable and not too expensive to calculate?
Currently, when clean runs, it confirms that all applications have been in a safe state before going through and deleting everything in the deletion queue. Our implementation of this is safe, however, it isn't very performant. By incurring a small extra space cost, we could avoid a whole class of possible performance defects. Here's the proposal:
last_operation_started
field to thethread_info_list
struct. This will be atomically populated each time a thread starts an operation. (side note: we should document the list of unsafe/safe operations somewhere (currently, we set asafe
variable in the operation to indicate that)). We'll keep the existingsafe
variable, this is only an addition.deletion_timestamp
field to the element_queue struct. This will by populated at the end of thedel
function.clean
function to calculate the minimumlast_operation_started
time. Pop elements off the deletion queue (and on to the free list) whendeletion time < min_last_operation_started
. Skip elements for which that is not the case. At the end, if there are any elements remaining, check thelast_operation_started
timestamps again and, if the conditional still does not hold, spin onsafe
.