Commit b7b90c80614a0878526543ac35ee03653064f0f0 addressed a race condition between the garbage collector (GC) and the SR detach operation. Because there was no mutual exclusion between the two operations, the GC could start while an SR detach was in progress. To fix this, the code was changed to acquire the gc_active lock only while holding the SR lock, which the detach operation also holds. This prevents a starting GC process from acquiring the active lock until the detach has completed, at which point the SR is detached and the GC exits.
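The lock ordering described above can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: `threading.Lock` stands in for the storage manager's own per-SR locks, and `sr_attached`, `detach_sr`, `gc_start` and `gc_finish` are hypothetical names. Treating a held gc_active lock as "another GC is already running" is also an assumption made for the sketch.

```python
import threading

# Hypothetical in-process stand-ins for the per-SR locks; the real
# implementation uses its own lock objects, not threading.Lock.
sr_lock = threading.Lock()         # held by SR detach (and VDI delete / SR scan)
gc_active_lock = threading.Lock()  # held for as long as a GC is running
sr_attached = True                 # hypothetical flag cleared by detach


def detach_sr():
    """SR detach runs entirely under the SR lock."""
    global sr_attached
    with sr_lock:
        sr_attached = False


def gc_start():
    """Acquire gc_active only while holding the SR lock, so the GC can
    never become active part-way through a detach.

    Returns True if this process should go on to run the GC."""
    with sr_lock:
        if not sr_attached:
            return False  # detach has completed: the GC simply exits
        # Assumption: a held gc_active lock means another GC is running.
        return gc_active_lock.acquire(blocking=False)


def gc_finish():
    """Release gc_active once the GC has finished its work."""
    gc_active_lock.release()
```

Because gc_active is only ever taken inside the SR lock, a GC that starts during a detach has to wait for the detach to release the SR lock, and it then sees the SR as detached and exits.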
There was a problem with that commit: it used Lock acquireNoblock to acquire the SR lock and, if this failed, assumed that the GC was already running. However, the GC is typically kicked off as the result of a VDI delete (or manually via an SR scan), and in both cases the SR lock is already held. Whether the GC obtains the lock is therefore racy, depending on how long the process takes to fork and daemonise. As a result, under some conditions the GC process never starts and cleanup does not occur, which prevents new snapshots from being taken once the maximum chain length is exceeded. The code should instead use a blocking acquire, which waits until the current holder has finished. This commit applies that change.
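The difference between the two acquisition strategies can be illustrated with a small simulation. This is a sketch under stated assumptions: `threading.Lock` again stands in for the real SR lock, and `gc_try_start_nonblocking`, `gc_try_start_blocking` and `run_scenario` are hypothetical names. A thread plays the role of a VDI delete that holds the SR lock while the GC is starting.

```python
import threading
import time

sr_lock = threading.Lock()  # hypothetical stand-in for the SR lock


def gc_try_start_nonblocking():
    """Buggy behaviour: a failed non-blocking acquire is taken to mean
    that a GC is already running, so this GC gives up."""
    if not sr_lock.acquire(blocking=False):
        return "skipped"  # wrong: the holder may be a VDI delete, not a GC
    try:
        return "started"
    finally:
        sr_lock.release()


def gc_try_start_blocking():
    """Fixed behaviour: wait until the current holder has finished."""
    with sr_lock:
        return "started"


def run_scenario(gc_start):
    """Start the GC while a simulated VDI delete still holds the SR lock."""
    acquired = threading.Event()

    def vdi_delete():
        # Simulated VDI delete: holds the SR lock while it kicks the GC.
        with sr_lock:
            acquired.set()
            time.sleep(0.2)

    holder = threading.Thread(target=vdi_delete)
    holder.start()
    acquired.wait()  # the GC starts before the delete has released the lock
    result = gc_start()
    holder.join()
    return result


if __name__ == "__main__":
    print(run_scenario(gc_try_start_nonblocking))
    print(run_scenario(gc_try_start_blocking))
```

In the non-blocking case the GC gives up whenever it starts before the VDI delete releases the SR lock, so no cleanup happens. The blocking variant simply waits for the delete to finish and then proceeds.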