This PR completes the rewrite of `update.py`, transitioning to the new I/O framework by re-implementing support for pull requests.
## The UpDownLock
This is a fairly simple thread synchronization primitive which I'm a little surprised doesn't already exist.
It's used to prevent race conditions during modifications of a node directory tree by only allowing either creation or deletion to happen at any given time. It's needed to prevent the following (starting from a node with only one file on it, `/node/acq/file1`):
```
worker1 (pulling)                     worker2 (deleting)
wants to create /node/acq/file2       wants to delete /node/acq/file1
-----------------------------------   -----------------------------------
1. verifies that /node/acq exists     deletes /node/acq/file1
2. skips directory creation           deletes /node/acq (empty dir)
3. tries to create /node/acq/file2    ...
4. catches on fire because /node/acq  ...
   is now missing after verifying
   it existed
```
Any number of threads may hold the lock in a particular state ("lock up" or "lock down") at any given time, but switching from one state to the other forces threads to wait for the lock to become unlocked first.
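For illustration, here is a minimal sketch of such a two-state shared lock built on a condition variable. All names and internals here (`up`, `down`, the `_View` helper) are illustrative, not the actual implementation:

```python
import threading


class UpDownLock:
    """Sketch of a two-state shared lock.

    Any number of threads may hold the lock in the same state ("up" for
    creation, "down" for deletion), but a thread wanting the opposite
    state must wait until every current holder has released the lock.
    """

    def __init__(self):
        self._cond = threading.Condition()
        self._state = 0  # +1 while held "up", -1 while held "down", 0 when free
        self._count = 0  # number of threads currently holding the lock

    def _acquire(self, state):
        with self._cond:
            # Wait until the lock is free or already in the requested state
            while self._count and self._state != state:
                self._cond.wait()
            self._state = state
            self._count += 1

    def _release(self):
        with self._cond:
            self._count -= 1
            if self._count == 0:
                self._state = 0
                self._cond.notify_all()

    class _View:
        """Context-manager handle for one of the two lock states."""

        def __init__(self, lock, state):
            self._lock, self._state = lock, state

        def __enter__(self):
            self._lock._acquire(self._state)

        def __exit__(self, *exc):
            self._lock._release()

    @property
    def up(self):
        return self._View(self, +1)

    @property
    def down(self):
        return self._View(self, -1)
```

A creating thread would wrap its directory work in `with lock.up:` and a deleting thread in `with lock.down:`; either side can be held concurrently by many threads, but never both at once.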
## Group I/O
Finally, the reason for having an I/O class for StorageGroups: pull requests, because they target a group, are now handled by the group I/O class. The job of the group I/O class is to:
- figure out if the file already exists in this group (`DefaultGroupIO.exists`) and, if it does, cancel the request
- otherwise, figure out which node in the group (if any) will receive the pull request (`DefaultGroupIO.pull`)
If the group selects a node for the request, the rest of the pull is then handed off to the node's I/O layer.
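The decision flow above can be sketched as a plain function. The names here (`dispatch_pull`, `exists`, `pick_node`, the returned action strings) are illustrative stand-ins, not the real alpenhorn API:

```python
def dispatch_pull(req, exists, pick_node):
    """Sketch of the group-level pull decision.

    `req` is a pull request (here just a dict with a "path");
    `exists(path)` answers whether the file is already in the group;
    `pick_node(req)` returns the destination node, or None if no node
    in the group can take the pull right now.
    """
    if exists(req["path"]):
        # The group already has the file: cancel the request.
        return ("cancel", None)
    node = pick_node(req)
    if node is None:
        # No node available this time through the loop; try again later.
        return ("skip", None)
    # Hand the rest of the pull off to the node's I/O layer.
    return ("pull", node)
```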
## Node I/O
The guts of the pull request haven't changed; they live in `pull_async`, which is just a cleaned-up version of what used to be in `update_node_requests`.
One wrinkle: because all the pull requests now (potentially) happen simultaneously, there needs to be a mechanism to avoid pulling too much data to the node at once. This is done via the `reserve_bytes` mechanism in `DefaultIO`, which synchronously (i.e. in the main thread) tries to "reserve" a certain amount of the free space on the node before creating a pull task. If the reservation fails (node too full), the request is skipped until the next time through the loop, when there might be space.
There's a "reservation factor" which I've set to 2 (though the correct factor might be something else), meaning twice as much space is reserved than is needed. This guarantees we'll never overfill a node as a result of actual-apparent size differences, but does mean a node will never get to 100% full. Not sure if there's a better way to do this.
## Remote I/O
This PR also introduces a somewhat-awkward "Remote I/O" module, which alpenhorn uses to learn things about a non-local node (technically, any node on the source side of a pull request, which may in fact be local). The reason is that knowledge about a remote node needs to be laundered through the I/O layer, because the I/O class and I/O config may affect the information.
There are only two pieces of information that alpenhorn needs from a remote node:
- the path to the remote file (`file_addr` or `file_path`) that can be given to bbcp/rsync
- whether the remote file is ready to be pulled (`pull_ready`)
This information could definitely just be part of the Node I/O class, but making it a separate object prevents trying to perform I/O operations on the source node of a pull request.
## Changes to pulls
I've made a few, hopefully minor, changes to the details of how a pull request happens:
- hard links are now never made between archive nodes and non-archive nodes. I think this is a good idea for the purposes of data integrity of the archived data. It prevents the accidental corruption of archived files which are accessed via a different node (where you might otherwise think corruption isn't a big deal because you can always just re-sync from the archive copy).
- I'm more explicit about what to do when there's a copy of a file on the pull destination (which may or may not be registered in the database). Alpenhorn-1 would just fall over and complain when this happened. Alpenhorn-2 before this rewrite was better about things: it just explicitly clobbered a destination file. But now the response is more nuanced:
  - If there's an `ArchiveFileCopy` on the destination node with `has_file=='Y'`, then alpenhorn says "job's a good'un" and cancels the request it was working on.
  - If there's an `ArchiveFileCopy` with `has_file=='X'`, then the pull goes ahead and the corrupt file is overwritten.
  - If there's an `ArchiveFileCopy` with `has_file=='M'`, then alpenhorn figures the destination file is going to be checked for corruption soon, and just abandons the request without resolving it (reasoning that next time around, when we handle the request again, the check will have completed and changed `has_file` to `'Y'` or `'X'` as appropriate).
  - Otherwise (`has_file=='N'` or no copy record at all), it looks for a file at the destination path. If it finds one, it creates/updates an `ArchiveFileCopy` for the existing file and sets `has_file='M'` so that the unexpected file on the destination gets checked. In this case the request is abandoned without resolution and, like the `has_file=='M'` case above, alpenhorn will come back to finish the request after the check is completed.
Ultimately, if none of the above happens, then the pull is clear to proceed.
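The decision table above can be summarised as a small function. The function name and the returned action strings are illustrative, not the real code:

```python
def resolve_existing_copy(has_file):
    """Sketch of the destination-copy decision table from this PR.

    `has_file` is the ArchiveFileCopy.has_file value on the destination
    node, or None when no copy record exists at all.
    """
    if has_file == "Y":
        return "cancel"     # good copy already there: request is done
    if has_file == "X":
        return "overwrite"  # corrupt copy: the pull proceeds and clobbers it
    if has_file == "M":
        return "wait"       # check pending: abandon until the next loop
    # 'N' or no record: an unregistered file may still be on disk.  If one
    # is found, it's marked has_file='M' for checking and we come back
    # later; otherwise the pull is clear to proceed.
    return "check-disk"
```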
## Broken by this PR
This PR deletes all the special-casing previously in `update.py` for transport nodes. A replacement, in the form of a Transport Group I/O class, will be implemented to fix this in the next PR.
## Resolved by this PR
I think this closes #48, in that a NULL `address` is now only going to be a problem in cases where a pull request actually needs it.