radiocosmology / alpenhorn

Alpenhorn is a service for managing an archive of scientific data.
MIT License

Rewrite 10/14: DefaultIO file pulls #153

Closed ketiltrout closed 1 year ago

ketiltrout commented 1 year ago

This PR completes the re-write of update.py to transition to the new I/O framework by re-implementing support for pull requests.

The UpDownLock

This is a fairly simple thread synchronization primitive that I'm a little surprised doesn't already exist.

It's used to prevent race conditions while a node's directory tree is being modified, by allowing only creation or only deletion to happen at any given time. It's needed to prevent the following (starting from a node with only one file on it, /node/acq/file1):

    worker1 (pulling)                       worker2 (deleting)
    -----------------------                 -----------------------
    wants to create:                        wants to delete:
       /node/acq/file2                        /node/acq/file1
    -----------------------                 -----------------------
1.  verifies that /node/acq exists          deletes /node/acq/file1

2.  skips directory creation                deletes /node/acq (empty dir)

3.  tries to create /node/acq/file2                  ...

4.  catches on fire because /node/acq                ...
      is now missing after verifying
      it existed

Any number of threads may hold the lock in a particular state ("lock up" or "lock down") at any given time, but switching from one state to the other forces threads to wait for the lock to become unlocked first.
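
To make the semantics above concrete, here is a minimal sketch of such a lock built on `threading.Condition`. It is not the implementation added by this PR: the class layout, the `up()`/`down()` context managers, and the `create_file`/`delete_file` helpers are all hypothetical names chosen for illustration. The sketch also makes no attempt at fairness, so a steady stream of "up" holders could starve a waiting "down" thread.

```python
import os
import threading
from contextlib import contextmanager


class UpDownLock:
    """Illustrative two-state shared lock (hypothetical, not alpenhorn's code).

    Any number of threads may hold the lock "up", or any number may hold it
    "down", but never a mix of the two at the same time.
    """

    def __init__(self):
        self._cond = threading.Condition()
        self._state = None  # None (unlocked), "up" or "down"
        self._holders = 0   # number of threads holding the current state

    def _acquire(self, state):
        with self._cond:
            # Block while other threads hold the lock in the opposite state
            while self._holders > 0 and self._state != state:
                self._cond.wait()
            self._state = state
            self._holders += 1

    def _release(self, state):
        with self._cond:
            if self._holders == 0 or self._state != state:
                raise RuntimeError(f"lock is not held {state}")
            self._holders -= 1
            if self._holders == 0:
                # Fully unlocked: wake threads waiting for the other state
                self._state = None
                self._cond.notify_all()

    @contextmanager
    def up(self):
        """Hold the lock "up" (e.g. while creating files and directories)."""
        self._acquire("up")
        try:
            yield
        finally:
            self._release("up")

    @contextmanager
    def down(self):
        """Hold the lock "down" (e.g. while deleting files and directories)."""
        self._acquire("down")
        try:
            yield
        finally:
            self._release("down")


def create_file(lock, path, data):
    """Hypothetical pulling worker: create path and any missing parents."""
    with lock.up():
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(data)


def delete_file(lock, path):
    """Hypothetical deleting worker: remove path, then its parent if empty."""
    with lock.down():
        os.remove(path)
        try:
            os.rmdir(os.path.dirname(path))
        except OSError:
            pass  # parent directory still has other entries
```

With both workers going through the same lock, the directory check and file creation in worker1 can no longer interleave with the deletion of /node/acq in worker2: the deleter waits until every creator has released the lock, and vice versa.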

Group I/O

Finally, the reason for having an I/O class for StorageGroups: pull requests, because they target a group, are now handled by the group I/O class. The job of the group I/O class is to:

If the group selects a node for the request, the rest of the pull is then handed off to the node's I/O layer.

Node I/O

The guts of the pull request haven't changed and live in pull_async, which is just a cleaned-up version of what used to be in update_node_requests.

One wrinkle:

Remote I/O

This PR also introduces a somewhat-awkward "Remote I/O" module, which alpenhorn uses to learn things about a non-local node (technically, any node that's the source side of a pull request, which may in fact be local). The reason for this is that knowledge about a remote node needs to be laundered through the I/O layer, because the I/O class and I/O config may affect the information.

There are only two bits of information that alpenhorn needs from a remote node:

This information could definitely just be part of the Node I/O class, but making it a separate object prevents trying to perform I/O operations on the source node of a pull request.

Changes to pulls

I've made a few, hopefully minor, changes to the details of how a pull request happens:

Broken by this PR

This PR deletes all the special casing previously in update.py for transport nodes. A replacement, in the form of a Transport Group I/O class, will be implemented to fix this in the next PR.

Resolved by this PR

I think this closes #48, in that a NULL address is now only going to be a problem in cases where a pull request actually needs it.