This PR handles a few potential race conditions in the way CHIME moves data around its Storage graph. There are two parts.

Change to deletion

The easier part is a change to the way alpenhorn collects files for deletion. It now skips any file copy that is still needed to fulfil a pending copy request. The deletion isn't cancelled, so alpenhorn will consider the copy again on its next pass, but the file won't actually be deleted until the blocking copy request is handled (completed or cancelled). This avoids a potential race between deletion and transfer.
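As a rough illustration of the rule above, here is a minimal sketch of the filtering logic. The `FileCopy` and `CopyRequest` classes are simplified, hypothetical stand-ins, not alpenhorn's actual database models, and `deletable_copies` is an invented name.

```python
from dataclasses import dataclass


# Hypothetical, simplified stand-ins for alpenhorn's database records.
@dataclass(frozen=True)
class FileCopy:
    file: str
    node: str
    wants_file: str = "N"  # "N" means the copy is marked for deletion


@dataclass(frozen=True)
class CopyRequest:
    file: str
    node_from: str
    group_to: str
    completed: bool = False
    cancelled: bool = False


def deletable_copies(copies, requests):
    """Return copies marked for deletion that no pending request still needs."""
    # A request blocks deletion of its source copy until it is
    # completed or cancelled.
    blocked = {
        (req.file, req.node_from)
        for req in requests
        if not (req.completed or req.cancelled)
    }
    return [
        copy
        for copy in copies
        if copy.wants_file == "N" and (copy.file, copy.node) not in blocked
    ]
```

A skipped copy keeps its deletion mark, so it is picked up on a later pass once the blocking request has been handled.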

Edge table

The bigger change here is the addition of a table providing metadata for the edges in the Storage directed graph (edges in a directed graph are sometimes called "arcs" or "arrows"). This is the StorageTransfer table defined in storage.py.

We've talked about edge tables for a long time, and this PR implements almost none of what we've discussed, though there is room to add more later.

A StorageTransfer edge is defined by a node_from (source) and a group_to (destination). Any ArchiveFileCopyRequest with the same node_from and group_to is transferring data along that edge, and there's the potential to use the StorageTransfer to provide finer-grained configuration for specific routes. But, as I alluded to above, none of that is implemented here.
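To make the edge definition concrete, here is a minimal sketch of how a copy request could be matched to its edge. It uses plain dataclasses rather than the real database models; the `autosync` and `autoclean` field names and the `edge_for` helper are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical stand-in for a StorageTransfer record. The two flag
# names are assumed, not taken from storage.py.
@dataclass(frozen=True)
class StorageTransfer:
    node_from: str
    group_to: str
    autosync: bool = False
    autoclean: bool = False


# Hypothetical stand-in for an ArchiveFileCopyRequest.
@dataclass(frozen=True)
class CopyRequest:
    file: str
    node_from: str
    group_to: str


def edge_for(request, edges) -> Optional[StorageTransfer]:
    """Return the edge a request travels along, or None if no edge is defined."""
    for edge in edges:
        # An edge is identified by its (node_from, group_to) pair.
        if (edge.node_from, edge.group_to) == (request.node_from, request.group_to):
            return edge
    return None
```

Per-route configuration, if it were added later, could hang off the record that a lookup like this returns.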

What is implemented are two post-add actions, which I've previously mentioned in #158:

Autosync: when a file is added to a node, autosync tells alpenhorn to immediately queue a new ArchiveFileCopyRequest to transfer it from this node to a destination group. This allows us to define automatic transfer routes. For example, we can turn on autosync on the cedar_staging->scinet_staging edge to automatically copy files to niagara when they arrive on cedar. This feature is needed to avoid a race condition when distributing files to cedar_staging and cedar_offload after transfer from gong.
Autoclean: when a file is added to a node, autoclean tells alpenhorn to immediately mark the file copy on an "upstream" node for deletion (by setting wants_file to "N"). This feature isn't really needed to fix any problems but, along with autosync, it allows us to specify our routine data movement completely in the alpenhorn database, using the following edge definitions (some of these won't work until we deploy alpenhorn2 to the other hosts):

| node_from | group_to | auto-sync | auto-clean | explanation |
| --- | --- | --- | --- | --- |
| gong | cedar_staging | Y | N | files appearing on gong are automatically transferred to cedar_staging |
| cedar_staging | cedar_offload | Y | N | files arriving on cedar_staging are automatically transferred to cedar_offload |
| cedar_staging | scinet_staging | Y | N | files arriving on cedar_staging are automatically transferred to scinet_staging |
| cedar_staging | scinet_hpss | N | Y | files are deleted from cedar_staging after being archived in HPSS on scinet |
| cedar_offload | cedar_nearline | Y | Y | files arriving on cedar_offload are automatically transferred to nearline, and then deleted once they're in nearline |
| scinet_staging | scinet_hpss | Y | Y | files arriving on scinet_staging are automatically transferred to HPSS, and then deleted once they're in HPSS |

The post-add actions (autosync and autoclean) are triggered whenever a file appears on a node (i.e. both via import and via pull requests).
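The autosync half of this can be sketched as follows, using the edge definitions from the table. This is only an illustration: `Edge`, `EDGES` and `autosync_requests` are hypothetical names, and the real code queues ArchiveFileCopyRequest records rather than returning tuples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    node_from: str
    group_to: str
    autosync: bool
    autoclean: bool


# The edge definitions from the table, as plain Python data.
EDGES = [
    Edge("gong", "cedar_staging", True, False),
    Edge("cedar_staging", "cedar_offload", True, False),
    Edge("cedar_staging", "scinet_staging", True, False),
    Edge("cedar_staging", "scinet_hpss", False, True),
    Edge("cedar_offload", "cedar_nearline", True, True),
    Edge("scinet_staging", "scinet_hpss", True, True),
]


def autosync_requests(file, node, edges=EDGES):
    """Return (file, node_from, group_to) tuples to queue when `file` appears on `node`."""
    # How the file got here (import or pull) is deliberately not an input.
    return [
        (file, edge.node_from, edge.group_to)
        for edge in edges
        if edge.node_from == node and edge.autosync
    ]
```

With these edges, a file appearing on cedar_staging yields two requests, one to cedar_offload and one to scinet_staging, while a file appearing on a node with no outgoing autosync edge yields none.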

Subtlety! The route a file takes does not matter when performing autosync and autoclean. So, e.g., in the example table above, autocleaning of cedar_staging will happen after files appear on HPSS, even though the cedar_staging->scinet_hpss edge is not used to transfer files into HPSS.
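A minimal sketch of that route independence, again with hypothetical names (`Edge`, `EDGES`, `autoclean`) and a plain dict standing in for the file-copy table. It assumes autoclean only touches upstream copies that actually exist.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    node_from: str
    group_to: str
    autosync: bool
    autoclean: bool


# The edge definitions from the example table.
EDGES = [
    Edge("gong", "cedar_staging", True, False),
    Edge("cedar_staging", "cedar_offload", True, False),
    Edge("cedar_staging", "scinet_staging", True, False),
    Edge("cedar_staging", "scinet_hpss", False, True),
    Edge("cedar_offload", "cedar_nearline", True, True),
    Edge("scinet_staging", "scinet_hpss", True, True),
]


def autoclean(file, arrival_group, copies, edges=EDGES):
    """Mark upstream copies of `file` for deletion after it arrives in `arrival_group`.

    `copies` maps (file, node) to a wants_file value and is updated in place.
    """
    for edge in edges:
        # Only the destination group matters, not the route the file took.
        if edge.group_to == arrival_group and edge.autoclean:
            key = (file, edge.node_from)
            if key in copies:
                copies[key] = "N"
    return copies
```

Here a file arriving in scinet_hpss marks the copies on both scinet_staging and cedar_staging, even though it was only transferred along the scinet_staging edge.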

alpenhorn ignores all StorageTransfer records where node_from.group == group_to (i.e. edges pointing back to their origin, known as "self-loops" or 1-cycles).
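The self-loop check amounts to a one-line filter. The sketch below assumes a hypothetical node-to-group mapping (`GROUP_OF`) in place of the node_from.group lookup; the group names shown are illustrative, not taken from the real database.

```python
# Hypothetical mapping from node name to the group it belongs to.
GROUP_OF = {
    "gong": "gong",
    "cedar_staging": "cedar_staging",
    "cedar_offload": "cedar_offload",
}


def usable_edges(edges, group_of):
    """Drop edges whose source node already belongs to the destination group."""
    return [
        (node_from, group_to)
        for node_from, group_to in edges
        if group_of[node_from] != group_to  # skip self-loops
    ]


edges = [
    ("gong", "cedar_staging"),
    ("cedar_staging", "cedar_staging"),  # self-loop: ignored
    ("cedar_staging", "cedar_offload"),
]
```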

Also: autoclean and autosync aren't complete replacements for cron-based alpenhorn clean/sync invocations. These actions only ever trigger once, when a file first appears on a node. They can't do a full sync/clean of a node.

Closes #158

radiocosmology / alpenhorn

feat: Edge table and transfer race condition fixes #170