Closed ketiltrout closed 11 months ago
In addition to implementing Richard's suggestions, I've changed autosync to fire whenever the destination has has_file!='Y' (instead of just when has_file=='N'/is missing). I think this makes more sense.
In the case of has_file=='M', the autogenerated pull request will not be acted upon until the existing destination copy is checked. After checking, if the resultant copy is set to has_file=='Y', alpenhorn will cancel the unnecessary pull request. On the other hand, if the file ends up being corrupt (has_file=='X'), then the autosync will go ahead and overwrite the destination file.
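As a rough illustration of the behaviour described above, the mapping from the destination's has_file state to the fate of an autosync request could be sketched like this. This is not the alpenhorn implementation: SyncAction and autosync_action are hypothetical names, and None is assumed to stand for "no copy record at the destination".

```python
from enum import Enum


class SyncAction(Enum):
    """Hypothetical outcomes for an autosync pull request."""

    NONE = "none"  # destination already has a good copy
    WAIT_FOR_CHECK = "wait"  # request is held until the suspect copy is checked
    COPY = "copy"  # request proceeds (copy missing, absent, or corrupt)


def autosync_action(dest_has_file):
    """Return what happens to an autosync request for a destination state.

    dest_has_file is one of "Y", "M", "X", "N", or None (assumed here to
    mean that no copy record exists at the destination).
    """
    if dest_has_file == "Y":
        # Good copy present: nothing to sync (an existing request is cancelled).
        return SyncAction.NONE
    if dest_has_file == "M":
        # Suspect copy: the request is not acted upon until the check completes.
        return SyncAction.WAIT_FOR_CHECK
    # "N", "X", or no record: the transfer goes ahead, overwriting if needed.
    return SyncAction.COPY
```

After a check, calling the function again with the updated state gives the final outcome: "Y" means no transfer, "X" means the transfer goes ahead.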
This PR handles a few potential race conditions in the way CHIME moves data around its Storage graph. There are two parts.
Change to deletion
The easier part is a change to the way alpenhorn collects files for deletion. It now skips deleting a file copy which is needed to fulfill a copy request. It doesn't cancel the deletion, so it will consider deleting it again next time, but deletion won't actually occur until the blocking copy request is handled (completed or cancelled), avoiding a potential race condition in file management.
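The idea can be sketched roughly as follows. This is a simplified stand-in using plain dataclasses rather than the real database models; CopyRequest and deletable are hypothetical names, and the fields shown are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CopyRequest:
    """Hypothetical, simplified stand-in for a copy request record."""

    file: str
    node_from: str
    completed: bool = False
    cancelled: bool = False


def deletable(candidates, node, requests):
    """Return the files from candidates that may be deleted from node now.

    A file is skipped (not un-marked) while a pending request still needs
    the copy on node as its source, so it is reconsidered on the next pass.
    """
    blocked = {
        r.file
        for r in requests
        if r.node_from == node and not (r.completed or r.cancelled)
    }
    return [f for f in candidates if f not in blocked]
```

Because skipped files stay marked for deletion, running the same selection again after the request completes or is cancelled picks them up without any extra bookkeeping.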
Edge table
The bigger change here is the addition of a table providing metadata for the edges in the Storage directed graph (edges in a directed graph are usually called "arrows"). This is the table StorageTransfer defined in storage.py.
We've talked about edge tables for a long time, and the PR implements almost nothing of what we've talked about, though there is potential to add more stuff later.
A StorageTransfer edge is defined by a node_from (source) and a group_to (destination). Any ArchiveFileCopyRequest with the same node_from and group_to is transferring data along that edge, and there's the potential to use the StorageTransfer to provide finer-grained configuration for specific routes. But, as I alluded to above, none of that is implemented here.
What is implemented is two post-file-adding actions which I've previously mentioned in #158:
- autosync: after a file is added to a node, create an ArchiveFileCopyRequest to transfer it from this node to a destination group. This allows us to define automatic transfer routes. For example, we can turn on autosync on the cedar_staging->scinet_staging edge to automatically copy files to niagara when they arrive on cedar. This feature is needed to avoid a race condition distributing files to cedar_staging and cedar_offload after transfer from gong.
- autoclean: after a file is added to the destination group, release the copy on the source node (i.e. set wants_file to "N"). This feature isn't really needed to fix any problems, but, along with autosync, it allows us to completely specify in the alpenhorn database our routine data movement using the following edge definitions (obviously, some of these won't work until we deploy alpenhorn2 to the other hosts):
The post-add actions (autosync and autoclean) are triggered whenever a file appears on a node (i.e. both via import and also pull requests).
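To make the trigger logic concrete, here is a minimal sketch of how post-add actions might be selected from a set of edges. It is an illustration under stated assumptions, not the code in this PR: Edge and post_add_actions are hypothetical names, the autosync/autoclean flags are modelled as plain booleans, and each node is assumed to sit in a group of the same name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """Hypothetical, simplified stand-in for a StorageTransfer record."""

    node_from: str
    group_to: str
    autosync: bool = False
    autoclean: bool = False


def post_add_actions(edges, node, group):
    """List the actions triggered when a file appears on node (in group).

    How the file arrived (import or pull request) is deliberately not an
    input: only where it has appeared matters.
    """
    actions = []
    for e in edges:
        # autosync: file appeared on the edge's source, so request a copy.
        if e.autosync and e.node_from == node:
            actions.append(("copy_request", e.node_from, e.group_to))
        # autoclean: file appeared in the edge's destination, so the
        # source copy can be released (wants_file set to "N").
        if e.autoclean and e.group_to == group:
            actions.append(("release", e.node_from))
    return actions


edges = [
    Edge("cedar_staging", "scinet_staging", autosync=True),
    Edge("cedar_staging", "scinet_hpss", autoclean=True),
]
```

With these two illustrative edges, a file appearing on cedar_staging yields a copy request towards scinet_staging, and a file appearing in scinet_hpss releases the cedar_staging copy.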
Subtlety! The route a file takes does not matter when performing autosync and autoclean. So, e.g., in the example table above, autocleaning of cedar_staging will happen after files appear on HPSS, even though the cedar_staging->scinet_hpss edge is not used to transfer files into HPSS.
alpenhorn ignores all StorageTransfer records where node_from.group == group_to (i.e. edges pointing back to their origin, what are known as "self-loops" or 1-cycles).
Also: autoclean and autosync aren't complete replacements for cron-based alpenhorn clean/sync invocations. These actions only ever trigger once, when a file first appears on a node. They can't do a full sync/clean of a node.
Closes #158