Closed ketiltrout closed 1 year ago
This is a major rewrite of this PR, which I think makes things better by making alpenhorn simpler, through paring away weird CHIME-isms which don't contribute to alpenhorn's goals.
Instead of the complex Info-table framework and the model-extension stuff I had written, I've now dropped Info tables completely from alpenhorn because, as with `ArchiveInst`, alpenhorn doesn't care about them and shouldn't be the one dealing with them.
And then, by dropping info tables, we can also drop `AcqType` and `FileType`, because all they were doing for alpenhorn was telling it which info classes to interrogate for which imported files, so without info tables, they have nothing to do. It's better to let third parties decide how to organize their data rather than arbitrarily forcing something on them for which we have no need.
(For CHIME, all of these dropped things are moved/reimplemented in `alpenhorn-chime`.)
Overall I think it's cleaner, more intuitive, and more flexible without losing anything CHIME needs.
This PR is concerned with implementing the "file import detection" framework for the daemon. I think this is the last of the "structural" PRs. Subsequent PRs in this rewrite will deal primarily with changes to I/O code.
Motivation
The ultimate goal of this PR is to produce the infrastructure needed by the CHIME alpenhorn extensions over in alpenhorn-chime.
This PR does the following:
- Removes `AcqType`, `FileType` and all the Info class framework (all moved to `alpenhorn-chime`).
- [...] `alpenhorn-chime`)
- Moves `alpenhorn/generic.py` to `examples/pattern_importer.py`, and updates it to work with this new system
Removal of the Info framework
I've removed all references to info classes from alpenhorn. They were an integral part of alpenhorn-1, but in alpenhorn-2 they served two purposes:
- a `detect` function which was passed an acq or file name and returned True or False to indicate whether it was a file which needed to be imported by alpenhorn
- a `new` class method to generate a new record in these tables.
The first of these features has been replaced by a new "import-detect" extension type which provides a simple function to perform the detection step of the import. See the section "The 'import-detect' Extension" below.
The second of these is replaced with an optional post-import hook, which removes the awkwardness of requiring alpenhorn to add rows to tables it knows nothing about. See the section "The Post-Import Callback" below.
Removal of `AcqType` and `FileType`
While CHIME makes heavy use of `AcqType` and `FileType` to manage our data, in alpenhorn their use was solely to determine which Info tables were available to perform import detection. With the removal of info classes, they no longer have a use in alpenhorn. I've moved them to `alpenhorn-chime` where they've been re-implemented (like the `ArchiveInst` table was).
Removal of these two tables also means the `acq_types` and `file_types` extensions are no longer needed, and they have been removed from `extensions.py`, as well as the `register_type_extensions` call that was being made in `service.py`.
The "import-detect" Extension
In place of all the above is a new extension type called "import-detect". Each "import-detect" extension returns (via `register_extension`) a single callable object, which is the "detect" function used during file import.
(This is exactly what `alpenhorn-chime` is: an alpenhorn import-detect extension.)
The detect function is passed a `pathlib.Path` pointing to the file to import and the `UpdateableNode` containing the candidate data file. The function must determine if the path points to a file that alpenhorn should import. It must return a 2-tuple:
- either `None` to indicate detection failed (i.e. the path does not point to a data file needing import), or else the acquisition name, which must be a portion of the path passed in (with the file name becoming the remaining portion of the path).
- a post-import callback, which is called after the `ArchiveAcq`, `ArchiveFile`, `ArchiveFileCopy` records have been generated for the file. If no post-import callback is needed, this may be `None`. (In implementing this new system, I've discovered `functools.partial` to be a useful thing to return to alpenhorn as a callback because it allows the detect function to pass data to a callback.) See the following section for more details.
Multiple "import-detect" extensions may be loaded. In that case, the import code tries each in extension order until one of them reports a successful match.
alpenhorn will run without any "import-detect" extensions loaded, but will be unable to import files in that case. (Attempts to import files will result in an error message.)
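The "try each extension in order" behaviour described above can be sketched roughly as follows. This is only an illustration, not the code in this PR: `run_detection`, `detect_raw` and `detect_by_suffix` are hypothetical names, the node argument is left as `None`, and the paths are assumed to be relative to the node root.

```python
import pathlib


def detect_raw(path, node):
    """Hypothetical detector: claims files under a top-level "raw" directory."""
    if len(path.parts) >= 2 and path.parts[0] == "raw":
        return path.parts[0], None  # (acquisition name, no callback)
    return None, None  # detection failed


def detect_by_suffix(path, node):
    """Hypothetical detector: claims any ".h5" file inside a directory."""
    if path.suffix == ".h5" and len(path.parts) >= 2:
        return path.parent.as_posix(), None
    return None, None


def run_detection(detectors, path, node):
    """Try each detect function in order and stop at the first match."""
    for detect in detectors:
        acq_name, callback = detect(path, node)
        if acq_name is not None:
            return acq_name, callback
    # No detector matched (or none were loaded): nothing to import
    return None, None
```

With an empty `detectors` list the loop never runs, which corresponds to the no-extension case: every import attempt fails.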
The Post-Import Callback
When a callback is provided, alpenhorn will pass it the following parameters:
- the `ArchiveFileCopy` of the newly imported file
- the newly-created `ArchiveFile` for the file (or `None` if it already existed)
- the newly-created `ArchiveAcq` for the file (or `None` if it already existed)
- the `UpdateableNode` on which this import took place.
The `ArchiveFile` and `ArchiveAcq` of any imported file may be obtained via the `ArchiveFileCopy`; newly-created instances are passed to the callback so the callback knows whether they are new or not. These two parameters could be replaced by booleans without loss of information, but I think it's more direct to do it like this.
The value returned from the callback is ignored.
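As a minimal sketch of how a detect function and its post-import callback might fit together, assuming a made-up naming scheme: everything here (`ACQ_PATTERN`, `post_import`, the `imported` list standing in for an extension-owned table, and the assumption that the acquisition is the leading path component) is hypothetical and not taken from `alpenhorn-chime`. It uses `functools.partial` to hand data from the detect step to the callback, as mentioned above.

```python
import functools
import pathlib
import re

# Hypothetical naming scheme: acquisitions look like "20230101_run"
ACQ_PATTERN = re.compile(r"^\d{8}_\w+$")

# Stand-in for a table owned by the extension (not by alpenhorn)
imported = []


def post_import(filecopy, new_file, new_acq, node, *, kind):
    """Hypothetical callback; receives the four parameters listed above.

    `kind` is extra data bound by the detect function via functools.partial.
    new_file / new_acq are None when those records already existed.
    """
    imported.append(
        {
            "kind": kind,
            "file_is_new": new_file is not None,
            "acq_is_new": new_acq is not None,
        }
    )


def detect(path, node):
    """Return (acq_name, callback), or (None, None) if detection fails."""
    parts = path.parts
    # Assumption: the path is relative and its first component is the acquisition
    if len(parts) < 2 or not ACQ_PATTERN.match(parts[0]):
        return None, None
    callback = functools.partial(post_import, kind=path.suffix)
    return parts[0], callback
```

Because the return value of the callback is ignored, any bookkeeping has to happen as a side effect, here by appending to `imported`.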
The "pattern-importer" example extension
The regex/glob-based example extension formerly found in `alpenhorn/generic.py` (which was an example of an "acq_types"/"file_types" extension) has been moved to `examples/pattern_importer.py` and updated to be an "import-detect" extension.
It's not used yet, but this example extension will also eventually be used in the end-to-end test in `tests/test_service.py`.
Changes to `auto_import`
The changes here are somewhat performative: they are the code changes needed to use the new system, but the `auto_import` code doesn't really work yet within the new framework. A subsequent PR in this series will transition the import code to use the new task queue. As part of that, this code will get fixed. Despite that, it's good to make this change here to show how the removal of info classes affects the calling code.