jtniehof opened 3 years ago
When the inspector is called after dbp makes a file (related: #12), it receives a Diskfile object that includes nearly everything in the file record. As long as the Diskfile is populated before the inspector is called, a good deal of information is available to it.
This would be a good addition. I'm not sure how to handle it, since the inspector doesn't get information from the processing chain, only the file. We could make a temporary inspector file for each file created that the inspector could look at... The comment above about Diskfile is likely true if @jtniehof says so; I thought the two were decoupled.
The inspector "reports back" by populating fields in the Diskfile object, so we just need to check the order in which things are done. It might also be worth making sure the inspector can tell whether it is looking at a newly created file or ingesting one from scratch (the verbose provenance might be enough information).
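A minimal sketch of the "report back" pattern and the new-versus-ingested check, assuming that a non-empty verbose provenance is the signal that dbp just created the file. `FileRecord`, `is_freshly_created`, and `inspect` are hypothetical stand-ins for illustration, not the real Diskfile or inspector API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FileRecord:
    """Hypothetical stand-in for the Diskfile record passed to an inspector.

    Only the fields needed for this sketch are modeled.
    """
    filename: str
    # Assumption: set by dbp when it created the file, None when ingesting.
    verbose_provenance: Optional[str] = None
    # Values the inspector "reports back" by filling them in.
    reported: dict = field(default_factory=dict)


def is_freshly_created(record: FileRecord) -> bool:
    """Guess whether dbp just made this file (assumption: provenance is the signal)."""
    return bool(record.verbose_provenance)


def inspect(record: FileRecord) -> FileRecord:
    """Hypothetical inspector: report back by populating fields on the record."""
    record.reported["freshly_created"] = is_freshly_created(record)
    return record
```

For example, `inspect(FileRecord("b.cdf")).reported` gives `{'freshly_created': False}`, because no provenance was set. Whether provenance is enough to make this distinction is the open question raised above.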
It would be nice to have some support for an inspector updating the file, particularly when dbprocessing has just created the file and the inspector has access to all of its verbose provenance and other details. Since the inspector is file-format-aware, and is essentially the interface between dbp and the file format, it is a natural way to get the dbp information into the file instead of only into the database.
Proposed enhancement
Explicit support (and documentation) for the inspector placing dbp-related information into a file on inspection. The implications, e.g. for the file checksum, need to be handled.
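One checksum implication is ordering: if the inspector writes into the file, any checksum recorded before that write no longer matches the file on disk. A minimal sketch, assuming the checksum is computed after the inspector's write. SHA-1 is an illustrative choice, and `write_provenance` is a hypothetical stand-in that appends a text line instead of doing a real format-aware metadata write.

```python
import hashlib
import os
import tempfile
from typing import List, Tuple


def file_checksum(path: str) -> str:
    """Return the SHA-1 hex digest of a file, read in 64 KiB chunks."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def write_provenance(path: str, inputs: List[str]) -> None:
    """Hypothetical stand-in for a format-aware metadata write.

    A real inspector would set metadata in the file's own format;
    here we append a text line so the example runs anywhere.
    """
    with open(path, "a", encoding="utf-8") as f:
        f.write("# inputs: " + ", ".join(inputs) + "\n")


def inspect_and_checksum(path: str, inputs: List[str]) -> str:
    """Modify the file first, then checksum it, so the stored value matches disk."""
    write_provenance(path, inputs)
    return file_checksum(path)


def demo() -> Tuple[str, str, str]:
    """Return (checksum before write, checksum reported, checksum on disk)."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write("data\n")
        before = file_checksum(path)
        reported = inspect_and_checksum(path, ["in1.cdf", "in2.cdf"])
        on_disk = file_checksum(path)
    finally:
        os.remove(path)
    return before, reported, on_disk
```

In `demo()`, the checksum taken before the write differs from the one reported afterwards. The reported value matches what is on disk only because it was computed after the modification.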
Alternatives
Right now the processing codes themselves populate metadata with the names of all the input files, so this is duplicate code that has to be in every processing code.
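A minimal sketch of moving that duplicated logic into one place that an inspector could call, assuming it is handed the input file paths. The helper name `provenance_metadata`, the attribute name `Parents`, and the comma-separated format are all assumptions for illustration, not an existing dbprocessing convention.

```python
import os
from typing import Dict, Iterable


def provenance_metadata(input_paths: Iterable[str]) -> Dict[str, str]:
    """Build input-file metadata once, instead of in every processing code.

    Hypothetical helper: the "Parents" key and the comma-separated
    format are assumptions.
    """
    # Deduplicate by file name and sort for a stable, reproducible value.
    names = sorted({os.path.basename(p) for p in input_paths})
    return {"Parents": ", ".join(names)}
```

For example, `provenance_metadata(["/a/x.cdf", "/b/y.cdf", "/c/x.cdf"])` returns `{'Parents': 'x.cdf, y.cdf'}`. A single helper like this keeps the format consistent across products, whichever code ends up writing it.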
OS, Python version, and dependency version information:
Version of dbprocessing
Current master from GitHub (734f37b1bfb3540f5682edd6dbb2e590eb51a3ff)
Closure condition
This issue should be closed when appropriate design is chosen, implemented, and merged.