spacepy / dbprocessing

Automated processing controller for heliophysics data

_getRequiredProducts should require input files to exist on disc #47

Closed by jtniehof 2 years ago

jtniehof commented 3 years ago

dbprocessing _getRequiredProducts, which figures out the possible input files for actually running a process, calls getFilesByProductTime or getFilesByProductDate with newest_version required. But it doesn't specifically require exists, which getFiles does support. It probably makes sense, when trying to build something, to require the input file to actually exist. There's also a good question of what to do when the latest version of a file doesn't exist (getFiles gets the latest version of the returned results, which is not guaranteed to be the latest in the database).
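To make the "latest of the returned results" concern concrete, here is a minimal sketch in plain Python. It does not use dbprocessing at all: FileRecord, newest_of_returned, and the sample file names are hypothetical stand-ins, and the filter-then-pick-newest order is an assumption about the behaviour described above, not a copy of the getFiles implementation.

```python
from collections import namedtuple

# Hypothetical stand-in for a file entry in the database;
# this is NOT the dbprocessing schema.
FileRecord = namedtuple("FileRecord", ["name", "version", "exists_on_disk"])


def newest_of_returned(records, exists=None):
    """Filter on existence first, then pick the newest version of what is left.

    Assumed to mirror "latest version of the returned results".
    """
    if exists is not None:
        records = [r for r in records if r.exists_on_disk == exists]
    if not records:
        return None
    # Version tuples compare element by element: (1, 1, 0) > (1, 0, 0).
    return max(records, key=lambda r: r.version)


records = [
    FileRecord("prod_v1.0.0.cdf", (1, 0, 0), True),
    FileRecord("prod_v1.1.0.cdf", (1, 1, 0), False),  # newest, but missing on disk
]

# With the existence filter applied first, the older v1.0.0 is selected,
# even though it is not the latest version known to the database.
stale = newest_of_returned(records, exists=True)
latest = newest_of_returned(records)
```

In this sketch, stale is the v1.0.0 record while latest is the v1.1.0 record, which is the mismatch the last sentence above is asking about.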

Relation to an issue

#45 is another example of what happens when getFiles works on the latest returned version rather than the latest in the database. In my particular use case, skipping nonexistent files would probably protect me from the effects of #45.

Proposed enhancement

Update _getRequiredProducts to call getFiles directly with the exists kwarg specified. Also, potentially, update getFiles so that if the latest version in the database doesn't exist on disk, and both newest_version and exists are True, nothing will be returned. (This might happen semi-naturally with #45.)
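One way to read the second half of this proposal is "find the newest version in the database first, then apply the existence check to that one record". The following is only a sketch of that ordering under stated assumptions: Record and newest_if_exists are hypothetical names, the data is an in-memory list rather than a database query, and nothing here is the actual getFiles signature.

```python
from collections import namedtuple

# Hypothetical stand-in for a database file entry (not the dbprocessing schema).
Record = namedtuple("Record", ["name", "version", "exists_on_disk"])


def newest_if_exists(records):
    """Return [newest record] if it exists on disk, otherwise an empty list.

    Sketch of the proposed semantics when both newest_version and exists
    are True: never fall back silently to an older version.
    """
    if not records:
        return []
    newest = max(records, key=lambda r: r.version)
    return [newest] if newest.exists_on_disk else []


missing_latest = [
    Record("prod_v1.0.0.cdf", (1, 0, 0), True),
    Record("prod_v1.1.0.cdf", (1, 1, 0), False),  # latest in "db", not on disk
]
present_latest = [
    Record("prod_v1.0.0.cdf", (1, 0, 0), True),
    Record("prod_v1.1.0.cdf", (1, 1, 0), True),
]

nothing = newest_if_exists(missing_latest)   # empty: latest is missing on disk
found = newest_if_exists(present_latest)     # only the v1.1.0 record
```

Returning an empty list here, rather than the older v1.0.0 file, means the caller sees no usable input and can skip the process instead of building from a superseded version.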

Alternatives

#45 is probably more important, and if it's fixed, this becomes less of an issue. Right now, if the latest version doesn't exist on disk, processing will probably fail when runMe does the input file existence checks.

OS, Python version, and dependency version information:

Linux-4.15.0-122-generic-x86_64-with-Ubuntu-18.04-bionic
sys.version_info(major=2, minor=7, micro=17, releaselevel='final', serial=0)
sqlalchemy=1.1.11

Version of dbprocessing

Current git master (8e5d3ae432fb35227c74bc5615422cf456b39578)

Closure condition

This issue should be closed when a PR is merged with a chosen solution and robust testing.