spacepy / dbprocessing

Automated processing controller for heliophysics data

Guarantee ordering of input files #69

Open jtniehof opened 3 years ago


Right now dbp makes no guarantees about the ordering of input files. In general I've found that, because of differences in the hashing algorithm, Python 2 is more likely to preserve order or be sorted by default, and Python 3 is less likely. This means code that expects the input files on the command line to be, say, always in ascending order of date is more likely to be happy in Python 2 than in Python 3. In reality, getting the expected order has always been a coincidence, even on Python 2.

We could make some explicit guarantee of ordering (e.g. sort by product ID and, after that, by utc_file_date) and that might make life a bit easier for some code maintainers.
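
A minimal sketch of what such an ordering could look like, assuming each input file is available as a record carrying a product ID and a `utc_file_date`. The `InputFile` type, the `order_inputs` helper, and the sample filenames are hypothetical stand-ins for illustration, not dbprocessing's actual classes:

```python
import datetime
from collections import namedtuple

# Hypothetical stand-in for a file record; not dbprocessing's real class.
InputFile = namedtuple("InputFile", ["filename", "product_id", "utc_file_date"])


def order_inputs(files):
    """Return files sorted by product ID, then by UTC file date."""
    return sorted(files, key=lambda f: (f.product_id, f.utc_file_date))


# Deliberately scrambled input, as it might come back from an unordered query.
files = [
    InputFile("b_20130102.cdf", 2, datetime.date(2013, 1, 2)),
    InputFile("a_20130102.cdf", 1, datetime.date(2013, 1, 2)),
    InputFile("b_20130101.cdf", 2, datetime.date(2013, 1, 1)),
    InputFile("a_20130101.cdf", 1, datetime.date(2013, 1, 1)),
]
ordered_names = [f.filename for f in order_inputs(files)]
```

Sorting on a tuple key keeps the rule easy to state in documentation: the first element decides, and later elements only break ties.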

Relation to an issue

Somewhat related to #4, in that it's more likely to arise in that context.

Proposed enhancement

Always explicitly order the input command arguments to processing codes, and document (and test) this order.
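
As a rough illustration of the "test this order" part, the check below shuffles the inputs and confirms the resulting command line does not change. Everything here is an assumption made for the sketch: `build_command` is a hypothetical helper, the inputs are plain `(product_id, utc_file_date, filename)` tuples, and the layout of inputs followed by the output file is only a guess at the convention:

```python
import random


def build_command(executable, inputs, output):
    """Build an argument list with inputs in a fixed, documented order.

    inputs: iterable of (product_id, utc_file_date, filename) tuples.
    Sorting the tuples orders by product ID, then date, then filename
    as a final tie-breaker, so the result is fully deterministic.
    """
    ordered = sorted(inputs)
    return [executable] + [fname for _, _, fname in ordered] + [output]


# ISO-format date strings sort chronologically when compared as text.
inputs = [
    (1, "2013-01-01", "a_20130101.cdf"),
    (1, "2013-01-02", "a_20130102.cdf"),
    (2, "2013-01-01", "b_20130101.cdf"),
]
expected = build_command("process.py", inputs, "out.cdf")

# The command must not depend on the order in which inputs were gathered.
rng = random.Random(0)
for _ in range(20):
    shuffled = list(inputs)
    rng.shuffle(shuffled)
    assert build_command("process.py", shuffled, "out.cdf") == expected
```

A test along these lines would fail if the ordering ever went back to depending on hash or query order, which is the coincidence this issue is about.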

Alternatives

Explicitly document this lack of guarantee and otherwise do nothing.

OS, Python version, and dependency version information:

Linux-4.4.0-98-generic-x86_64-with-Ubuntu-16.04-xenial
sys.version_info(major=2, minor=7, micro=12, releaselevel='final', serial=0)
sqlalchemy=1.0.11

Version of dbprocessing

Current github master (80d12b59b29f9e5a0b6cc795fadc3c33daaab3cd)

Closure condition

This issue should be closed when one of the alternatives is chosen and a PR is merged with the implementation and/or documentation.