spacepy / dbprocessing

Automated processing controller for heliophysics data

Evaluate clean/sort of process queue before processing #111

Open jtniehof opened 2 years ago

jtniehof commented 2 years ago

Our notes and old code say that ProcessQueue.py -p should first call ProcessqueueClean before looping over the process queue entries and calling buildChildren. ProcessqueueClean is supposed to remove duplicate entries from the process queue and make sure the queue is sorted by date and level, to minimize overlap.
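As a rough illustration of that intended behavior (not the actual dbprocessing implementation), the clean step could look something like the sketch below. QueueEntry, its fields, and clean_queue are hypothetical names made up for this example, and the sort order (date first, then level) is an assumption; the notes only say "by date and level".

```python
from collections import namedtuple
import datetime

# Hypothetical stand-in for a process queue entry; not a dbprocessing class.
QueueEntry = namedtuple('QueueEntry', ['file_id', 'utc_file_date', 'data_level'])


def clean_queue(entries):
    """Drop duplicate entries, then sort what is left by date and level."""
    seen = set()
    unique = []
    for entry in entries:
        if entry.file_id in seen:
            continue  # Duplicate of an entry we already kept
        seen.add(entry.file_id)
        unique.append(entry)
    # Assumed ordering: date first, then level within each date.
    return sorted(unique, key=lambda e: (e.utc_file_date, e.data_level))


queue = [
    QueueEntry(3, datetime.date(2020, 1, 2), 1.0),
    QueueEntry(1, datetime.date(2020, 1, 1), 0.0),
    QueueEntry(3, datetime.date(2020, 1, 2), 1.0),  # Duplicate of file 3
    QueueEntry(2, datetime.date(2020, 1, 1), 1.0),
]
cleaned = clean_queue(queue)
print([e.file_id for e in cleaned])
```

The duplicate entry for file 3 is dropped, and the remaining entries come out ordered by date and then by level.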

That call was commented out by @balarsen in 2017, with a comment saying that buildChildren will clean the queue. I'm not seeing that: the only cleaning buildChildren does is skipping files which aren't the newest version.

This isn't going to break anything, but might slow down the calculation a bit. It's also a bit confusing, since we've got some contradictory notes.

OS, Python version, and dependency version information:

Linux-4.15.0-163-generic-x86_64-with-Ubuntu-18.04-bionic
sys.version_info(major=2, minor=7, micro=17, releaselevel='final', serial=0)
sqlalchemy=1.1.11

Version of dbprocessing

Current master

Closure condition

There's no single obvious path forward here. We should probably restore the clean, or update buildChildren to do it, or decide we don't care, and then double-check the docs once that's done. The commented-out code should also be removed once this is resolved.
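To make the first option concrete, here is a minimal sketch of cleaning once before the loop. Everything in it is a toy stand-in: run_queue, dedupe_and_sort, and the integer "file IDs" are hypothetical and do not reflect the real ProcessQueue.py or buildChildren signatures.

```python
def run_queue(entries, build_children, clean=None):
    """Process each queued entry, optionally cleaning the queue first.

    All three arguments are hypothetical stand-ins, not dbprocessing APIs.
    """
    if clean is not None:
        entries = clean(entries)  # Restore the clean before the loop
    for entry in entries:
        build_children(entry)


def dedupe_and_sort(entries):
    """Toy clean step: unique entries in sorted order."""
    return sorted(set(entries))


calls = []
raw_queue = [5, 2, 5, 9, 2]  # Toy file IDs, with duplicates

# Without the clean, every queued entry is passed to build_children.
run_queue(raw_queue, calls.append)
calls_without_clean = len(calls)

# With the clean, duplicates are dropped before the loop starts.
del calls[:]
run_queue(raw_queue, calls.append, clean=dedupe_and_sort)
calls_with_clean = len(calls)

print(calls_without_clean, calls_with_clean)
```

In this toy case the uncleaned queue triggers five build calls and the cleaned one triggers three, which is the kind of redundant work the clean is meant to avoid. Whether the saving matters in practice depends on how much buildChildren already skips.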