The SQL update queries have been sped up to the point where they can reliably be executed through a Python DB adapter (psycopg2) without timeouts or dropped connections. I believe this now allows us to tie the full process together and automate it, with clearer logging to detect issues.
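For reference, a minimal sketch of running and timing one of these updates through psycopg2 might look like the following; the DSN and the update statement (including the `status` column) are placeholders, not the real pipeline queries.

```python
import logging
import time

import psycopg2

logging.basicConfig(level=logging.INFO)

# Hypothetical DSN and update statement; substitute the real staging-DB
# credentials and one of the actual update queries.
DSN = "dbname=annotations user=loader host=localhost"
UPDATE_SQL = "UPDATE goanno_wf SET status = 'loaded' WHERE status IS NULL;"


def run_update(dsn: str, statement: str) -> float:
    """Run one update statement and return its wall-clock run time in seconds."""
    start = time.time()
    conn = psycopg2.connect(dsn)
    try:
        # The connection block commits on success and rolls back on error.
        with conn, conn.cursor() as cur:
            cur.execute(statement)
            logging.info("rows affected: %s", cur.rowcount)
    finally:
        conn.close()
    elapsed = time.time() - start
    logging.info("query finished in %.1f s", elapsed)
    return elapsed


if __name__ == "__main__":
    run_update(DSN, UPDATE_SQL)
```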
[ ] Record table row counts before and after the load. Compare them, write the result to the log, and (for certain tables) stop the process if there is no change, since that likely means the data wasn't updated; for instance, if the staging table goanno_wf has the same row count before and after the update (see the sketch after this list).
[ ] Initialize the log files automatically: a short summary log recording row-count differences, input file dates/sizes, and query run times, plus a more verbose log recording the full queries, their results, and Python output useful for debugging (also covered in the sketch below).
[ ] Tie the steps together through Makefile dependencies. The individual steps (recipes) are currently run by hand with no dependency chain; the full update should be runnable with a single command.
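As a rough sketch of how the first two items might fit together, the row-count comparison could write to a short summary log while a second, more verbose logger captures debugging detail. The log paths, the table list, and the "must change" rule below are placeholder assumptions, not decided names.

```python
import logging
import sys

import psycopg2
from psycopg2 import sql

# Hypothetical log paths and table list; adjust to the real pipeline layout.
SUMMARY_LOG = "update_summary.log"
DEBUG_LOG = "update_debug.log"
MUST_CHANGE = {"goanno_wf"}  # tables whose row count must differ after the load

summary = logging.getLogger("summary")
summary.addHandler(logging.FileHandler(SUMMARY_LOG))
summary.setLevel(logging.INFO)

debug = logging.getLogger("debug")
debug.addHandler(logging.FileHandler(DEBUG_LOG))
debug.setLevel(logging.DEBUG)


def row_count(cur, table: str) -> int:
    """Return the current row count of a table."""
    cur.execute(sql.SQL("SELECT count(*) FROM {}").format(sql.Identifier(table)))
    return cur.fetchone()[0]


def check_counts(dsn: str, before: dict) -> None:
    """Compare post-load counts with the recorded pre-load counts."""
    conn = psycopg2.connect(dsn)
    try:
        with conn, conn.cursor() as cur:
            for table, old in before.items():
                new = row_count(cur, table)
                summary.info("%s: %d -> %d (%+d rows)", table, old, new, new - old)
                debug.debug("post-load count for %s: %d", table, new)
                if new == old and table in MUST_CHANGE:
                    debug.error("no change in %s; data likely not updated", table)
                    sys.exit(f"aborting: {table} unchanged after load")
    finally:
        conn.close()
```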
To help better organize the code, maybe use something like SQLAlchemy, which can map tables to objects?
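If we go that route, a minimal sketch using SQLAlchemy's automap extension could look like the following; the connection URL is a placeholder, and goanno_wf is only reflected as a class if it has a primary key.

```python
from sqlalchemy import create_engine, func, select
from sqlalchemy.ext.automap import automap_base
from sqlalchemy.orm import Session

# Hypothetical connection URL; point this at the real database.
engine = create_engine("postgresql+psycopg2://loader@localhost/annotations")

# Reflect the existing schema so each table (with a primary key) becomes a
# mapped class, without restating its columns in Python.
Base = automap_base()
Base.prepare(autoload_with=engine)
GoannoWf = Base.classes.goanno_wf  # mapped class for the goanno_wf staging table

with Session(engine) as session:
    n_rows = session.scalar(select(func.count()).select_from(GoannoWf))
    print(f"goanno_wf currently holds {n_rows} rows")
```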