usgs / groundmotion-processing

Parsing and processing ground motion data

Database for workflow status #1056

Open emthompson-usgs opened 1 year ago

emthompson-usgs commented 1 year ago

I'm thinking that a simple SQLite database to track the command history and status would be useful. It would have a table for each of these subcommands:

And each subcommand table would have the following columns:

I think this should make it easier to track which events have had problems in projects with lots of events.

We could also add a subcommand to summarize the command status. One idea would be a table with one row per eqid and one column per subcommand, where each cell holds the most recent end_time (empty if the last run was not successful).
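A minimal sketch of what such a summary could look like, using Python's built-in `sqlite3` module. The subcommand names (`step_a`, `step_b`), the `success` column, and all function names here are hypothetical placeholders for illustration, not part of gmprocess; only `eqid` and `end_time` come from the description above.

```python
import sqlite3

# Hypothetical subcommand names used only for this illustration.
SUBCOMMANDS = ["step_a", "step_b"]


def create_tables(conn):
    """Create one table per subcommand (column set is an assumption)."""
    for name in SUBCOMMANDS:
        conn.execute(
            f"CREATE TABLE IF NOT EXISTS {name} ("
            "eqid TEXT NOT NULL, end_time TEXT, success INTEGER NOT NULL)"
        )


def record_run(conn, subcommand, eqid, end_time, success):
    """Append one run of a subcommand for one event."""
    if subcommand not in SUBCOMMANDS:
        raise ValueError(f"unknown subcommand: {subcommand}")
    conn.execute(
        f"INSERT INTO {subcommand} (eqid, end_time, success) VALUES (?, ?, ?)",
        (eqid, end_time, int(success)),
    )


def summarize(conn):
    """Return {eqid: {subcommand: end_time}} for the last run of each pair.

    A cell is an empty string if the last run failed or never happened.
    """
    summary = {}
    for name in SUBCOMMANDS:
        rows = conn.execute(
            f"SELECT eqid, end_time, success FROM {name} ORDER BY rowid"
        ).fetchall()
        for eqid, end_time, success in rows:
            row = summary.setdefault(eqid, dict.fromkeys(SUBCOMMANDS, ""))
            # Rows arrive in insertion order, so the last run overwrites earlier ones.
            row[name] = end_time if success else ""
    return summary


conn = sqlite3.connect(":memory:")
create_tables(conn)
record_run(conn, "step_a", "ev1", "2022-01-01T00:00:00", True)
record_run(conn, "step_a", "ev1", "2022-01-02T00:00:00", False)
record_run(conn, "step_b", "ev1", "2022-01-03T00:00:00", True)
record_run(conn, "step_a", "ev2", "2022-01-04T00:00:00", True)
summary = summarize(conn)
```

In this sketch "most recent" means the last inserted row, so `ev1` shows an empty `step_a` cell because its latest `step_a` run failed, even though an earlier run succeeded.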

mhearne-usgs commented 1 year ago

You probably only need one table, called status (or something similar). It would look just like your subcommand table, except with an added first column, "command". If this were going to be a large database (millions of records), you would want to split command out into its own table and put a foreign key to the relevant command in your status table. I don't think this will have that many records.
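The single-table layout might be sketched as follows with Python's built-in `sqlite3` module. Only the table name `status` and the columns `command`, `eqid`, and `end_time` appear in this thread; the `success` column, the helper names, and the example command name are assumptions made for illustration.

```python
import sqlite3

# Single status table; "command" is the first column, as suggested above.
# The "success" flag is an assumed column, not taken from the thread.
SCHEMA = """
CREATE TABLE IF NOT EXISTS status (
    command  TEXT NOT NULL,
    eqid     TEXT NOT NULL,
    end_time TEXT,
    success  INTEGER NOT NULL
)
"""


def log_status(conn, command, eqid, end_time, success):
    """Append one record describing a finished run."""
    conn.execute(
        "INSERT INTO status (command, eqid, end_time, success) "
        "VALUES (?, ?, ?, ?)",
        (command, eqid, end_time, int(success)),
    )


def failed_events(conn, command):
    """Return eqids whose most recently logged run of `command` failed."""
    rows = conn.execute(
        """
        SELECT s.eqid
        FROM status AS s
        WHERE s.command = ?
          AND s.success = 0
          AND s.rowid = (
              SELECT MAX(t.rowid)
              FROM status AS t
              WHERE t.command = s.command AND t.eqid = s.eqid
          )
        ORDER BY s.eqid
        """,
        (command,),
    ).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
# "some_command" is a hypothetical subcommand name.
log_status(conn, "some_command", "ev1", "2022-01-01T00:00:00", False)
log_status(conn, "some_command", "ev1", "2022-01-02T00:00:00", True)
log_status(conn, "some_command", "ev2", "2022-01-03T00:00:00", False)
failed = failed_events(conn, "some_command")
```

Here `ev1` is not reported because its later rerun succeeded, while `ev2` is. With a single table, finding problem events across a large project is one query filtered on `command`, rather than a loop over per-subcommand tables.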

baagaard-usgs commented 1 year ago

I strongly recommend that we consider using a workflow management tool for this feature rather than implementing something ourselves. This could either be an optional feature or something done by the user outside of gmprocess.

Apache Airflow seems like a good compromise between features and the number and complexity of dependencies. It is pure Python and pip installable. It not only keeps track of the state of tasks, but also allows a user to visualize the workflow (task dependencies), monitor progress, and rerun failed pieces. From a user perspective, I would like to be able to construct the full workflow for compiling, processing, and analyzing the ground-motion records, which includes steps outside gmprocess.