anjackson closed this issue 1 year ago.
Note in particular that tying launch-this-crawl tasks and update-this-access-service tasks together is not a great idea, e.g. on DEV, where we may want to try one without running the other.
I've separated out the crawl launcher, and I think that's good enough for now. Note that using things like External Task sensors doesn't really help, because it means that, e.g., the launcher can't run because the other one didn't. Instead, it makes sense for them to be separate, and for the launch source files to be updated atomically.
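The comment doesn't say how the launch source files are written. One common way to make such an update atomic is to write to a temporary file in the same directory and then rename it over the target, so the launcher only ever sees the old file or the new one, never a half-written one. A minimal sketch, assuming plain text files on a POSIX filesystem (the function name is hypothetical):

```python
import os
import tempfile


def atomic_write_text(path: str, text: str) -> None:
    """Replace the file at `path` with `text` so readers never see a partial file.

    Hypothetical helper: writes to a temp file in the same directory (so the
    rename stays on one filesystem), flushes it to disk, then swaps it into
    place with os.replace(), which is atomic on POSIX within one filesystem.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-", suffix=".part")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(text)
            tmp.flush()
            os.fsync(tmp.fileno())
        # Swap the fully written file into place in a single step.
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the temp file if anything failed before the swap.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

With this, the task that regenerates the launch sources and the launcher itself can run on independent schedules: the launcher just reads whatever complete file is currently in place.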
Currently, one file contains three workflows because they share code for dumping the W3ACT DB, and each runs its own dump to avoid conflicts when the workflows run simultaneously. To make this a bit more canonical in Airflow style, and a bit easier to manage, the workflows could be changed as follows:
- `w3act_export` would write its dump to a timestamped folder, e.g. `/var/tmp/w3act_export_2021-12-10T09:00:00Z/`, so that each run gets its own output folder.
- `w3act_backup` and `w3act_report` would use an `ExternalTaskSensor` to await the completion of the `w3act_export` workflow for the hour at which they run, e.g. `2021-12-10T00:00:00Z`. They would then refer to the corresponding W3ACT DB dump and use that instead of running a separate dump.

This would make it easier to keep them in separate files, which is also more canonical for Airflow and makes things a bit easier to understand.
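The two pieces the proposal relies on are a per-run output path and a rule for which `w3act_export` run a dependent workflow should wait for. A minimal stdlib-only sketch, assuming `w3act_export` runs hourly and that logical dates are timezone-aware (both function names are hypothetical). In Airflow 2.x, a callable shaped like `export_run_for` could be passed to `ExternalTaskSensor` as `execution_date_fn`, but that wiring is an assumption and not shown here:

```python
from datetime import datetime, timezone


def export_dir(logical_date: datetime) -> str:
    """Per-run output folder for a (hypothetical) w3act_export run.

    e.g. 2021-12-10T09:00:00Z -> /var/tmp/w3act_export_2021-12-10T09:00:00Z/
    """
    utc = logical_date.astimezone(timezone.utc)
    return f"/var/tmp/w3act_export_{utc.strftime('%Y-%m-%dT%H:%M:%SZ')}/"


def export_run_for(logical_date: datetime) -> datetime:
    """Logical date of the w3act_export run a dependent workflow should await.

    Assumes w3act_export runs hourly, so this truncates to the start of the hour.
    """
    utc = logical_date.astimezone(timezone.utc)
    return utc.replace(minute=0, second=0, microsecond=0)
```

For example, a `w3act_report` run at `2021-12-10T00:15:00Z` would wait on the `2021-12-10T00:00:00Z` export and read its dump from `export_dir(export_run_for(...))`, rather than producing its own.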