mi3nts / mints-aq-reports

Repository for generation of MINTS automated reports
https://mi3nts.github.io/mints-aq-reports/

Create notebook execution pipeline using snakemake or another DAG-type tool #3

Closed john-waczak closed 3 months ago

john-waczak commented 1 year ago

Currently we can use papermill with the conda environment in the /notebooks folder to execute parametrized notebooks and place the executed versions in the /website folder, where Quarto renders them on each git push. Now we need a way to generate the YAML parameter files for batches of notebooks, one per node id. I suggest we do the following:

  1. Grab the same CSV file used for NodeRed that keeps the sensor id lookup table. We can just fetch the artifact from the AirQualityAnalysisWorkflows repo so it's always up to date, rather than having two copies to maintain.
  2. Create a script that automatically generates a folder of parameter YAML files, one per unique node id in the list, each with the desired date range. Add this folder to .gitignore so we don't needlessly track these files.
  3. Create a script to generate a Slurm job for each parameter file. Name the output notebook appropriately (using the node id). NOTE: we should also pass the node id as a parameter so the notebook title is set correctly. In Julia we can accomplish this with an md string from Markdown.jl.
  4. Set up a cron job to submit the jobs periodically.
  5. Once per day, do a git commit / push cycle, which should force the documentation to rebuild.
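
Steps 2 and 3 could look roughly like the Python sketch below. This is only an illustration under several assumptions, not the repo's actual tooling: the lookup-table column name `node_id`, the parameter names `start_date` / `end_date`, the notebook path `notebooks/report.ipynb`, and the `params/` and `jobs/` folders are all hypothetical. The parameter files are written as `key: "value"` lines (JSON-quoted strings are valid YAML), and each job script calls papermill with its `-f` parameters-file option.

```python
import csv
import io
import json
import shlex
import tempfile
from pathlib import Path


def unique_node_ids(csv_text, column="node_id"):
    """Return unique, non-empty node ids in first-seen order.

    `column` is an assumed header name; adjust it to the real lookup table.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    seen = {}
    for row in reader:
        node_id = (row.get(column) or "").strip()
        if node_id:
            seen.setdefault(node_id, None)
    return list(seen)


def write_params(node_id, start_date, end_date, params_dir):
    """Write one parameter file per node and return its path."""
    params_dir = Path(params_dir)
    params_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical parameter names; node_id is included so the notebook
    # can use it to build its title.
    params = {"node_id": node_id, "start_date": start_date, "end_date": end_date}
    lines = [f"{key}: {json.dumps(value)}" for key, value in params.items()]
    path = params_dir / f"{node_id}.yaml"
    path.write_text("\n".join(lines) + "\n")
    return path


def write_slurm_job(node_id, params_path, jobs_dir,
                    notebook="notebooks/report.ipynb", output_dir="website"):
    """Write a Slurm batch script that runs papermill for one node."""
    jobs_dir = Path(jobs_dir)
    jobs_dir.mkdir(parents=True, exist_ok=True)
    output_nb = Path(output_dir) / f"{node_id}.ipynb"  # output named by node id
    command = " ".join([
        "papermill",
        shlex.quote(str(notebook)),
        shlex.quote(str(output_nb)),
        "-f",
        shlex.quote(str(params_path)),
    ])
    script = "\n".join([
        "#!/bin/bash",
        f"#SBATCH --job-name=report-{node_id}",
        f"#SBATCH --output={jobs_dir / (node_id + '.log')}",
        command,
    ]) + "\n"
    path = jobs_dir / f"{node_id}.sh"
    path.write_text(script)
    return path


if __name__ == "__main__":
    # Made-up lookup table, only to exercise the functions.
    sample = "node_id,name\nnode-a,Site A\nnode-b,Site B\nnode-a,Site A (dup)\n"
    with tempfile.TemporaryDirectory() as tmp:
        for nid in unique_node_ids(sample):
            params = write_params(nid, "2024-01-01", "2024-01-31", Path(tmp) / "params")
            job = write_slurm_job(nid, params, Path(tmp) / "jobs")
            print(job.read_text())
```

A cron entry (step 4) would then only need to run this script and call `sbatch` on each generated `.sh` file. If we go with snakemake instead, the same two functions could back a rule per node id, with the parameter file as the rule's input and the executed notebook as its output.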