ioos / ioos_metrics

Working on creating metrics for the IOOS by the numbers
https://ioos.github.io/ioos_metrics/
MIT License

Automate ioos by the numbers generation #2

Open MathewBiddle opened 2 years ago

MathewBiddle commented 2 years ago

Currently the IOOS by the numbers metrics are collected manually by running the Jupyter Notebook. The process should run automatically, using GH-Actions similar to the GTS metrics, on an annual? quarterly? basis.

MathewBiddle commented 2 years ago

Steps:

MathewBiddle commented 2 years ago

@ocefpaf do you have an example of running a jupyter notebook in a GitHub Action?

ocefpaf commented 2 years ago

> @ocefpaf do you have an example of running a jupyter notebook in a GitHub Action?

There are many ways to do that but I prefer a single call to nbconvert. I usually save the notebook without any output and "convert" it to a notebook with the outputs.
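A minimal sketch of that single nbconvert call, wrapped in Python so it can be reused from a script. The helper names and the output file name `IOOS_BTN_executed.ipynb` are assumptions for illustration; only the `jupyter nbconvert --to notebook --execute ... --output ...` flags come from nbconvert itself.

```python
import subprocess


def nbconvert_command(source, output):
    """Build the nbconvert call that executes `source` and writes `output`.

    Hypothetical helper: it keeps the output-free source notebook untouched
    by writing the executed copy under a different name.
    """
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",   # keep the result as a notebook
        "--execute",          # run every cell before writing
        str(source),
        "--output", str(output),
    ]


def execute_notebook(source, output):
    """Run the command; raises CalledProcessError if execution fails."""
    subprocess.run(nbconvert_command(source, output), check=True)


# Example (requires jupyter and nbconvert to be installed):
# execute_notebook("IOOS_BTN.ipynb", "IOOS_BTN_executed.ipynb")
```

Because `check=True` raises when the call fails, a failing notebook cell would also fail the step that runs this script.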

MathewBiddle commented 2 years ago

I'd like to follow the single call to nbconvert to run the notebook and be done. Remove the standalone python script. Here's the addition to run the notebook.

      - name: Setup Conda
        uses: s-weigand/setup-conda@v1
        with:
          activate-conda: false
          conda-channels: conda-forge
      - name: Build environment
        shell: bash -l {0}
        run: |
          conda env create -f environment.yml
      - name: Execute Notebook
        run: |
          source activate ioos-btn
          jupyter nbconvert --to notebook --execute IOOS_BTN.ipynb --output=IOOS_BTN.ipynb

ocefpaf commented 2 years ago

I never tried to overwrite the same notebook, need to test it first. The rest looks good. BTW, are you saving some results and publishing? What is the format? You can probably create a table or a simple page and publish as gh-pages to keep the notebook untouched.

MathewBiddle commented 2 years ago

> BTW, are you saving some results and publishing? What is the format?

Yeah, I'm saving the results as a csv file, similar to what I've done for the GTS metrics [1] which get collected quarterly with my GH Action [2].

[1] https://github.com/MathewBiddle/ioos_by_the_numbers/tree/main/gts
[2] https://github.com/MathewBiddle/ioos_by_the_numbers/blob/main/.github/workflows/metrics.yml

We're also looking to do something similar for the NGDAC metrics.

So, if you have ideas on how to make this more efficient/prettier, I'm all ears.

We can chat on Thursday about it too.

ocefpaf commented 2 years ago

> So, if you have ideas on how to make this more efficient/prettier, I'm all ears.

More efficient? No, what you are doing is the best approach as far as I know. Prettier? Yes. We should keep the csv, b/c that is more flexible, but also save an HTML table to post as gh-pages. That is a single line with pandas: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_html.html.
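A rough sketch of keeping the csv and also writing an HTML table with `DataFrame.to_html`. The column names, the values, and the `save_outputs` helper are placeholders made up for illustration, not real IOOS metrics or code from the repository.

```python
import pandas as pd

# Placeholder rows for illustration only; not real metrics.
df = pd.DataFrame({"metric": ["example_a", "example_b"], "value": [1, 2]})

# The "single line": render the frame as an HTML <table> string.
html = df.to_html(index=False)


def save_outputs(frame, csv_path, html_path):
    """Hypothetical helper: write the same data as csv and as an HTML table."""
    frame.to_csv(csv_path, index=False)    # flexible, machine-readable copy
    frame.to_html(html_path, index=False)  # table that gh-pages can serve


# Example: save_outputs(df, "btn_metrics.csv", "index.html")
```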

MathewBiddle commented 1 year ago

I've been considering moving IOOS_BTN.ipynb [1] to a series of functions, one for each of the metrics, and wrapping it all in a standalone python script. That way we could call something like:

import ioos_metrics

df_btn = ioos_metrics.btn() # by the numbers as a df
df_atn_gts = ioos_metrics.atn_gts() # atn gts metrics as a df

ngdac_glider_days = ioos_metrics.ngdac.glider_days() # give back the number of glider days. Maybe expand to accept start/end

The Jupyter Notebook is becoming a little unwieldy at this point IMO.

[1] - https://github.com/MathewBiddle/ioos_by_the_numbers/blob/main/IOOS_BTN.ipynb
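One way the function-per-metric layout could look, as a sketch only. Everything here is hypothetical: the signatures, the `(first_day, last_day)` deployment pairs, the choice to count both end days, and the placeholder dates, which are not NGDAC data.

```python
from datetime import date

import pandas as pd


def glider_days(deployments, start=None, end=None):
    """Count glider days over (first_day, last_day) date pairs.

    Assumption: both end days count, and the optional start/end clip each
    deployment to a reporting window.
    """
    total = 0
    for first, last in deployments:
        first = max(first, start) if start else first
        last = min(last, end) if end else last
        if last >= first:
            total += (last - first).days + 1
    return total


def btn(metric_functions):
    """Call each metric function and return the values as a one-row DataFrame."""
    values = {name: func() for name, func in metric_functions.items()}
    return pd.DataFrame([values])


# Placeholder deployments for illustration only.
deployments = [
    (date(2000, 1, 1), date(2000, 1, 10)),
    (date(2000, 1, 20), date(2000, 2, 5)),
]
df_btn = btn({"glider_days": lambda: glider_days(deployments)})
```

Keeping each metric in its own function means a workflow can run one metric, or collect them all through `btn`, without executing a notebook.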