Open MathewBiddle opened 11 months ago
What I would like to see is all of the data used in the metrics website made available through the IOOS ERDDAP, with lightweight scripts that pull the data in and build the webpages. The open question is where to put the scripts that generate the datasets served on the IOOS ERDDAP.
current flow
%%{
init: {
'theme': 'base',
'themeVariables': {
'primaryColor': '#007396',
'primaryTextColor': '#fff',
'primaryBorderColor': '#003087',
'lineColor': '#003087',
'secondaryColor': '#007396',
'tertiaryColor': '#CCD1D1'
},
'flowchart': { 'curve': 'basis' }
}
}%%
flowchart LR
pA["html"]
A["gts_atn_metrics.py"]
B["GTS_ATN_monthly_totals.csv"]
C["create_gts_atn_landing_page.py"]
subgraph ATN
A --> pA
A --> B
C --> B
end
D["ioosstats/"]
E["gts_regional_metrics.py"]
F["ioos_metrics/tree/main/gts/"]
G["create_gts_regional_landing_page.py"]
subgraph GTS
E --> D
E --> F
G --> F
end
H["inventory_creation.ipynb"]
I["ioos-asset-inventory/tree/main"]
J["IOOS ERDDAP"]
K["create_asset_inventory_page.py"]
subgraph inventory
H --> I
I --> J
K --> J
end
I think it should look like
%%{
init: {
'theme': 'base',
'themeVariables': {
'primaryColor': '#007396',
'primaryTextColor': '#fff',
'primaryBorderColor': '#003087',
'lineColor': '#003087',
'secondaryColor': '#007396',
'tertiaryColor': '#CCD1D1'
},
'flowchart': { 'curve': 'basis' }
}
}%%
flowchart LR
L["ERDDAP"]
A["gts_atn_metrics.py"]
C["create_gts_atn_landing_page.py"]
E["gts_regional_metrics.py"]
G["create_gts_regional_landing_page.py"]
H["inventory_creation.ipynb"]
K["create_asset_inventory_page.py"]
E --> L
L --> G
A --> L
L --> C
H --> L
L --> K
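To make the proposed flow concrete, here is a rough sketch of what a "lightweight" page script could do to read from ERDDAP. It assumes the datasets are served through tabledap on https://erddap.ioos.us/erddap; the dataset ID and variable names in the usage note are hypothetical placeholders, not existing datasets, and `load_dataset` needs pandas and network access.

```python
from urllib.parse import quote

# Base URL taken from the issue; everything passed in below is a placeholder.
ERDDAP_BASE = "https://erddap.ioos.us/erddap"


def tabledap_csv_url(dataset_id, variables=(), constraints=(), base=ERDDAP_BASE):
    """Build a tabledap CSV request URL for one dataset.

    variables   -- column names to request (empty means all columns)
    constraints -- strings such as "time>=2023-01-01"
    """
    query = ",".join(variables)
    for constraint in constraints:
        # Percent-encode characters such as ">" and "<" but keep "=" readable.
        query += "&" + quote(constraint, safe="=")
    return f"{base}/tabledap/{dataset_id}.csv?{query}"


def load_dataset(dataset_id, variables=(), constraints=()):
    """Read a dataset into a DataFrame (requires pandas and network access)."""
    import pandas as pd  # imported here so the URL helper works without pandas

    url = tabledap_csv_url(dataset_id, variables, constraints)
    # ERDDAP .csv responses carry a units row directly under the header.
    return pd.read_csv(url, skiprows=[1])
```

A page script such as create_gts_regional_landing_page.py could then call something like `load_dataset("hypothetical_gts_dataset", ["time", "total"])` instead of reading a CSV from the repository.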
I am also calculating IOOS by the Numbers in this notebook https://github.com/ioos/ioos_metrics/blob/main/IOOS_BTN.ipynb which writes to a CSV file https://github.com/ioos/ioos_metrics/blob/main/ioos_btn_metrics.csv. I would like to define a process for running that notebook (or the code inside it) and writing out the data so that it can be hosted on the IOOS ERDDAP https://erddap.ioos.us/erddap/index.html.
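One possible way to run the notebook non-interactively is to call `jupyter nbconvert --execute` from a small script. This is only a sketch: it assumes Jupyter and nbconvert are installed in the environment, that the notebook runs top to bottom without manual input, and that it writes ioos_btn_metrics.csv itself as it does today.

```python
import subprocess


def nbconvert_command(notebook="IOOS_BTN.ipynb"):
    """Return the command that executes a notebook in place."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",  # keep the output as a notebook
        "--execute",         # run every cell
        "--inplace",         # overwrite the notebook with the executed copy
        notebook,
    ]


def run_notebook(notebook="IOOS_BTN.ipynb"):
    """Execute the notebook; raises CalledProcessError if any cell fails."""
    subprocess.run(nbconvert_command(notebook), check=True)
```

Moving the notebook code into a plain script would avoid the Jupyter dependency altogether, which may be simpler for a scheduled job.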
related to #8
💡 for 2.a. (https://github.com/ioos/ioos_metrics/issues/35#issue-1845474591) I could run that shell script as a cron job on AWS. Then, we don't have to worry about the ERDDAP endpoint getting out of sync by forgetting to pull new data. Probably best to run on the 5th of each month...
Set up the cron job:
$ crontab -l
0 12 5 * * get_data.sh
Will need to check on the 5th of the month if it ran.
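The "did it run" check could be scripted rather than done by hand. A minimal sketch, assuming the job updates some file on the server whose modification time can be inspected (the path is a placeholder, since the issue does not say what get_data.sh writes) and that the server clock matches the timezone cron uses:

```python
from datetime import datetime
from pathlib import Path


def last_scheduled_run(now):
    """Most recent scheduled time for the cron entry `0 12 5 * *`."""
    this_month = now.replace(day=5, hour=12, minute=0, second=0, microsecond=0)
    if now >= this_month:
        return this_month
    # Before this month's run: step back to the 5th of the previous month.
    if now.month == 1:
        return this_month.replace(year=now.year - 1, month=12)
    return this_month.replace(month=now.month - 1)


def ran_since_last_schedule(path, now=None):
    """True if `path` was modified at or after the last scheduled run."""
    now = now or datetime.now()
    target = Path(path)
    if not target.exists():
        return False
    modified = datetime.fromtimestamp(target.stat().st_mtime)
    return modified >= last_scheduled_run(now)
```

For example, `ran_since_last_schedule("/path/to/updated/file")` returning False after the 5th would suggest the job did not run or did not finish.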
I could run that shell script as a cron job on AWS
Could it work as a GHA cron job? Or are there reasons not to go that route?
I need the data where ERDDAP can access it. And https://erddap.ioos.us/erddap/index.html is currently on AWS.
This process is very confusing ATM. I've tried to update the README to document how to update the webpages. However, there are lots of interweaving dependencies and step-wise processes that need to be executed in a specific way to make everything work.
I'm starting this issue to do two things.
What is happening now:
git pull of the ioos-asset-inventory repo on the ERDDAP server.
How can this process be simplified to: