mi3nts / mints-aq-reports

Repository for generation of MINTS automated reports
https://mi3nts.github.io/mints-aq-reports/

Data Storage/Access and Notebook Execution Plan #8

Closed: john-waczak closed this issue 1 year ago

john-waczak commented 1 year ago

@davidlary @lakithaomal @mghpcsim (if you have any ideas, please add them below)

We need to figure out an appropriate way to access the data for our daily analyses. For now, I have copied some historical data for central node 8 to OSN so that we can develop our analysis notebooks, but we will need a long-term solution for accessing new data as it becomes available. Based on our previous discussions, I think the current plan is to:

  1. rclone our CSV data from mfs to OSN so that it is easily accessible from anywhere
  2. Generate a set of summary.csv files for each node at each relevant time scale to reduce query volume for weekly/monthly/annual analyses.
  3. Set up notebooks to query sensor data
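As a rough sketch of step 2, the summary files could be built with the standard library alone. This assumes each raw CSV has an ISO-8601 timestamp column and one numeric column per measurement; the column names `dateTime` and `pm2_5` are placeholders, not the actual MINTS schema, and the `summarize`/`write_summary` helpers are hypothetical.

```python
import csv
import io
import sys
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Map each time scale to a function that turns a timestamp into a bucket label.
# Labels are zero-padded, so sorting them as strings is chronological.
PERIODS = {
    "daily": lambda ts: ts.strftime("%Y-%m-%d"),
    "weekly": lambda ts: "%d-W%02d" % tuple(ts.isocalendar()[:2]),  # ISO year-week
    "monthly": lambda ts: ts.strftime("%Y-%m"),
    "annual": lambda ts: ts.strftime("%Y"),
}

FIELDS = ["period", "count", "mean", "min", "max"]


def summarize(rows, period="daily", value_field="pm2_5", time_field="dateTime"):
    """Aggregate raw sensor rows (dicts, e.g. from csv.DictReader) into per-period stats.

    `value_field` and `time_field` are hypothetical column names.
    """
    bucket = PERIODS[period]
    groups = defaultdict(list)
    for row in rows:
        raw = (row.get(value_field) or "").strip()
        if not raw:
            continue  # skip rows with a missing reading
        ts = datetime.fromisoformat(row[time_field])
        groups[bucket(ts)].append(float(raw))

    return [
        {"period": key, "count": len(vals), "mean": round(mean(vals), 3),
         "min": min(vals), "max": max(vals)}
        for key, vals in sorted(groups.items())
    ]


def write_summary(summary, out):
    """Write summary rows as CSV to any file-like object."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(summary)


if __name__ == "__main__":
    raw = io.StringIO(
        "dateTime,pm2_5\n"
        "2023-01-01T00:00:00,10\n"
        "2023-01-01T12:00:00,20\n"
        "2023-01-02T06:00:00,\n"
    )
    write_summary(summarize(csv.DictReader(raw), period="daily"), sys.stdout)
```

Running the same raw rows through each entry in `PERIODS` would yield one summary file per time scale. In the notebooks themselves, pandas `resample` would be a natural alternative; the standard-library version just keeps the sketch dependency-free.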

We may want to rclone all of the historical data anyway, so that it is on OSN and we get a sense of the current data volume (which would probably also help the AWS efforts). If the total is only around 8 TB so far, my current allocation should be sufficient for now, but we may need to request an increase in space later.
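For the volume estimate, `rclone size` reports the total size and object count of a remote path. For a per-folder breakdown of a local copy on mfs, a small standard-library sketch is enough. It assumes, hypothetically, that each top-level folder corresponds to one device, and it reports decimal terabytes so the result compares directly with the ~8 TB estimate.

```python
from collections import defaultdict
from pathlib import Path


def csv_bytes_by_folder(root):
    """Total size in bytes of *.csv files, grouped by top-level folder under `root`.

    Assumes a hypothetical layout where each top-level folder holds one device's
    files; CSVs sitting directly in `root` are counted under ".".
    """
    root = Path(root)
    totals = defaultdict(int)
    for path in root.rglob("*.csv"):
        if not path.is_file():
            continue
        parts = path.relative_to(root).parts
        key = parts[0] if len(parts) > 1 else "."
        totals[key] += path.stat().st_size
    return dict(totals)


def to_tb(n_bytes):
    """Convert bytes to decimal terabytes (1 TB = 10**12 bytes; use 2**40 for TiB)."""
    return n_bytes / 10**12
```

Summing `csv_bytes_by_folder(...)` over all folders and passing it to `to_tb` gives the overall figure; the per-folder totals also show which devices dominate the volume.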

@lakithaomal, @davidlary If we do end up rclone'ing the CSVs to OSN, I suggest we take this opportunity to clean up the file naming conventions. Currently, the CSV files are partitioned by device MAC address rather than by device name. This makes browsing the data harder than it needs to be, since what we really want is all of the sensors that are together on each node.
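To make the proposed reorganization concrete, here is a minimal sketch that only plans the renames and does not touch any files. It assumes, hypothetically, that each raw path starts with the device's MAC address and that a MAC-to-node lookup table exists; the MACs and node names below are made up, and `plan_renames` is an illustrative helper, not existing project code.

```python
from pathlib import PurePosixPath

# Hypothetical MAC -> node lookup; the real assignments would come from the
# node inventory, which is not part of this issue.
EXAMPLE_MAC_TO_NODE = {
    "aa:bb:cc:00:00:01": "node_a",
    "aa:bb:cc:00:00:02": "node_a",
    "aa:bb:cc:00:00:03": "node_b",
}


def normalize_mac(mac):
    """Lower-case a MAC address and strip ':' / '-' separators."""
    return mac.lower().replace(":", "").replace("-", "")


def plan_renames(paths, mac_to_node):
    """Map '<mac>/<rest...>' paths to '<node>/<mac>/<rest...>'.

    Returns (plan, unmapped): plan is a list of (old, new) path pairs, and
    unmapped lists paths whose MAC is missing from the lookup table.
    """
    lookup = {normalize_mac(k): v for k, v in mac_to_node.items()}
    plan, unmapped = [], []
    for p in paths:
        old = PurePosixPath(p)
        mac = normalize_mac(old.parts[0])
        node = lookup.get(mac)
        if node is None:
            unmapped.append(p)
            continue
        # Keep the MAC as a second-level folder so each file stays traceable
        # to the physical device on that node.
        new = PurePosixPath(node, mac, *old.parts[1:])
        plan.append((str(old), str(new)))
    return plan, unmapped
```

The resulting (old, new) pairs could then be applied during the migration, for example with one `rclone copyto` call per pair, with anything in `unmapped` reviewed by hand. Grouping by node first means a single folder listing shows every sensor on that node.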