hytest-org / hytest

https://hytest-org.github.io/hytest/

Build rechunking tutorial #277

Open amsnyder opened 1 year ago

amsnyder commented 1 year ago

@rsignell-usgs is building a rechunking tutorial with the following structure: (need to populate)

@gzt5142 is making sure the content renders cleanly as a JupyterBook. Current draft is here: https://gzt5142.github.io/DaskDataChunking/

Once the tutorial/book is complete, we will decide how to link to or incorporate it into the HyTEST JupyterBook.

amsnyder commented 1 year ago

@rsignell-usgs will be building the tutorial on NOAA GEFS retrospective data. The first two steps of his workflow (create individual file JSONs, then create a consolidated metadata JSON/parquet file) duplicate what PUMP (@ted80810, @wdwatkins) has already done, so Rich won't need to run this code on the full dataset. He will just run it on a sample of files to confirm it works.

Rich will build out the steps of the workflow that rechunk the data from the consolidated metadata file, because PUMP has not worked on this part yet. We can provide the rechunked dataset to PUMP as a value-added substitution when it is ready.
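
For anyone following along, here is a rough sketch of why the rechunking step matters. It is pure Python and does not touch the real GEFS data. The array shape, both chunk layouts, and the `chunks_touched` helper are all hypothetical, made up for illustration only. It counts how many chunks a read has to open under a map-oriented layout versus a time-series-oriented layout.

```python
def chunks_touched(chunks, selection):
    """Count the chunks that intersect a rectangular selection.

    chunks:    chunk length along each dimension
    selection: (start, stop) index pair per dimension, stop exclusive
    """
    count = 1
    for (start, stop), chunk_len in zip(selection, chunks):
        first = start // chunk_len        # index of first chunk hit
        last = (stop - 1) // chunk_len    # index of last chunk hit
        count *= last - first + 1
    return count


# Hypothetical (time, lat, lon) array of shape (3650, 720, 1440).
map_chunks = (1, 720, 1440)   # assumed original layout: one time step per chunk
series_chunks = (365, 90, 90) # assumed rechunked layout: long in time, small in space

# Read the full time series at a single grid cell.
point_series = ((0, 3650), (100, 101), (200, 201))
# Read one full map at a single time step.
single_map = ((0, 1), (0, 720), (0, 1440))

print(chunks_touched(map_chunks, point_series))     # 3650
print(chunks_touched(series_chunks, point_series))  # 10
print(chunks_touched(map_chunks, single_map))       # 1
print(chunks_touched(series_chunks, single_map))    # 128
```

Under these assumed numbers, the time-series read drops from 3650 chunk fetches to 10 after rechunking, while a single-map read goes from 1 to 128. The right layout depends on the access pattern the dataset is meant to serve.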

amsnyder commented 1 year ago

Tasks: