openclimatefix / nwp

Tools for downloading and processing numerical weather predictions
MIT License

Consolidating this repo with NWP-consumer #34

Status: Open. jacobbieker opened this issue 12 months ago

jacobbieker commented 12 months ago

Both this repo and the NWP consumer can download and process NWP data, and on the surface they do very similar work, although the NWP consumer has many more features. Should we consolidate on the NWP consumer, add the extra functionality there, and archive this repo and project? @devsjc

Things we probably want to add somewhere soon:

  1. ICON archiving that is usable in Dagster
  2. ARPEGE global forecast archiving and conversion to Zarr
  3. Global Deterministic Prediction System (GDPS, Canada) forecast archive access (there is a publicly available archive of its forecasts) and conversion to Zarr
  4. ARPEGE Europe forecast archiving and conversion to Zarr

One difference between this repo and the consumer is that the code here can also convert already-downloaded data to Zarr, whereas the consumer is, I think, more focused on live downloading. Should we add the ability for the consumer to run the Zarr conversion from local paths as well? We have the ARPEGE forecasts locally on Leonardo, as well as ICON data for some of the days that are currently missing from Hugging Face. Or do we keep that functionality here, and set this repo up better for working with locally archived datasets and non-live data?

devsjc commented 11 months ago

I think it's sensible to consolidate. The consumer can already convert downloaded data using the convert command; however, it is less flexible about how it expects local data to be arranged. Perhaps the consumer just needs another entrypoint that can arrange data it hasn't downloaded itself into the format it expects. It's also true, though, that it can only handle data from sources it knows about, because each source splits its data across raw files differently.

I will work on item 1 on your checklist this month, so hopefully we can tick that off. Items 2-4 can come after?

jacobbieker commented 11 months ago

Okay, sounds good to me. I like having it all in one place, and that makes sense. Item 1 is the most pressing; we are already archiving the raw data for 2-4 on Leonardo. For GDPS, though, there might be a better archive to use here: https://github.com/julemai/CaSPAr (you do have to request data manually, but it goes back to 2017).

devsjc commented 11 months ago

Tracking item 1 here: https://github.com/openclimatefix/dagster-dags/issues/38