jbusecke / pangeo-forge-esgf

Using queries to the ESGF API to generate URLs and keyword arguments for recipe generation in pangeo-forge
Apache License 2.0

Add support for daily output frequency #9

Closed. duncanwp closed this 1 year ago

duncanwp commented 1 year ago

I needed to support the daily table_id for my climatebench-feedstock, so I have made a first pass at fixing https://github.com/jbusecke/pangeo-forge-esgf/issues/3 here.

Happy for any feedback / other approaches.

duncanwp commented 1 year ago

Great, happy to add a test or two, but there doesn't seem to be a test framework in place. How do you want to set it up? Would it make sense as a separate PR?

duncanwp commented 1 year ago

Sorry, I don't have an explicit list, but I extract all NorESM2-LM files for all ensemble members matching the following experiments and variables:

experiments = [
    '1pctCO2', 'abrupt-4xCO2', 'historical', 'piControl',  # CMIP
    'hist-GHG', 'hist-aer',  # DAMIP
    'ssp126', 'ssp245', 'ssp370', 'ssp370-lowNTCF', 'ssp585',  # ScenarioMIP
]
variables = [
    'tas', 'tasmin', 'tasmax', 'pr',
]

See the original (hacky!) script here: https://github.com/duncanwp/ClimateBench/blob/main/prepare_data.py
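
To make the selection above concrete, it could be expanded into one set of search facets per experiment/variable pair. This is only a rough sketch, not what the PR or the linked script does: the facet names (source_id, experiment_id, variable_id, table_id) follow the usual CMIP6 naming, table_id='day' is assumed from the daily-frequency goal of this PR, and build_facets is a hypothetical helper that is not part of pangeo-forge-esgf.

```python
from itertools import product

experiments = [
    '1pctCO2', 'abrupt-4xCO2', 'historical', 'piControl',  # CMIP
    'hist-GHG', 'hist-aer',  # DAMIP
    'ssp126', 'ssp245', 'ssp370', 'ssp370-lowNTCF', 'ssp585',  # ScenarioMIP
]
variables = ['tas', 'tasmin', 'tasmax', 'pr']


def build_facets(experiments, variables, source_id='NorESM2-LM', table_id='day'):
    """Hypothetical helper: one facet dict per (experiment, variable) pair.

    Ensemble members are deliberately left unconstrained, since the
    comment above asks for all of them.
    """
    return [
        {
            'source_id': source_id,
            'experiment_id': experiment,
            'variable_id': variable,
            'table_id': table_id,  # assumed: daily output table
        }
        for experiment, variable in product(experiments, variables)
    ]


facets = build_facets(experiments, variables)
print(len(facets), facets[0])
```

Each dictionary could then be passed to whatever search or recipe-generation step is in use; no network request is made here.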

duncanwp commented 1 year ago

bump

duncanwp commented 1 year ago

bump

jbusecke commented 1 year ago

Hi @duncanwp, sorry for the radio silence here. I am just now returning to this work in the context of transforming CMIP data from ESGF. I think that some work I have started upstream in pangeo-forge-recipes could entirely eliminate the need for dynamic kwargs generation in pangeo-forge-esgf. This would be really nice, since the issue would then be solved on a much more general level. The new beam-refactor enables a lot more flexibility. If you are OK with this, I am leaning towards moving this discussion/feature further upstream.

duncanwp commented 1 year ago

No worries, sounds good. I'd still be keen to look at daily ClimateBench variables as a potential use-case so would appreciate a pointer on where to get started with that!

jbusecke commented 1 year ago

I am also still very curious about that use case @duncanwp, and I truly appreciate your patience. I think we can try to support that from the LEAP data-ingestion side (or at least focus discussions there). Could you raise an issue here so we can discuss this use case with respect to the new beam refactor (hopefully entraining @cisaacstern in the process 😄)?

FYI here is some more info on LEAP and how we are trying to handle data ingestion (even though your use case is more of a derived data product than a canonical ingestion): https://leap-stc.github.io/leap-pangeo/jupyterhub.html#i-have-a-dataset-and-want-to-work-with-it-on-the-hub-how-do-i-upload-it

Either way, having an outline of your plan there would be very helpful to keep things focused.

duncanwp commented 1 year ago

Sure, sounds good! I just created an issue over there: https://github.com/leap-stc/data-management/issues/43

jbusecke commented 1 year ago

Fantastic. Thank you so much.