jbusecke / pangeo-forge-esgf

Using queries to the ESGF API to generate urls and keyword arguments for receipe generation in pangeo-forge
Apache License 2.0
6 stars 4 forks source link

dynamic params and facets #4

Closed larsbuntemeyer closed 7 months ago

larsbuntemeyer commented 1 year ago

This is just adding my first experience with ESGF CORDEX API. It's basically just adapting request parameters and facets for different ESGF projects.

larsbuntemeyer commented 1 year ago

Thanks for your all your comment @jbusecke ! I really appreciate it! I'll clean this up now and let you know when i am ready to merge!

jbusecke commented 1 year ago

This looks really cool. Ultimately I wonder if we should substantially refactor this logic and make it more object oriented. Just thinking out loud: Should we have something like this:

class ESGFBuilder:
    def __init__(project, ):
        ...defines class properties like the facets from project type via the request you outlined above...

    ... set up the connection to the server for the duration of the session...
    ...logic that retrieves urls etc...
    ...method that returns recipes...

I will have to invest some more time into this refactor, and wrapping my mind around async, but I just wanted to brain dump this here in case you had some input.

I really like the direction this is going for, generalizing this tool for different ESGF projects!

larsbuntemeyer commented 1 year ago

Sorry, i am not sure if this added unnecessary commits...

jbusecke commented 1 year ago

Great progress on this. Thank you so much for the continued effort. I am a bit concerned that we need some very basic tests here (I actually have introduced a basic test setup in #8). I think we should have at least a set of tests for e.g. CMIP6/CMIP5/CORDEX to ensure that for a dummy dataset of the same dimensions we are getting the same answer out of a call to response_data_processing? Happy to help with this in the coming days if that would be useful.

Alternatively I could work on another PR that implements such a high level test for CMIP6 only, you could merge that PR and then add tests for CORDEX here?

larsbuntemeyer commented 1 year ago

I would really appreciate some tests, so I am very grateful for #8 . Concerning further tests, I think I could probably adapt CMIP6 tests for CORDEX if you like.

I have to admit, i am never really sure whether to rebase or merge the main branch after it has been updated. I know the difference, but would you like to keep all commits in them main? Or would you squash the PR? I would be grateful for some advice so i don't add unnecessary merge commits...

jbusecke commented 1 year ago

I have to admit, i am never really sure whether to rebase or merge the main branch after it has been updated. I know the difference, but would you like to keep all commits in them main?

Did you merge or rebase in the first place? I am also always confused about this situation TBH. Ill check a bit more how to resolve this as my go-to solution (make another PR) is not really satisfactory...

Or would you squash the PR? I would be grateful for some advice so i don't add unnecessary merge commits...

I usually squash all commits in the PR, this was mostly for the sake of easier reviewing.

jbusecke commented 1 year ago

I just found this, but I am not sure if a force push will destroy our exchanges here. Overall I think the tests are a priority here. I would suggest to add these first and then try to clean up the commit history for a final review? Thank you again for the work and sorry for the delay.

larsbuntemeyer commented 1 year ago

Yes, agreed! I'll have some time for this at the end of the week, then i will try to clean up a little bit. Thanks for all your inputs!

larsbuntemeyer commented 1 year ago

Just some thoughts here for myself: I have to rethink the mip keyword, it does not make sense. It should be project which is the API keyword. There is an additional complication in the CORDEX project since there is an additional project keyword in the facet search, e.g., there can be project=cordex where the project facet is hardcode in the dataset_id_template:

"id":"cordex.output.EUR-44.MPI-CSC.MPI-M-MPI-ESM LR.historical.r1i1p1.REMO2009.v1.mon.tas.v20150609|esgf1.dkrz.de",
"dataset_id_template_":["cordex.%(product)s.%(domain)s.%(institute)s.%(driving_model)s.%(experiment)s.%(ensemble)s.%(rcm_name)s.%(rcm_version)s.%(time_frequency)s.%(variable)s"]

or project=cordex-reklies where the project facet can be parsed:

"id":"cordex-reklies.output.EUR-11.GERICS.MIROC-MIROC5.historical.r1i1p1.REMO2015.v1.mon.tas.v20170329|esgf1.dkrz.de",
 "dataset_id_template_":["%(project)s.%(product)s.%(domain)s.%(institute)s.%(driving_model)s.%(experiment)s.%(ensemble)s.%(rcm_name)s.%(rcm_version)s.%(time_frequency)s.%(variable)s"]

I have to make sure to search all cordex project not only project=cordex...

larsbuntemeyer commented 1 year ago

More thoughts:

If i chose project=CORDEX-Reklies, the product facet gets inactive: grafik That's really weired since it is still part of the dataset_id.... hmmm.

It works now, but required some ugly exceptions. Have to clean it up some more.

jbusecke commented 1 year ago

If i chose project=CORDEX-Reklies, the product facet gets inactive:

Is there a value when you query the API directly? I wonder if this is some glitch in the GUI

Also is there a list of all available projects? I have to admit we are moving into somewhat unfamiliar territory for me here, and I hope that having a full list will give us an idea how to most efficiently generalize this.

larsbuntemeyer commented 1 year ago

I'll try to remove some of the if statements that are hacked in depending on the project. I think, this could then be more generalized. The ESGF CORDEX project is not always as consistent as it could be, that's why there might be come exceptions. Sorry for the late responses here, i had to dig through some old grumpy Fortran code the last week... :-/

larsbuntemeyer commented 1 year ago

Although the recipe run works locally, feedstock creation is not yet possible due to https://github.com/pangeo-forge/pangeo-forge-recipes/issues/79 and https://github.com/pangeo-forge/pangeo-forge-recipes/issues/418.

jbusecke commented 1 year ago

Hey @larsbuntemeyer just checking in here (sorry for the long silence). I just wanted to make you aware of some new developments in PGF-recipes (see https://github.com/jbusecke/pangeo-forge-esgf/pull/9#issuecomment-1655640407) that in my mind now completely eliminate the need for the dynamic_kwargs.py module. I am planning to fully deprecate this logic and move towards handling those issues in the recipe generation itself.

I have started (and will continue) to work on #15 which reflects these plans. I would be curious if you agree that the upstream changes eliminate your need for the dynamic_kwargs module as well?

larsbuntemeyer commented 10 months ago

Hey @jbusecke , thanks for looking back into this after all! Sorry for the long absence, i had to kick some model runs regularly which took all my attention! I agree, that the refactoring makes this redundant. I also have to get myself up to date with the Beam refactor in PGF now.

jbusecke commented 8 months ago

Hey @larsbuntemeyer just digging out of a mountain of gh notifications and found this. I have quite substantially refactored this package since the new pangeo-forge version (based on apache beam) became available.

Please let me know if you are still interested in this, or have (understandibly) moved on.

larsbuntemeyer commented 8 months ago

Hey @jbusecke thanks for the updates! I have been following a bit your work and just watched your Pangeo showcase recently, great work! I have been moving on a bit with pange-forge-cordex to ingest CORDEX data on AWS.

However, this is all based on pangeo-forge-recipes < 0.10.0, so instead of refactoring that, i will check out your work and see how far i get. But a new PR will definitely make sense, so this one should be closed!