pangeo-forge / cmip6-feedstock

A Pangeo Forge Feedstock for cmip6.
Apache License 2.0

Add PMIP runs to recipe #16

Closed jbusecke closed 2 years ago

jbusecke commented 2 years ago

This PR supersedes #14 and is already rebased on #15. For previous discussions, please refer to https://github.com/pangeo-forge/cmip6-feedstock/pull/14

It passed locally for me; let's see if we can actually build the dataset.

pangeo-forge-bot commented 2 years ago

:tada: New recipe runs created for the following recipes at sha 55337f3db0ac54e429a3f33fb1935da4a239e2b6:

pangeo-forge-bot commented 2 years ago

:tada: New recipe runs created for the following recipes at sha bf72d5fc47eb8be36651d1b009f5fc1cb007c7dd:

jbusecke commented 2 years ago

/run recipe-test recipe_run_id=820

pangeo-forge-bot commented 2 years ago

:sparkles: A test of your recipe CMIP6.PMIP.MPI-M.MPI-ESM1-2-LR.past2k.r1i1p1f1.Amon.tas.gn.dummy is now running on Pangeo Forge Cloud!

I'll notify you with a comment on this thread when this test is complete. (This could be a little while...)

In the meantime, you can follow the logs for this recipe run at https://pangeo-forge.org/dashboard/recipe-run/820

pangeo-forge-bot commented 2 years ago

:partying_face: Hooray! The test execution of your recipe CMIP6.PMIP.MPI-M.MPI-ESM1-2-LR.past2k.r1i1p1f1.Amon.tas.gn.dummy succeeded.

Here is a static representation of the dataset built by this recipe:

```
Dimensions:    (lat: 96, bnds: 2, lon: 192, time: 480)
Coordinates:
    height     float64 ...
  * lat        (lat) float64 -88.57 -86.72 -84.86 -83.0 ... 84.86 86.72 88.57
  * lon        (lon) float64 0.0 1.875 3.75 5.625 ... 352.5 354.4 356.2 358.1
  * time       (time) object 7001-01-16 12:00:00 ... 7040-12-16 12:00:00
Dimensions without coordinates: bnds
Data variables:
    lat_bnds   (lat, bnds) float64 dask.array
    lon_bnds   (lon, bnds) float64 dask.array
    tas        (time, lat, lon) float32 dask.array
    time_bnds  (time, bnds) object dask.array
Attributes: (12/47)
    CDO:                    Climate Data Operators version 2.0.0rc2 (https://...
    Conventions:            CF-1.7 CMIP-6.2
    activity_id:            PMIP
    branch_method:          no parent
    branch_time_in_child:   1881364.0
    branch_time_in_parent:  0.0
    ...                     ...
    table_id:               Amon
    table_info:             Creation Date:(09 May 2019) MD5:5f007c16960eee824...
    title:                  MPI-ESM1-2-LR output prepared for CMIP6
    tracking_id:            hdl:21.14100/03ca260c-951f-484e-b0d0-ce7d31509f84
    variable_id:            tas
    variant_label:          r1i1p1f1
```

You can also open this dataset by running the following Python code

import fsspec
import xarray as xr

dataset_public_url = 'https://ncsa.osn.xsede.org/Pangeo/pangeo-forge-test/prod/recipe-run-820/pangeo-forge/cmip6-feedstock/CMIP6.PMIP.MPI-M.MPI-ESM1-2-LR.past2k.r1i1p1f1.Amon.tas.gn.dummy.zarr'
mapper = fsspec.get_mapper(dataset_public_url)
ds = xr.open_zarr(mapper, consolidated=True)
ds

in this badge (or your Python interpreter of choice).

Checklist

Please copy-and-paste the list below into a new comment on this thread, and check the boxes off as you've reviewed them.

Note: This test execution is limited to two increments in the concatenation dimension, so you should expect the length of that dimension (e.g., "time" or equivalent) to be 2.

- [ ] Are the dimension lengths correct?
- [ ] Are all of the expected variables present?
- [ ] Does plotting the data produce a plot that looks like your dataset?
- [ ] Can you run a simple computation/reduction on the data and produce a plausible result?

jbusecke commented 2 years ago

@CommonClimate could you check the output and see if it looks OK to you? I'll run the others now.

jbusecke commented 2 years ago

/run recipe-test recipe_run_id=821

jbusecke commented 2 years ago

/run recipe-test recipe_run_id=822

pangeo-forge-bot commented 2 years ago

:sparkles: A test of your recipe CMIP6.PMIP.MRI.MRI-ESM2-0.past1000.r1i1p1f1.Amon.tas.gn.dummy is now running on Pangeo Forge Cloud!

I'll notify you with a comment on this thread when this test is complete. (This could be a little while...)

In the meantime, you can follow the logs for this recipe run at https://pangeo-forge.org/dashboard/recipe-run/821

pangeo-forge-bot commented 2 years ago

:sparkles: A test of your recipe CMIP6.PMIP.MIROC.MIROC-ES2L.past1000.r1i1p1f2.Amon.tas.gn.dummy is now running on Pangeo Forge Cloud!

I'll notify you with a comment on this thread when this test is complete. (This could be a little while...)

In the meantime, you can follow the logs for this recipe run at https://pangeo-forge.org/dashboard/recipe-run/822

pangeo-forge-bot commented 2 years ago

Pangeo Forge Cloud told me that our test of your recipe CMIP6.PMIP.MRI.MRI-ESM2-0.past1000.r1i1p1f1.Amon.tas.gn.dummy failed. But don't worry, I'm sure we can fix this!

To see what error caused the failure, please review the logs at https://pangeo-forge.org/dashboard/recipe-run/821

If you haven't yet tried pruning and running your recipe locally, I suggest trying that now.

Please report back on the results of your local testing in a new comment below, and a Pangeo Forge maintainer will help you with next steps!

pangeo-forge-bot commented 2 years ago

Pangeo Forge Cloud told me that our test of your recipe CMIP6.PMIP.MIROC.MIROC-ES2L.past1000.r1i1p1f2.Amon.tas.gn.dummy failed. But don't worry, I'm sure we can fix this!

To see what error caused the failure, please review the logs at https://pangeo-forge.org/dashboard/recipe-run/822

If you haven't yet tried pruning and running your recipe locally, I suggest trying that now.

Please report back on the results of your local testing in a new comment below, and a Pangeo Forge maintainer will help you with next steps!

jbusecke commented 2 years ago

The logs say:

2022-07-01T14:45:33.649718+00:00 (INFO)

Submitted for execution: Job prefect-job-fad0a939

2022-07-01T14:57:08.484788+00:00 (INFO)

Rescheduled by a Lazarus process. This is attempt 1.

2022-07-01T14:57:11.170213+00:00 (INFO)

Submitted for execution: Job prefect-job-73b21ee5

2022-07-01T15:09:07.186314+00:00 (INFO)

Rescheduled by a Lazarus process. This is attempt 2.

2022-07-01T15:09:10.845308+00:00 (INFO)

Submitted for execution: Job prefect-job-9768abae

2022-07-01T15:21:16.937544+00:00 (INFO)

Rescheduled by a Lazarus process. This is attempt 3.

2022-07-01T15:21:18.624253+00:00 (INFO)

Submitted for execution: Job prefect-job-6c752e03

2022-07-01T15:32:22.22285+00:00 (ERROR)

A Lazarus process attempted to reschedule this run 3 times without success. Marking as failed.

I am not at all sure what this means (maybe @cisaacstern knows more). I will retry one of the jobs.

jbusecke commented 2 years ago

/run recipe-test recipe_run_id=822

pangeo-forge-bot commented 2 years ago

:sparkles: A test of your recipe CMIP6.PMIP.MIROC.MIROC-ES2L.past1000.r1i1p1f2.Amon.tas.gn.dummy is now running on Pangeo Forge Cloud!

I'll notify you with a comment on this thread when this test is complete. (This could be a little while...)

In the meantime, you can follow the logs for this recipe run at https://pangeo-forge.org/dashboard/recipe-run/822

pangeo-forge-bot commented 2 years ago

Pangeo Forge Cloud told me that our test of your recipe CMIP6.PMIP.MIROC.MIROC-ES2L.past1000.r1i1p1f2.Amon.tas.gn.dummy failed. But don't worry, I'm sure we can fix this!

To see what error caused the failure, please review the logs at https://pangeo-forge.org/dashboard/recipe-run/822

If you haven't yet tried pruning and running your recipe locally, I suggest trying that now.

Please report back on the results of your local testing in a new comment below, and a Pangeo Forge maintainer will help you with next steps!

jbusecke commented 2 years ago

Same issue with the Lazarus process. I am going to wait for some input from @cisaacstern's side here.

cisaacstern commented 2 years ago

Thanks for the ping Julius. I admittedly don't know what this is off the top of my head but I'll look into it.

jbusecke commented 2 years ago

Thanks Charles.

CommonClimate commented 2 years ago

The code snippet above works like a charm! I'm able to load and plot the data without a hitch, in just a couple of seconds: [image: MPI_ESM1 2_zarr_mean]
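For the "simple computation/reduction" item on the checklist, one common choice is an area-weighted global mean. The sketch below is a pure-Python illustration that weights each latitude row by the cosine of its latitude; the function name `cos_lat_weighted_mean` is hypothetical, and cosine weighting is only an approximation on the Gaussian grids some of these models use. With xarray, the equivalent would usually go through `DataArray.weighted(...)`.

```python
import math


def cos_lat_weighted_mean(field, lats_deg):
    """Area-weighted mean of one time step on a regular lat/lon grid.

    field    -- list of rows, one row of longitude values per latitude
    lats_deg -- latitude (degrees) of each row, same length as field
    """
    if len(field) != len(lats_deg):
        raise ValueError("need exactly one latitude per row")
    # Grid cells shrink towards the poles in proportion to cos(latitude).
    weights = [math.cos(math.radians(lat)) for lat in lats_deg]
    row_means = [sum(row) / len(row) for row in field]
    return sum(w * m for w, m in zip(weights, row_means)) / sum(weights)


# Toy two-row field in kelvin; a real check would pass one time step of `tas`.
toy_mean = cos_lat_weighted_mean([[300.0, 300.0], [200.0, 200.0]], [0.0, 60.0])
```

As a rough plausibility bound (an assumption, not a value from this thread), a global-mean `tas` far outside roughly 280 to 295 K would suggest a units or decoding problem.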

jbusecke commented 2 years ago

@cisaacstern should we try to rerun the recipe for the two failed runs or is that something that would need more time to fix?

pangeo-forge-bot commented 2 years ago

:tada: New recipe runs created for the following recipes at sha 4d6ca67355547dbc7a46636cad53474609ae7600:

jbusecke commented 2 years ago

Seems like the 2k simulation is (possibly temporarily) unavailable. I will try to run one of the others one more time.

jbusecke commented 2 years ago

/run recipe-test recipe_run_id=891

pangeo-forge-bot commented 2 years ago

:sparkles: A test of your recipe CMIP6.PMIP.MRI.MRI-ESM2-0.past1000.r1i1p1f1.Amon.tas.gn.dummy is now running on Pangeo Forge Cloud!

I'll notify you with a comment on this thread when this test is complete. (This could be a little while...)

In the meantime, you can follow the logs for this recipe run at https://pangeo-forge.org/dashboard/recipe-run/891

jbusecke commented 2 years ago

From the logs it seems like the 'Lazarus' thing might have just been a temporary issue.

jbusecke commented 2 years ago

/run recipe-test recipe_run_id=890

pangeo-forge-bot commented 2 years ago

:partying_face: Hooray! The test execution of your recipe CMIP6.PMIP.MRI.MRI-ESM2-0.past1000.r1i1p1f1.Amon.tas.gn.dummy succeeded.

Here is a static representation of the dataset built by this recipe:

```
Dimensions:    (lat: 160, bnds: 2, lon: 320, time: 12000)
Coordinates:
    height     float64 ...
  * lat        (lat) float64 -89.14 -88.03 -86.91 -85.79 ... 86.91 88.03 89.14
  * lon        (lon) float64 0.0 1.125 2.25 3.375 ... 355.5 356.6 357.8 358.9
  * time       (time) object 0850-01-16 12:00:00 ... 1849-12-16 12:00:00
Dimensions without coordinates: bnds
Data variables:
    lat_bnds   (lat, bnds) float64 dask.array
    lon_bnds   (lon, bnds) float64 dask.array
    tas        (time, lat, lon) float32 dask.array
    time_bnds  (time, bnds) object dask.array
Attributes: (12/36)
    Conventions:         CF-1.7 CMIP-6.2
    activity_id:         PMIP
    branch_method:       no parent
    cmor_version:        3.5.0
    creation_date:       2020-01-04T03:36:49Z
    data_specs_version:  01.00.31
    ...                  ...
    table_id:            Amon
    table_info:          Creation Date:(24 July 2019) MD5:c93735846d6645896...
    title:               MRI-ESM2-0 output prepared for CMIP6
    tracking_id:         hdl:21.14100/39c0dcbe-e36a-4fae-900b-67b0874f5996
    variable_id:         tas
    variant_label:       r1i1p1f1
```

You can also open this dataset by running the following Python code

import fsspec
import xarray as xr

dataset_public_url = 'https://ncsa.osn.xsede.org/Pangeo/pangeo-forge-test/prod/recipe-run-891/pangeo-forge/cmip6-feedstock/CMIP6.PMIP.MRI.MRI-ESM2-0.past1000.r1i1p1f1.Amon.tas.gn.dummy.zarr'
mapper = fsspec.get_mapper(dataset_public_url)
ds = xr.open_zarr(mapper, consolidated=True)
ds

in this badge (or your Python interpreter of choice).

Checklist

Please copy-and-paste the list below into a new comment on this thread, and check the boxes off as you've reviewed them.

Note: This test execution is limited to two increments in the concatenation dimension, so you should expect the length of that dimension (e.g., "time" or equivalent) to be 2.

- [ ] Are the dimension lengths correct?
- [ ] Are all of the expected variables present?
- [ ] Does plotting the data produce a plot that looks like your dataset?
- [ ] Can you run a simple computation/reduction on the data and produce a plausible result?

cisaacstern commented 2 years ago

Apologies for the delayed response here. Please let me know if/how I can be of assistance at this point.

CommonClimate commented 2 years ago

Adding @jordanplanders to this conversation, as she will be taking this over for a bit.

CommonClimate commented 2 years ago

Well @cisaacstern, the question remains the same: what do we have to do at this point? Go through the checklist and complete the review? Sorry, we are new to this process, so we need a bit of hand-holding.

cisaacstern commented 2 years ago

Yes, someone with knowledge of the data opening it and answering the checklist provided in https://github.com/pangeo-forge/cmip6-feedstock/pull/16#issuecomment-1178001348 would be helpful. Beyond that, I will defer to @jbusecke who has been following this PR much more closely than me.
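The first two checklist items (dimension lengths and expected variables) can be sketched as a small helper. This is a minimal illustration, not part of the Pangeo Forge tooling: `check_dataset` is a hypothetical name, and it relies only on the `sizes` and `data_vars` mappings that an `xarray.Dataset` exposes. A stand-in object carrying the values from the first test run's static representation lets the sketch run without network access.

```python
from types import SimpleNamespace


def check_dataset(ds, expected_sizes, expected_vars):
    """Return a list of problems; an empty list means both checks passed."""
    problems = []
    for dim, expected in expected_sizes.items():
        actual = ds.sizes.get(dim)
        if actual != expected:
            problems.append(f"dimension {dim!r}: expected {expected}, found {actual}")
    missing = sorted(set(expected_vars) - set(ds.data_vars))
    if missing:
        problems.append(f"missing variables: {missing}")
    return problems


# Stand-in for the real dataset, using the sizes and variables printed by the
# bot for the MPI-ESM1-2-LR test run. With xarray installed, `ds` would be the
# object returned by xr.open_zarr(mapper, consolidated=True) instead.
demo = SimpleNamespace(
    sizes={"lat": 96, "bnds": 2, "lon": 192, "time": 480},
    data_vars={"lat_bnds": None, "lon_bnds": None, "tas": None, "time_bnds": None},
)

problems = check_dataset(demo, {"lat": 96, "lon": 192, "time": 480}, ["tas", "time_bnds"])
```

The remaining two items (plotting and a reduction) still need a person who knows the data to judge whether the result looks right.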

pangeo-forge-bot commented 2 years ago

:tada: New recipe runs created for the following recipes at sha 6df9f298fbde1f0af3b35ac03852b68de5f91eea:

jbusecke commented 2 years ago

/run recipe-test recipe_run_id=984

pangeo-forge-bot commented 2 years ago

:sparkles: A test of your recipe CMIP6.PMIP.MPI-M.MPI-ESM1-2-LR.past2k.r1i1p1f1.Amon.tas.gn.v20210714 is now running on Pangeo Forge Cloud!

I'll notify you with a comment on this thread when this test is complete. (This could be a little while...)

In the meantime, you can follow the logs for this recipe run at https://pangeo-forge.org/dashboard/recipe-run/984

pangeo-forge-bot commented 2 years ago

Pangeo Forge Cloud told me that our test of your recipe CMIP6.PMIP.MPI-M.MPI-ESM1-2-LR.past2k.r1i1p1f1.Amon.tas.gn.v20210714 failed. But don't worry, I'm sure we can fix this!

To see what error caused the failure, please review the logs at https://pangeo-forge.org/dashboard/recipe-run/984

If you haven't yet tried pruning and running your recipe locally, I suggest trying that now.

Please report back on the results of your local testing in a new comment below, and a Pangeo Forge maintainer will help you with next steps!

jbusecke commented 2 years ago

Ughhh, this does not seem to be a great day for ESGF. I am seeing a bunch of 503s, and the one dataset that generates a recipe then fails with another server error:

FileNotFoundError(url) from exc
FileNotFoundError: http://esgf3.dkrz.de/thredds/fileServer/cmip6/PMIP/MPI-M/MPI-ESM1-2-LR/past2k/r1i1p1f1/Amon/tas/gn/v20210714/tas_Amon_MPI-ESM1-2-LR_past2k_r1i1p1f1_gn_700101-702012.nc

I'll try to push to this some time later and see if we have more luck.
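Before re-triggering a run, it can help to probe the source URLs directly and see which ones the data node is currently refusing. The following is a minimal sketch using only the standard library; the helper names `http_status` and `unavailable` are hypothetical, and some servers may not answer `HEAD` requests the same way they answer `GET`.

```python
import urllib.error
import urllib.request


def http_status(url, timeout=10):
    """Return the HTTP status of a HEAD request, or None if nothing answered."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered with an error, e.g. 404 or 503
    except OSError:
        return None  # DNS failure, refused connection, timeout, ...


def unavailable(urls, status_of=http_status):
    """Map every URL that did not answer 200 to the status it returned."""
    statuses = {url: status_of(url) for url in urls}
    return {url: code for url, code in statuses.items() if code != 200}
```

Passing a different `status_of` callable makes the filter easy to try out offline, for example `unavailable(urls, status_of=lambda url: 503)`.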

jordanplanders commented 2 years ago

@jbusecke I took the gn.dummy version for a spin and it worked fine, except it only returned the first two chunks. Running:

import fsspec
import xarray as xr

dataset_public_url = 'https://ncsa.osn.xsede.org/Pangeo/pangeo-forge-test/prod/recipe-run-984/pangeo-forge/cmip6-feedstock/CMIP6.PMIP.MPI-M.MPI-ESM1-2-LR.past2k.r1i1p1f1.Amon.tas.gn.v20210714.zarr'
mapper = fsspec.get_mapper(dataset_public_url)

ds_tmp = xr.open_zarr(mapper, consolidated=True, decode_times=False)
ds_tmp

returned:

FileNotFoundError: https://ncsa.osn.xsede.org/Pangeo/pangeo-forge-test/prod/recipe-run-984/pangeo-forge/cmip6-feedstock/CMIP6.PMIP.MPI-M.MPI-ESM1-2-LR.past2k.r1i1p1f1.Amon.tas.gn.v20210714.zarr/.zmetadata

It's probably user error (pardon), but maybe you can give me a nudge in the right direction?

jbusecke commented 2 years ago

First of all, hi Jordan! It's been a while since we were working on problem sets for oceanography! Very good to see you here!

> It's probably user error (pardon), but maybe you can give me a nudge in the right direction?

I actually think this is not on you. I think this particular run failed (for reasons I need to investigate). I just tried to open it and I think there is only a folder with nothing in it.

I am looking into this with @cisaacstern right now, and we will report back!
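A missing `.zmetadata` key can mean either that the store was written without consolidated metadata or that nothing was written at all. The sketch below tells the two apart by looking for the root metadata keys of a Zarr v2 store. It is an illustration only: `store_state` is a hypothetical helper, and it assumes the mapper behaves like a mapping (as the object returned by `fsspec.get_mapper` does), so plain dictionaries can stand in for it.

```python
def store_state(mapper):
    """Classify a Zarr v2 store by the metadata keys found at its root."""
    if ".zmetadata" in mapper:
        return "consolidated"  # xr.open_zarr(..., consolidated=True) should work
    if ".zgroup" in mapper:
        return "unconsolidated"  # a group exists, but consolidated=False is needed
    return "empty or not a Zarr group"  # likely a failed or unfinished write


# Plain dicts standing in for remote stores:
finished_store = {".zgroup": b"{}", ".zmetadata": b"{}", "tas/.zarray": b"{}"}
failed_store = {}
```

For the real store, the call would be `store_state(fsspec.get_mapper(dataset_public_url))`.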

jordanplanders commented 2 years ago

Hi!! So it has!

I think I'm still pretty fuzzy about the backend of pangeo-forge, but I grabbed code from the cmip6-feedstock recipe.py and tried it on CMIP6.DAMIP.NOAA-GFDL.GFDL-ESM4.hist-aer.r1i1p1f1.Amon.pr.gr1.v20180701 and it identified all parts of the dataset correctly (when I compared it to the web ui), but when I tried it on CMIP6.PMIP.MPI-M.MPI-ESM1-2-LR.past2k.r1i1p1f1.Amon.tas.gn.v20210714, it only returned 10 of 93 (Gist here). It's not obvious to me why that would cause it to fail per se, but I can imagine there might be a check that fails if the dataset end date doesn't correspond with the last date of the last block?

I wrote a little bit of code to pull together a url list for past2k, and expected it to break for the GFDL data (which is to say: I expected there was something weird about past2k that was fouling things up), but it didn't (I changed the stem url string appropriately).

Maybe something in all of this will help root out the issue...
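A check of the kind imagined above can be sketched from the file names alone, since each name ends in a `YYYYMM-YYYYMM` range. This is a hypothetical illustration, not something pangeo-forge is known to do: `month_range`, `coverage_gaps`, and `months_covered` are made-up names, the second file name is inferred from the 480-month time axis of the first test run, and the third is invented to demonstrate a gap.

```python
import re

# Matches the trailing '_YYYYMM-YYYYMM.nc' of a CMIP6 monthly file name.
_RANGE = re.compile(r"_(\d{4})(\d{2})-(\d{4})(\d{2})\.nc$")


def month_range(filename):
    """Return (first, last) month of a file, counted as year * 12 + (month - 1)."""
    match = _RANGE.search(filename)
    if match is None:
        raise ValueError(f"no YYYYMM-YYYYMM range in {filename!r}")
    y0, m0, y1, m1 = map(int, match.groups())
    return y0 * 12 + (m0 - 1), y1 * 12 + (m1 - 1)


def coverage_gaps(filenames):
    """List (previous, next) file pairs whose ranges are not back to back."""
    ordered = sorted(filenames, key=month_range)
    return [
        (prev, nxt)
        for prev, nxt in zip(ordered, ordered[1:])
        if month_range(nxt)[0] != month_range(prev)[1] + 1
    ]


def months_covered(filenames):
    """Total number of months spanned by the listed files."""
    return sum(last - first + 1 for first, last in map(month_range, filenames))


stem = "tas_Amon_MPI-ESM1-2-LR_past2k_r1i1p1f1_gn_"
first_two = [stem + "700101-702012.nc", stem + "702101-704012.nc"]
with_gap = [stem + "700101-702012.nc", stem + "704101-706012.nc"]  # invented name
```

A truncated but contiguous list passes the gap check, so comparing `months_covered(...)` against the expected length of the simulation is the more telling of the two.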

pangeo-forge-bot commented 2 years ago

:tada: New recipe runs created for the following recipes at sha 3da3eed2b4ab78a3a3b766587ce90e0b7fbe62f1:

jbusecke commented 2 years ago

Ah that is helpful, thanks.

This bit is interesting and warrants some more digging:

> it only returned 10 of 93

I assume those are file URLs in this case? If there is a discrepancy between the web search and the API-based search, that might have to be raised upstream with the ESGF folks.

I will merge this one now and see if the issue you describe persists. I'll investigate a bit further next week.

Overall, @cisaacstern and I have discussed splitting the PMIP datasets out into a separate feedstock (https://github.com/pangeo-forge/staged-recipes/pull/162) to keep things a bit more organized. For now, none of that should make a difference to this issue, but it will enable us to treat your request more in isolation. Once we have that feedstock merged, we can move discussions over there.

jbusecke commented 2 years ago

I think I found the bug, actually. I previously did not set the limit on the request parameters (and I believe it defaults to 10 results, or files in this case!). I just changed it to 500 here and locally get the correct number of 93 files! Let's see if this fixes that issue.
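The fix can be sketched as follows. This is a minimal illustration, assuming the ESGF search REST API with its `limit` and `offset` query parameters; the endpoint URL, the facet names, and the helpers `facets_from_instance_id`, `build_file_search_url`, and `is_truncated` are assumptions for illustration and are not taken from the feedstock's recipe.py.

```python
from urllib.parse import urlencode

# Assumed index node; another ESGF search endpoint could be substituted.
ESGF_SEARCH_URL = "https://esgf-node.llnl.gov/esg-search/search"

# Assumed facet names, in the order they appear in a CMIP6 instance id.
FACET_NAMES = (
    "mip_era", "activity_id", "institution_id", "source_id", "experiment_id",
    "variant_label", "table_id", "variable_id", "grid_label", "version",
)


def facets_from_instance_id(instance_id):
    """Split a dotted CMIP6 instance id into a facet dictionary."""
    parts = instance_id.split(".")
    if len(parts) != len(FACET_NAMES):
        raise ValueError(f"expected {len(FACET_NAMES)} fields, got {len(parts)}")
    facets = dict(zip(FACET_NAMES, parts))
    facets["version"] = facets["version"].lstrip("v")  # 'v20210714' -> '20210714'
    return facets


def build_file_search_url(facets, limit=500, offset=0):
    """Build a file-level search URL that states the result limit explicitly."""
    params = {
        "type": "File",
        "format": "application/solr+json",
        "limit": limit,  # leaving this out falls back to the server default
        "offset": offset,
        **facets,
    }
    return f"{ESGF_SEARCH_URL}?{urlencode(params)}"


def is_truncated(num_found, num_returned):
    """True when the server reports more matches than it actually returned."""
    return num_returned < num_found


facets = facets_from_instance_id(
    "CMIP6.PMIP.MPI-M.MPI-ESM1-2-LR.past2k.r1i1p1f1.Amon.tas.gn.v20210714"
)
url = build_file_search_url(facets)
```

Comparing the reported match count against the number of returned records (here, `is_truncated(93, 10)`) would have flagged the silent cut-off at 10 files.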

jbusecke commented 2 years ago

So I was able to do this:

import fsspec
import xarray as xr

dataset_public_url = 'https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/cmip6-feedstock/CMIP6.PMIP.MPI-M.MPI-ESM1-2-LR.past2k.r1i1p1f1.Amon.tas.gn.v20210714.zarr'
mapper = fsspec.get_mapper(dataset_public_url)

ds = xr.open_zarr(mapper, consolidated=True, decode_times=False)
ds

and get

image

Does that look about right to you?

jordanplanders commented 2 years ago

Yes! Thank you!

CommonClimate commented 2 years ago

Looks dandy, thank you. I support moving this to a different recipe if it simplifies your workflow - just let us know where to follow this discussion.

jbusecke commented 2 years ago

We are still working out some kinks. I will definitely notify you here once the feedstock is generated. Thanks for the patience!