pochedls / xagg

Software to create xml links to underlying CMIP netCDF data

Duplicate ScenarioMIP data? #29

Closed durack1 closed 4 years ago

durack1 commented 4 years ago

@pochedls I was just starting to take a look at the various MIP datasets, and stumbled upon

(base) bash-4.2$ ls -al ../xclim/CMIP6/CMIP/
total 0
drwxrwxr-x 15 pochedls xclimw 4096 Oct  4  2019 .
drwxrwxr-x 12 pochedls xclimw 4096 Jun  1 17:48 ..
drwxrwxr-x  9 pochedls xclimw 4096 Sep 13  2019 1pctCO2
drwxrwxr-x  9 pochedls xclimw 4096 Sep 13  2019 abrupt-4xCO2
drwxrwxr-x  8 pochedls xclimw 4096 Sep 13  2019 amip
drwxrwxr-x  9 pochedls xclimw 4096 Sep 13  2019 esm-hist
drwxrwxr-x  9 pochedls xclimw 4096 Sep 13  2019 esm-piControl
drwxrwxr-x  9 pochedls xclimw 4096 Jul 13 04:04 esm-piControl-spinup
drwxrwxr-x  9 pochedls xclimw 4096 Sep 13  2019 historical
drwxrwxr-x  9 pochedls xclimw 4096 Sep 13  2019 piControl
drwxrwxr-x  9 pochedls xclimw 4096 Jul 13 04:04 piControl-spinup
drwxrwxr-x  5 pochedls xclimw 4096 Oct  4  2019 ssp126
drwxrwxr-x  5 pochedls xclimw 4096 Oct  4  2019 ssp245
drwxrwxr-x  5 pochedls xclimw 4096 Jul 30 05:35 ssp370
drwxrwxr-x  4 pochedls xclimw 4096 Oct  4  2019 ssp585
(base) bash-4.2$ ls -al ../xclim/CMIP6/ScenarioMIP/
total 0
drwxrwxr-x 10 pochedls xclimw 4096 May 14  2019 .
drwxrwxr-x 12 pochedls xclimw 4096 Jun  1 17:48 ..
drwxrwxr-x  9 pochedls xclimw 4096 Sep 13  2019 ssp119
drwxrwxr-x  9 pochedls xclimw 4096 Sep 13  2019 ssp126
drwxrwxr-x  9 pochedls xclimw 4096 Sep 13  2019 ssp245
drwxrwxr-x  9 pochedls xclimw 4096 Sep 13  2019 ssp370
drwxrwxr-x  9 pochedls xclimw 4096 Sep 13  2019 ssp434
drwxrwxr-x  9 pochedls xclimw 4096 Sep 13  2019 ssp460
drwxrwxr-x  9 pochedls xclimw 4096 Sep 13  2019 ssp534-over
drwxrwxr-x  9 pochedls xclimw 4096 Sep 13  2019 ssp585

Is the duplication of ssp126 -> ssp585 intentional?

pochedls commented 4 years ago

@durack1 - This is extracted from the path. It appears this only applies to FGOALS-f3-L and FGOALS-g3. I'm not sure if the problem is on our end or in how FGOALS saved their data. I think the CMIP xmls point to scratch and the ScenarioMIP xmls point to publish (in the subset I looked at).

durack1 commented 4 years ago

@taylor13 and @sashakames this might involve WIP/ESGF as we may have discovered an issue, which may need an errata raised.

Taking a peek at the errata, we have no entries for FGOALS-f3-L or for FGOALS-g3

taylor13 commented 4 years ago

Spot checking on ESGF, it appears that all the metadata has ScenarioMIP, not CMIP, so it might be a problem with the directory structure on scratch?

durack1 commented 4 years ago

Ok so for a randomly selected incorrect-MIP identified file we have:

-bash-4.2$ more ../xclim/CMIP6/CMIP/ssp585/atmos/mon/tas/CMIP6.CMIP.ssp585.CAS.FGOALS-g3.r1i1p1f1.mon.tas.atmos.glb-z1-gn.v20190818.0000000.0.xml | grep directory
    directory   ="../scratch/cmip6/CMIP/CAS/FGOALS-g3/ssp585/r1i1p1f1/Amon/tas/gn/v20190818/"

(cdat821rc1py3) bash-4.2$ ncdump -h ../scratch/cmip6/CMIP/CAS/FGOALS-g3/ssp585/r1i1p1f1/Amon/tas/gn/v20190818/tas_Amon_FGOALS-g3_ssp585_r1i1p1f1_gn_201501-201912.nc 
netcdf tas_Amon_FGOALS-g3_ssp585_r1i1p1f1_gn_201501-201912 {
dimensions:
    time = UNLIMITED ; // (60 currently)
    lat = 80 ;
    lon = 180 ;
    bnds = 2 ;
variables:
...

// global attributes:
        :Conventions = "CF-1.7 CMIP-6.2" ;
        :activity_id = "ScenarioMIP" ;
        :branch_method = "standard" ;
        :branch_time_in_child = 0. ;
        :branch_time_in_parent = 60225. ;
        :contact = "Lijuan Li (ljli@mail.iap.ac.cn)" ;
        :creation_date = "2019-08-18T13:08:09Z" ;
        :data_specs_version = "01.00.31" ;
        :experiment = "update of RCP8.5 based on SSP5" ;
        :experiment_id = "ssp585" ;
        :external_variables = "areacella" ;
        :forcing_index = 1 ;
        :frequency = "mon" ;
        :further_info_url = "https://furtherinfo.es-doc.org/CMIP6.CAS.FGOALS-g3.ssp585.none.r1i1p1f1" ;
        :grid = "native atmosphere area-weighted latxlon grid (80x180 latxlon)" ;
        :grid_label = "gn" ;
        :history = "2019-08-18T13:04:31Z ;rewrote data to be consistent with ScenarioMIP for variable cl found in table Amon." ;
        :initialization_index = 1 ;
        :institution = "Chinese Academy of Sciences, Beijing 100029, China" ;
        :institution_id = "CAS" ;
        :mip_era = "CMIP6" ;
        :nominal_resolution = "250 km" ;
        :parent_activity_id = "CMIP" ;
        :parent_experiment_id = "historical" ;
        :parent_mip_era = "CMIP6" ;
        :parent_source_id = "FGOALS-g3" ;
        :parent_time_units = "days since 1850-01-01" ;
        :parent_variant_label = "r1i1p1f1" ;
        :physics_index = 1 ;
        :product = "model-output" ;
        :realization_index = 1 ;
        :realm = "atmos" ;
        :run_variant = "3rd realization" ;
        :source = "FGOALS-g3 (2017): \n",
            "aerosol: none\n",
            "atmos: GAMIL2 (180 x 90 longitude/latitude; 26 levels; top level 2.19hPa)\n",
            "atmosChem: none\n",
            "land: CLM4.0\n",
            "landIce: none\n",
            "ocean: LICOM3.0 (LICOM3.0, tripolar primarily 1deg; 360 x 218 longitude/latitude; 30 levels; top grid cell 0-10 m)\n",
            "ocnBgchem: none\n",
            "seaIce: CICE4.0" ;
        :source_id = "FGOALS-g3" ;
        :source_type = "AOGCM" ;
        :sub_experiment = "none" ;
        :sub_experiment_id = "none" ;
        :table_id = "Amon" ;
        :table_info = "Creation Date:(24 July 2019) MD5:3039b0071259358b3c55557c5f3d21bf" ;
        :title = "FGOALS-g3 output prepared for CMIP6" ;
        :tracking_id = "hdl:21.14100/03609f1e-62da-4fee-996f-c41f8a2488d3" ;
        :variable_id = "tas" ;
        :variant_label = "r1i1p1f1" ;
        :license = "CMIP6 model data produced by Lawrence Livermore PCMDI is licensed under a Creative Commons Attribution ShareAlike 4.0 International License (https://creativecommons.org/licenses). Consult https://pcmdi.llnl.gov/CMIP6/TermsOfUse for terms of use governing CMIP6 output, including citation requirements and proper acknowledgment. Further information about this data, including some limitations, can be found via the further_info_url (recorded as a global attribute in this file) and at https:///pcmdi.llnl.gov/. The data producers and data providers make no warranty, either express or implied, including, but not limited to, warranties of merchantability and fitness for a particular purpose. All liabilities arising from the supply of the information (including any liability arising in negligence) are excluded to the fullest extent permitted by law." ;
        :cmor_version = "3.5.0" ;
}

So it looks to me like the metadata/global attributes are correct, but the path is not. That means we may need to redirect the scans to read the global atts rather than the directory path, to get around these inconsistencies with problem paths.
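A minimal sketch of that comparison, assuming the activity is the path segment immediately after `cmip6`/`CMIP6` (as in the paths quoted in this thread). The helper names are hypothetical and are not part of xagg. The attribute dict stands in for global attributes that would in practice be read from the netCDF file (for example with `netCDF4.Dataset(...).getncattr("activity_id")`).

```python
from pathlib import PurePosixPath


def activity_from_path(directory: str) -> str:
    """Return the path segment that follows 'cmip6' (case-insensitive).

    Assumption: the layout is .../cmip6/<activity>/<institution>/<model>/...
    as in the directories quoted in this thread.
    """
    parts = PurePosixPath(directory).parts
    for i, part in enumerate(parts):
        if part.lower() == "cmip6" and i + 1 < len(parts):
            return parts[i + 1]
    raise ValueError(f"no cmip6 segment found in {directory!r}")


def activity_mismatch(directory: str, global_attrs: dict) -> bool:
    """True when the path-derived activity disagrees with activity_id."""
    return activity_from_path(directory) != global_attrs.get("activity_id")


# Values copied from the FGOALS-g3 example above.
directory = "../scratch/cmip6/CMIP/CAS/FGOALS-g3/ssp585/r1i1p1f1/Amon/tas/gn/v20190818/"
attrs = {"activity_id": "ScenarioMIP", "experiment_id": "ssp585"}

print(activity_from_path(directory), attrs["activity_id"], activity_mismatch(directory, attrs))
```

Reading one attribute per dataset would be slower than parsing the path, so a check like this is probably better suited to spot audits than to every scan.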

The above was also confirmed with

../xclim/CMIP6/CMIP/ssp245/atmos/mon/pr/CMIP6.CMIP.ssp245.CAS.FGOALS-g3.r1i1p1f1.mon.pr.atmos.glb-2d-gn.v20190818.0000000.0.xml

durack1 commented 4 years ago

I just took a look at the alternative model FGOALS-f3-L and we get

(cdat821rc1py3) bash-4.2$ more ../xclim/CMIP6/CMIP/ssp126/land/mon/mrfso/CMIP6.CMIP.ssp126.CAS.FGOALS-f3-L.r1i1p1f1.mon.mrfso.land.glb-2d-gn.v20190821.0000000.0.xml | grep directory
    directory   ="../esgf_publish/CMIP6/CMIP/CAS/FGOALS-f3-L/ssp126/r1i1p1f1/Lmon/mrfso/gn/v20190821/"
(cdat821rc1py3) bash-4.2$ ncdump -h ../esgf_publish/CMIP6/CMIP/CAS/FGOALS-f3-L/ssp126/r1i1p1f1/Lmon/mrfso/gn/v20190821/mrfso_Lmon_FGOALS-f3-L_ssp126_r1i1p1f1_gn_201501-210012.nc 
netcdf mrfso_Lmon_FGOALS-f3-L_ssp126_r1i1p1f1_gn_201501-210012 {
dimensions:
    time = UNLIMITED ; // (1032 currently)
    lat = 192 ;
    lon = 288 ;
    bnds = 2 ;
variables:
...

// global attributes:
        :Conventions = "CF-1.7 CMIP-6.2" ;
        :activity_id = "ScenarioMIP" ;
        :branch_method = "standard" ;
        :branch_time_in_child = 59400. ;
        :branch_time_in_parent = 59400. ;
        :creation_date = "2019-08-21T02:01:46Z" ;
        :data_specs_version = "01.00.30" ;
        :experiment = "update of RCP2.6 based on SSP1" ;
        :experiment_id = "ssp126" ;
        :external_variables = "areacella" ;
        :forcing_index = 1 ;
        :frequency = "mon" ;
        :grid = "native atmosphere regular grid (3x4 latxlon)" ;
        :grid_label = "gn" ;
        :initialization_index = 1 ;
        :institution = "Chinese Academy of Sciences, Beijing 100029, China" ;
        :institution_id = "CAS" ;
        :mip_era = "CMIP6" ;
        :nominal_resolution = "10000 km" ;
        :parent_activity_id = "CMIP" ;
        :parent_experiment_id = "historical" ;
        :parent_mip_era = "CMIP6" ;
        :parent_source_id = "FGOALS-f3-L" ;
        :parent_time_units = "days since 2015-01-01" ;
        :parent_variant_label = "r1i1p1f1" ;
        :physics_index = 1 ;
        :product = "model-output" ;
        :realm = "land" ;
        :run_variant = "3rd realization" ;
        :source = "FGOALS-f3-L (2017): \n",
            "aerosol: none\n",
            "atmos: FAMIL2.2 (Cubed-sphere, c96; 360 x 180 longitude/latitude; 32 levels; top level 2.16 hPa)\n",
            "atmosChem: none\n",
            "land: CLM4.0\n",
            "landIce: none\n",
            "ocean: LICOM3.0 (LICOM3.0, tripolar primarily 1deg; 360 x 218 longitude/latitude; 30 levels; top grid cell 0-10 m)\n",
            "ocnBgchem: none\n",
            "seaIce: CICE4.0" ;
        :source_id = "FGOALS-f3-L" ;
        :source_type = "AOGCM ISM AER" ;
        :sub_experiment = "none" ;
        :sub_experiment_id = "none" ;
        :table_id = "Lmon" ;
        :table_info = "Creation Date:(09 May 2019) MD5:cde930676e68ac6780d5e4c62d3898f6" ;
        :title = "FGOALS-f3-L output prepared for CMIP6" ;
        :tracking_id = "hdl:21.14100/5c5a98cc-aab9-420f-9ded-cc5ac35931c2" ;
        :variable_id = "mrfso" ;
        :license = "CMIP6 model data produced by Lawrence Livermore PCMDI is licensed under a Creative Commons Attribution ShareAlike 4.0 International License (https://creativecommons.org/licenses). Consult https://pcmdi.llnl.gov/CMIP6/TermsOfUse for terms of use governing CMIP6 output, including citation requirements and proper acknowledgment. Further information about this data, including some limitations, can be found via the further_info_url (recorded as a global attribute in this file) and at https:///pcmdi.llnl.gov/. The data producers and data providers make no warranty, either express or implied, including, but not limited to, warranties of merchantability and fitness for a particular purpose. All liabilities arising from the supply of the information (including any liability arising in negligence) are excluded to the fullest extent permitted by law." ;
        :cmor_version = "3.4.0" ;
        :variant_label = "r1i1p1f1" ;
        :realization_index = "1" ;
        :further_info_url = "https://furtherinfo.es-doc.org/CMIP6.CAS.FGOALS-f3-L.ssp126.none.r1i1p1f1" ;
        :history = "Thu Sep 26 09:19:19 2019: ncatted -O -a further_info_url,global,m,c,https://furtherinfo.es-doc.org/CMIP6.CAS.FGOALS-f3-L.ssp126.none.r1i1p1f1 mrfso_Lmon_FGOALS-f3-L_ssp126_r1i1p1f1_gn_201501-210012.nc\n",
            "Thu Sep 26 09:02:47 2019: ncatted -O -a further_info_url,global,m,c,https://furtherinfo.es-doc.org/CMIP6.CAS.FGOALS-f3-L.1pctCO2.none.r1i1p1f1 mrfso_Lmon_FGOALS-f3-L_ssp126_r1i1p1f1_gn_201501-210012.nc\n",
            "Thu Sep 26 09:02:34 2019: ncatted -O -a realization_index,global,m,c,1 mrfso_Lmon_FGOALS-f3-L_ssp126_r1i1p1f1_gn_201501-210012.nc\n",
            "Thu Sep 26 09:02:22 2019: ncatted -O -a variant_label,global,m,c,r1i1p1f1 mrfso_Lmon_FGOALS-f3-L_ssp126_r1i1p1f1_gn_201501-210012.nc\n",
            "2019-08-21T02:01:46Z ;rewrote data to be consistent with ScenarioMIP for variable mrfso found in table Lmon." ;
}

So same story, metadata correct, paths not

durack1 commented 4 years ago

Hawkeye @taylor13 may have eagle-eye-spotted a problem with the file directly above, 3 guesses, starts, .... now.

taylor13 commented 4 years ago

Took about 3 minutes... The branch time in child is inconsistent with the units and file name, which indicate it should be near 0, not 59400.

taylor13 commented 4 years ago

also parent_time_units are wrong.
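A rough sketch of the arithmetic behind these two observations, using the branch attributes from the two ncdump listings above. The 365-day calendar is an assumption (the calendar attribute is not shown in the dumps), and `branch_year` is a hypothetical helper.

```python
def branch_year(branch_time_days: float, reference_year: int, days_per_year: int = 365) -> float:
    """Convert 'days since <reference_year>-01-01' into a fractional year.

    Assumption: a fixed-length 365-day calendar.
    """
    return reference_year + branch_time_days / days_per_year


# FGOALS-g3 ssp585: branch_time_in_parent = 60225, "days since 1850-01-01"
g3_parent = branch_year(60225, 1850)

# FGOALS-f3-L ssp126: branch_time_in_parent = 59400, "days since 2015-01-01"
f3l_parent = branch_year(59400, 2015)

print(g3_parent, round(f3l_parent, 1))
```

Under that assumption the FGOALS-g3 values land on the start of 2015, where the scenario files begin, while the FGOALS-f3-L combination of 59400 days and a 2015 reference date lands well past 2100.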

durack1 commented 4 years ago

Good catches, but not the issue I was eyeing off:

:nominal_resolution = "10000 km" ; doesn't fit too well with the grid described in the source attribute ("atmos: FAMIL2.2 (Cubed-sphere, c96; 360 x 180 longitude/latitude; ..."); it's probably more like 100 km.
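As a back-of-the-envelope check, the equatorial grid spacing can be estimated from the number of longitude points. This is only a rough sketch and not the official CMIP6 nominal_resolution calculation; the Earth radius value and the helper name are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def equatorial_spacing_km(n_lon: int) -> float:
    """Approximate east-west grid spacing at the equator for n_lon points."""
    return 2 * math.pi * EARTH_RADIUS_KM / n_lon


# 360: grid quoted in the source attribute; 288: lon dimension of the
# FGOALS-f3-L file; 180: lon dimension of the FGOALS-g3 file.
for n_lon in (360, 288, 180):
    print(n_lon, round(equatorial_spacing_km(n_lon)), "km")
```

All three spacings are on the order of 100 to 250 km, so "10000 km" looks off by about two orders of magnitude.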

taylor13 commented 4 years ago

also the nominal_resolution is too large

taylor13 commented 4 years ago

too late, I guess.

durack1 commented 4 years ago

It was a third guess, I suppose you slipped just under the cutoff

durack1 commented 4 years ago

@pochedls if WE were to implement this dir-scour to metadata-scour change, we'd likely need to archive and then rerun the whole tree. How's the appetite for such an undertaking, and how many days are we talking here?

pochedls commented 4 years ago

This problem is upstream of xagg (as far as I can tell) and in the end doesn't cause any problems (the published datasets have xml files in the correct place).

I don't think it is worth investing time in changing xagg for two reasons: 1) I don't think this is a problem for anyone using the xmls (again, the xml files corresponding to the published data are in the correct location) and 2) I think if we infer the activity from the netcdf files xagg will be substantially slower and I think re-factoring the code may lead to other (potentially more substantive) problems that will take time to resolve.

I will mark these files as ignored and we can revisit this issue if it does cause legitimate problems in accessing the correct data. Let me know if you object.

pochedls commented 4 years ago

FYI - this issue appears to affect 50 directories. Also note that "ignoring" a dataset removes its xml file and excludes it from future scans.

Relevant query:

select path from paths where xmlFile like '/p/user_pub/xclim/CMIP6/CMIP/%' and experiment = 'ssp585';
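For illustration, the same query can be run from Python. This is a sketch that assumes an SQLite-style database; the table and column names come from the query above, and the two rows are datasets quoted elsewhere in this thread, loaded into an in-memory table rather than the real database.

```python
import sqlite3

# Assumption: SQLite-compatible storage with a `paths` table that has
# (path, xmlFile, experiment) columns, as implied by the query above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE paths (path TEXT, xmlFile TEXT, experiment TEXT)")
conn.executemany(
    "INSERT INTO paths VALUES (?, ?, ?)",
    [
        (
            "/p/css03/scratch/cmip6/CMIP/CAS/FGOALS-g3/ssp585/r1i1p1f1/Amon/mc/gn/v20190818/",
            "/p/user_pub/xclim/CMIP6/CMIP/ssp585/atmos/mon/mc/"
            "CMIP6.CMIP.ssp585.CAS.FGOALS-g3.r1i1p1f1.mon.mc.atmos.glb-l-gn.v20190818.0100000.0.xml",
            "ssp585",
        ),
        (
            "/p/css03/scratch/cmip6/CMIP/CAS/FGOALS-g3/ssp126/r1i1p1f1/Amon/hur/gn/v20190818/",
            "/p/user_pub/xclim/CMIP6/CMIP/ssp126/atmos/mon/hur/"
            "CMIP6.CMIP.ssp126.CAS.FGOALS-g3.r1i1p1f1.mon.hur.atmos.glb-p19-gn.v20190818.0000000.0.xml",
            "ssp126",
        ),
    ],
)

# Parameterized form of the query: ssp585 xml files filed under CMIP.
rows = conn.execute(
    "SELECT path FROM paths WHERE xmlFile LIKE ? AND experiment = ?",
    ("/p/user_pub/xclim/CMIP6/CMIP/%", "ssp585"),
).fetchall()
print(rows)
conn.close()
```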

pochedls commented 4 years ago

Looking at all of the misplaced ScenarioMIP data, there are 207 datasets (50 are in ssp585). All but five have a corresponding dataset in the correct location (i.e., under ScenarioMIP rather than CMIP). The five exceptions are listed below (xml file followed by the underlying directory):

/p/user_pub/xclim/CMIP6/CMIP/ssp585/atmos/mon/mc/CMIP6.CMIP.ssp585.CAS.FGOALS-g3.r1i1p1f1.mon.mc.atmos.glb-l-gn.v20190818.0100000.0.xml /p/css03/scratch/cmip6/CMIP/CAS/FGOALS-g3/ssp585/r1i1p1f1/Amon/mc/gn/v20190818/

/p/user_pub/xclim/CMIP6/CMIP/ssp126/atmos/mon/hur/CMIP6.CMIP.ssp126.CAS.FGOALS-g3.r1i1p1f1.mon.hur.atmos.glb-p19-gn.v20190818.0000000.0.xml /p/css03/scratch/cmip6/CMIP/CAS/FGOALS-g3/ssp126/r1i1p1f1/Amon/hur/gn/v20190818/

/p/user_pub/xclim/CMIP6/CMIP/ssp126/atmos/mon/hus/CMIP6.CMIP.ssp126.CAS.FGOALS-g3.r1i1p1f1.mon.hus.atmos.glb-p19-gn.v20190818.0000000.0.xml /p/css03/scratch/cmip6/CMIP/CAS/FGOALS-g3/ssp126/r1i1p1f1/Amon/hus/gn/v20190818/

/p/user_pub/xclim/CMIP6/CMIP/ssp126/atmos/mon/clt/CMIP6.CMIP.ssp126.CAS.FGOALS-g3.r1i1p1f1.mon.clt.atmos.glb-2d-gn.v20190818.0000000.0.xml /p/css03/esgf_publish/CMIP6/CMIP/CAS/FGOALS-g3/ssp126/r1i1p1f1/Amon/clt/gn/v20190818/

/p/user_pub/xclim/CMIP6/CMIP/ssp126/atmos/mon/huss/CMIP6.CMIP.ssp126.CAS.FGOALS-g3.r1i1p1f1.mon.huss.atmos.glb-z1-gn.v20190818.0000000.0.xml /p/css03/scratch/cmip6/CMIP/CAS/FGOALS-g3/ssp126/r1i1p1f1/Amon/huss/gn/v20190818/

durack1 commented 4 years ago

@painter1 @sashakames I wonder if the scratch issues above can easily be fixed, by copying (while fixing the incorrect activity_id: CMIP -> ScenarioMIP) to esgf_publish?

The single published (esgf_publish) issue FGOALS-g3/ssp126 should really be fixed, though on a low priority, as I am sure these data are already being downloaded by others

painter1 commented 4 years ago

I would rather see this kind of mistake fixed at the source, i.e. retract and publish correctly. If we were to put the FGOALS data somewhere different then anybody looking at ESGF might see similar files in different places and not know what to make of it.

sashakames commented 4 years ago

Well the problem is that they published this incorrectly by manually putting the ssp's under CMIP and esgmapfile/esgpublish doesn't have the sophisticated hierarchical check. If we correct this on our end, the datasets won't be replicas on ESGF, they will have distinct dataset IDs and that could be confusing or problematic to end-users.

We have the new CMIP Inconsistency Checker (the CIC) and could add these checks for the originally published ESGF data.

durack1 commented 4 years ago

@painter1 @sashakames ok so this requires an errata raised. I'll put that in the to-do list

pochedls commented 4 years ago

Did the path format stay the same during publication? In the directories below, it looks like the MIP part of the path is different between scratch and publish.

/p/css03/scratch/cmip6/CMIP/CAS/FGOALS-g3/ssp126/r1i1p1f1/Amon/tas/gn/v20190818/

versus

/p/css03/esgf_publish/CMIP6/ScenarioMIP/CAS/FGOALS-g3/ssp126/r1i1p1f1/Amon/tas/gn/v20190818/
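A small sketch that lines the two directories up segment by segment makes the differences explicit. `differing_segments` is a hypothetical helper and assumes both paths have the same number of segments.

```python
def differing_segments(path_a: str, path_b: str) -> list:
    """Return (index, segment_a, segment_b) for each position that differs.

    Assumption: both paths have the same depth, as the two above do.
    """
    parts_a = path_a.strip("/").split("/")
    parts_b = path_b.strip("/").split("/")
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(parts_a, parts_b))
        if a != b
    ]


scratch = "/p/css03/scratch/cmip6/CMIP/CAS/FGOALS-g3/ssp126/r1i1p1f1/Amon/tas/gn/v20190818/"
publish = "/p/css03/esgf_publish/CMIP6/ScenarioMIP/CAS/FGOALS-g3/ssp126/r1i1p1f1/Amon/tas/gn/v20190818/"

diffs = differing_segments(scratch, publish)
for index, a, b in diffs:
    print(index, a, "->", b)
```

Besides the expected scratch/esgf_publish root, the paths differ in the case of the `cmip6` segment and in the activity segment (CMIP versus ScenarioMIP); everything from the institution down, including the version, is identical.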

durack1 commented 4 years ago

@pochedls good catch. Either way, I have sent an email (below) to raise an errata; if it is indeed an issue, this should be logged. I will close this issue now as it is not a problem with the software.

From: "Durack, Paul J."
Date: Monday, August 3, 2020 at 12:40 PM
To: ljli@mail.iap*, yyq@lasg.iap*, zhanghe@mail.iap*, zhengwp@mail.iap*, bixq@mail.iap*, mhzhang@mail.iap*,
zhoutj@LASG.IAP*
Subject: FGOALS-g3 and FGOALS-f3-L CMIP6/ESGF publication problems

Hello from California.

I have reached out to you all as contacts listed for the CAS contributions to the CMIP6 simulation archive.

We discovered some problems with the publication paths for ScenarioMIP data contributed for the
FGOALS-f3-L and FGOALS-g3 models.

For the experiments ssp126, ssp245, ssp370 and ssp585, some simulation data has been erroneously
published under the “CMIP” activity_id, whereas these experiments belong to the “ScenarioMIP” activity_id.

So for e.g. the publication path

cmip6/CMIP/CAS/FGOALS-g3/ssp126/r1i1p1f1/Amon/tas/gn/v20190818/

Should be

cmip6/ScenarioMIP/CAS/FGOALS-g3/ssp126/r1i1p1f1/Amon/tas/gn/v20190818/

Could you please raise an errata at https://errata.es-doc.org/static/index.html noting that these path
issues exist, and how you plan to resolve such problems, and unpublish erroneously published data and paths.

Many thanks in advance.

P

pochedls commented 4 years ago

One thing I do not understand: from the comments by @sashakames and @painter1, it seems like this path issue was due to the modeling center's publication choices, and that this mistake should therefore be propagated to the esgf_publish directories.

The point of my comment is that this does not appear to be the case. For the same version, the scratch and publish paths are different (in the MIP part of the path). Are we fixing the paths as they are published, or was there a strange series of events that led to this difference in the paths?

durack1 commented 4 years ago

@pochedls I think this was an issue that CAS is aware of, they noted it, caught it and it's now fixed, but in between those times Jeff replicated the data, and the problem. What should have happened, is that the version was incremented when the fix was made, but it looks like it wasn't. If we could get creation dates off the host filesystem it would likely tell us exactly when the fix was implemented but looks like that wasn't the case

sashakames commented 4 years ago

I suppose Jeff corrected our copies. I checked, and CAS deleted (not retracted) the data, so there is no longer any record in ESGF.

durack1 commented 4 years ago

@sashakames and @pochedls I did have the same question. Or rather, did the CAS fix also get replicated/duplicated, which would mean we have data residing in esgf_publish that has the right activity_id?

painter1 commented 4 years ago

In the example of @pochedls, I did not make any corrections. It's all automated. To understand it, I just looked at one of the files, tas_Amon_FGOALS-g3_ssp126_r1i1p1f1_gn_202001-202912.nc.

When this file was first downloaded, on October 21, it was in the 'CMIP' activity. It never got published because not all the files in its dataset were downloaded. Just one of them is missing; the database lists it as 'retracted' now, so it must have been deleted or retracted before it could be downloaded. Retraction would have made it disappear from ESGF, but I don't have a script to automatically delete files from LLNL, so the file remains at LLNL and in the Synda database.

The second copy was published again with the same version number in the 'ScenarioMIP' activity. Then it was downloaded on October 26. It happens to be the last file needed to complete its dataset, so the dataset was probably moved to esgf_publish/ that night (I haven't checked for that).

In short, the system saw the two copies of the files as completely different files because they were in different places. You would have to compare the checksums, or be an intelligent human, to suspect that they are the same.
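A minimal sketch of such a checksum comparison, using Python's standard hashlib. The two temporary files contain placeholder bytes rather than real netCDF data, and the directory names only mimic the CMIP and ScenarioMIP locations; `sha256_of` is a hypothetical helper.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    name = "tas_Amon_FGOALS-g3_ssp126_r1i1p1f1_gn_202001-202912.nc"
    copies = []
    for activity in ("CMIP", "ScenarioMIP"):
        target = Path(tmp) / activity / name
        target.parent.mkdir(parents=True)
        # Placeholder content standing in for the real file.
        target.write_bytes(b"placeholder bytes, not real netCDF data")
        copies.append(target)

    # Same bytes in two different places hash to the same value.
    same_content = sha256_of(copies[0]) == sha256_of(copies[1])

print(same_content)
```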

Looking through these files in our Synda database, I see that only the dataset and one of its files have been marked as 'retracted'. In the near future I will have to see why; it looks like I have a bug. But even if all the files had been correctly marked, you could still see them in the file system.

durack1 commented 4 years ago

Thanks @painter1, this is good intel, as the publication version/timestamp is v20190818 whereas this change/correction occurred sometime between the 21st and 26th October 2019, so clearly they haven't followed the protocol as specified.

As an aside, their global atts in https://github.com/pochedls/xagg/issues/29#issuecomment-666739607 show "2019-08-21T02:01:46Z ;rewrote data to be consistent with ScenarioMIP for variable mrfso found in table Lmon." ; which jibes with the publish/retract/move/publish timeline that you described.

I believe this is a case closed situation, however we do have one (or was it 5) dataset(s) that are missing still as noted in https://github.com/pochedls/xagg/issues/29#issuecomment-666855060

painter1 commented 4 years ago

I just looked at the five datasets in @pochedls's comment in #29.

For the first one, the problem is that /p/css03/scratch/cmip6/ and CMIP6/ point to the same place; but /p/css03/esgf_publish has only CMIP6/. So the dataset is published, but at /p/css03/esgf_publish/CMIP6/ScenarioMIP/CAS/FGOALS-g3/ssp585/r1i1p1f1/Amon/mc/gn/v20190818/

For the remaining four, the problem is that ICHEC hasn't published the corrected datasets yet. At least, they aren't shown on the ESGF search page (you can find them with table 'day' rather than 'Amon').

So I agree that we have a "case closed" situation.

painter1 commented 4 years ago

Oops: my comment on the first of the five datasets is really the same as Steve's comment right after he listed the five problem datasets.

The official name is CMIP6, but I added a link 'cmip6' in scratch/ to save the trouble of hitting the shift key. Maybe that was a mistake.

durack1 commented 4 years ago

The issue is closed, but I did have one point to clarify. @pochedls has noted we are missing 5 "fixed" datasets:

CMIP6.CMIP.ssp585.CAS.FGOALS-g3.r1i1p1f1.mon.mc.atmos.glb-l-gn.v20190818.0100000.0.xml

CMIP6.CMIP.ssp126.CAS.FGOALS-g3.r1i1p1f1.mon.hur.atmos.glb-p19-gn.v20190818.0000000.0.xml

CMIP6.CMIP.ssp126.CAS.FGOALS-g3.r1i1p1f1.mon.hus.atmos.glb-p19-gn.v20190818.0000000.0.xml

CMIP6.CMIP.ssp126.CAS.FGOALS-g3.r1i1p1f1.mon.clt.atmos.glb-2d-gn.v20190818.0000000.0.xml

CMIP6.CMIP.ssp126.CAS.FGOALS-g3.r1i1p1f1.mon.huss.atmos.glb-z1-gn.v20190818.0000000.0.xml

It seems that the 4 datasets exist for ssp126 and mc for ssp585 but we don't have these locally

painter1 commented 4 years ago

@durack1 , the "missing" datasets were ScenarioMIP copies of the CMIP datasets you listed. All of them exist locally as part of the CMIP activity, but the correct activity for the ssp* experiments is ScenarioMIP. As I pointed out yesterday, we actually have the ScenarioMIP copy of the ssp585 mc dataset - but the high-level directory name is the usual CMIP6, not cmip6. The other four datasets do not exist on ESGF. ICHEC, or whoever it was that originally published the data as CMIP, never finished making the correction.

durack1 commented 4 years ago

@painter1 sorry if I am missing something, but if you click the links above (which point to ESGF) you'll see that all the datasets exist under the ScenarioMIP activity, as they should, so CAS (or ICHEC) has done the cleanup. It's just that our local files don't reflect this: we don't have the files under the ScenarioMIP directory, only the files under the incorrect path.

painter1 commented 4 years ago

After clicking on that ssp126 link, specify a time frequency of 'mon' and realm of 'atmos'. Then look for the variable names hur, hus, clt, huss. I can't find them. One cause of confusion here may be that a CMIP6 dataset has only one variable, unlike a CMIP5 dataset.

durack1 commented 4 years ago

@painter1 apologies for wasting your time, it seems at least for clt this is daily data only. Case closed.