xCDAT / xcdat

An extension of xarray for climate data analysis on structured grids.
https://xcdat.readthedocs.io/en/latest/
Apache License 2.0

[Bug]: Reanalysis data open error #451

Closed lee1043 closed 1 year ago

lee1043 commented 1 year ago

What happened?

Reported by @msahn

When I tried to open MERRA data with xcdat.open_mfdataset, I got the error shown in the log output below.

What did you expect to happen?

No response

Minimal Complete Verifiable Example

>>> xcdat.open_mfdataset('/work/ahn6/obs/MERRA/pr/pr_day_reanalysis_MERRA_*.nc')

Relevant log output

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ahn6/anaconda3/envs/pcmdi_metrics_dev/lib/python3.9/site-packages/xcdat/dataset.py", line 223, in open_mfdataset
    ds = xr.open_mfdataset(
  File "/home/ahn6/anaconda3/envs/pcmdi_metrics_dev/lib/python3.9/site-packages/xarray/backends/api.py", line 1000, in open_mfdataset
    combined = combine_by_coords(
  File "/home/ahn6/anaconda3/envs/pcmdi_metrics_dev/lib/python3.9/site-packages/xarray/core/combine.py", line 982, in combine_by_coords
    concatenated = _combine_single_variable_hypercube(
  File "/home/ahn6/anaconda3/envs/pcmdi_metrics_dev/lib/python3.9/site-packages/xarray/core/combine.py", line 640, in _combine_single_variable_hypercube
    concatenated = _combine_nd(
  File "/home/ahn6/anaconda3/envs/pcmdi_metrics_dev/lib/python3.9/site-packages/xarray/core/combine.py", line 239, in _combine_nd
    combined_ids = _combine_all_along_first_dim(
  File "/home/ahn6/anaconda3/envs/pcmdi_metrics_dev/lib/python3.9/site-packages/xarray/core/combine.py", line 275, in _combine_all_along_first_dim
    new_combined_ids[new_id] = _combine_1d(
  File "/home/ahn6/anaconda3/envs/pcmdi_metrics_dev/lib/python3.9/site-packages/xarray/core/combine.py", line 298, in _combine_1d
    combined = concat(
  File "/home/ahn6/anaconda3/envs/pcmdi_metrics_dev/lib/python3.9/site-packages/xarray/core/concat.py", line 243, in concat
    return _dataset_concat(
  File "/home/ahn6/anaconda3/envs/pcmdi_metrics_dev/lib/python3.9/site-packages/xarray/core/concat.py", line 504, in _dataset_concat
    merged_vars, merged_indexes = merge_collected(
  File "/home/ahn6/anaconda3/envs/pcmdi_metrics_dev/lib/python3.9/site-packages/xarray/core/merge.py", line 302, in merge_collected
    merged_vars[name] = unique_variable(
  File "/home/ahn6/anaconda3/envs/pcmdi_metrics_dev/lib/python3.9/site-packages/xarray/core/merge.py", line 156, in unique_variable
    raise MergeError(
xarray.core.merge.MergeError: conflicting values for variable 'lon_bnds' on objects to be combined. You can skip this check by specifying compat='override'.

If I use xarray's open_mfdataset instead, the error does not occur.

>>> xr.open_mfdataset('/work/ahn6/obs/MERRA/pr/pr_day_reanalysis_MERRA_*.nc')
<xarray.Dataset>
Dimensions:    (lat: 361, lon: 952, time: 13574, bnds: 2)
Coordinates:
  * lat        (lat) float64 -90.0 -89.5 -89.0 -88.5 ... 88.5 89.0 89.5 90.0
  * lon        (lon) float64 0.0 0.6667 0.6667 1.333 ... 358.7 358.7 359.3 359.3
  * time       (time) datetime64[ns] 1979-01-01T12:00:00 ... 2016-02-29T12:00:00
Dimensions without coordinates: bnds
Data variables:
    time_bnds  (lon, time, bnds) datetime64[ns] dask.array<chunksize=(952, 365, 2), meta=np.ndarray>
    lat_bnds   (lon, time, lat, bnds) float64 dask.array<chunksize=(952, 365, 361, 2), meta=np.ndarray>
    lon_bnds   (time, lon, bnds) float64 dask.array<chunksize=(365, 952, 2), meta=np.ndarray>
    pr         (time, lat, lon) float32 dask.array<chunksize=(365, 361, 952), meta=np.ndarray>
Attributes: (12/20)
    institution:     Global Modeling and Assimilation Office, NASA Goddard Sp...
    institute_id:    NASA-GMAO
    experiment_id:   MERRA
    source:          MERRA Monthly 0.25x0.25 degree merged
    model_id:        GEOS-5
    contact:         MERRA, Steven Pawson (steven.pawson-1@nasa.gov)
    ...              ...
    Conventions:     CF-1.4
    project_id:      CREATE-IP
    table_id:        Table day (17 July 2013) 7d9fb6bca86b3be6e80f4fe674d87427
    title:           Reanalysis output prepared for CREATE-IP.
    modeling_realm:  atmos
    cmor_version:    2.9.1

Anything else we need to know?

No response

Environment

xcdat 0.4.0

pochedls commented 1 year ago

Hi @lee1043 – see this FAQ on this issue. Does this resolve the issue?

lee1043 commented 1 year ago

Hi @pochedls, thank you for the information. Unfortunately, in this specific case neither the compat option, the join option, nor both together helped. I guess this might be a data issue, but I wanted to share it with the team in case it is helpful. I haven't looked at the data itself in detail.

@msahn can you tell if the data is raw data or obs4mip-processed?

>>> xcdat.open_mfdataset('/work/ahn6/obs/MERRA/pr/pr_day_reanalysis_MERRA_*.nc', compat='override')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/lee1043/.conda/envs/pcmdi_metrics_dev_20230418/lib/python3.9/site-packages/xcdat/dataset.py", line 205, in open_mfdataset
    ds = xr.open_mfdataset(
  File "/home/lee1043/.conda/envs/pcmdi_metrics_dev_20230418/lib/python3.9/site-packages/xarray/backends/api.py", line 1011, in open_mfdataset
    combined = combine_by_coords(
  File "/home/lee1043/.conda/envs/pcmdi_metrics_dev_20230418/lib/python3.9/site-packages/xarray/core/combine.py", line 976, in combine_by_coords
    concatenated = _combine_single_variable_hypercube(
  File "/home/lee1043/.conda/envs/pcmdi_metrics_dev_20230418/lib/python3.9/site-packages/xarray/core/combine.py", line 634, in _combine_single_variable_hypercube
    concatenated = _combine_nd(
  File "/home/lee1043/.conda/envs/pcmdi_metrics_dev_20230418/lib/python3.9/site-packages/xarray/core/combine.py", line 235, in _combine_nd
    combined_ids = _combine_all_along_first_dim(
  File "/home/lee1043/.conda/envs/pcmdi_metrics_dev_20230418/lib/python3.9/site-packages/xarray/core/combine.py", line 270, in _combine_all_along_first_dim
    new_combined_ids[new_id] = _combine_1d(
  File "/home/lee1043/.conda/envs/pcmdi_metrics_dev_20230418/lib/python3.9/site-packages/xarray/core/combine.py", line 293, in _combine_1d
    combined = concat(
  File "/home/lee1043/.conda/envs/pcmdi_metrics_dev_20230418/lib/python3.9/site-packages/xarray/core/concat.py", line 249, in concat
    return _dataset_concat(
  File "/home/lee1043/.conda/envs/pcmdi_metrics_dev_20230418/lib/python3.9/site-packages/xarray/core/concat.py", line 505, in _dataset_concat
    concat_over, equals, concat_dim_lengths = _calc_concat_over(
  File "/home/lee1043/.conda/envs/pcmdi_metrics_dev_20230418/lib/python3.9/site-packages/xarray/core/concat.py", line 406, in _calc_concat_over
    process_subset_opt(coords, "coords")
  File "/home/lee1043/.conda/envs/pcmdi_metrics_dev_20230418/lib/python3.9/site-packages/xarray/core/concat.py", line 321, in process_subset_opt
    raise ValueError(
ValueError: Cannot specify both coords='different' and compat='override'.
>>> xcdat.open_mfdataset('/work/ahn6/obs/MERRA/pr/pr_day_reanalysis_MERRA_*.nc', join='override')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/lee1043/.conda/envs/pcmdi_metrics_dev_20230418/lib/python3.9/site-packages/xcdat/dataset.py", line 205, in open_mfdataset
    ds = xr.open_mfdataset(
  File "/home/lee1043/.conda/envs/pcmdi_metrics_dev_20230418/lib/python3.9/site-packages/xarray/backends/api.py", line 1011, in open_mfdataset
    combined = combine_by_coords(
  File "/home/lee1043/.conda/envs/pcmdi_metrics_dev_20230418/lib/python3.9/site-packages/xarray/core/combine.py", line 976, in combine_by_coords
    concatenated = _combine_single_variable_hypercube(
  File "/home/lee1043/.conda/envs/pcmdi_metrics_dev_20230418/lib/python3.9/site-packages/xarray/core/combine.py", line 634, in _combine_single_variable_hypercube
    concatenated = _combine_nd(
  File "/home/lee1043/.conda/envs/pcmdi_metrics_dev_20230418/lib/python3.9/site-packages/xarray/core/combine.py", line 235, in _combine_nd
    combined_ids = _combine_all_along_first_dim(
  File "/home/lee1043/.conda/envs/pcmdi_metrics_dev_20230418/lib/python3.9/site-packages/xarray/core/combine.py", line 270, in _combine_all_along_first_dim
    new_combined_ids[new_id] = _combine_1d(
  File "/home/lee1043/.conda/envs/pcmdi_metrics_dev_20230418/lib/python3.9/site-packages/xarray/core/combine.py", line 293, in _combine_1d
    combined = concat(
  File "/home/lee1043/.conda/envs/pcmdi_metrics_dev_20230418/lib/python3.9/site-packages/xarray/core/concat.py", line 249, in concat
    return _dataset_concat(
  File "/home/lee1043/.conda/envs/pcmdi_metrics_dev_20230418/lib/python3.9/site-packages/xarray/core/concat.py", line 521, in _dataset_concat
    merged_vars, merged_indexes = merge_collected(
  File "/home/lee1043/.conda/envs/pcmdi_metrics_dev_20230418/lib/python3.9/site-packages/xarray/core/merge.py", line 291, in merge_collected
    merged_vars[name] = unique_variable(
  File "/home/lee1043/.conda/envs/pcmdi_metrics_dev_20230418/lib/python3.9/site-packages/xarray/core/merge.py", line 145, in unique_variable
    raise MergeError(
xarray.core.merge.MergeError: conflicting values for variable 'lon_bnds' on objects to be combined. You can skip this check by specifying compat='override'.
>>> xcdat.open_mfdataset('/work/ahn6/obs/MERRA/pr/pr_day_reanalysis_MERRA_*.nc', compat='override', join='override')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/lee1043/.conda/envs/pcmdi_metrics_dev_20230418/lib/python3.9/site-packages/xcdat/dataset.py", line 205, in open_mfdataset
    ds = xr.open_mfdataset(
  File "/home/lee1043/.conda/envs/pcmdi_metrics_dev_20230418/lib/python3.9/site-packages/xarray/backends/api.py", line 1011, in open_mfdataset
    combined = combine_by_coords(
  File "/home/lee1043/.conda/envs/pcmdi_metrics_dev_20230418/lib/python3.9/site-packages/xarray/core/combine.py", line 976, in combine_by_coords
    concatenated = _combine_single_variable_hypercube(
  File "/home/lee1043/.conda/envs/pcmdi_metrics_dev_20230418/lib/python3.9/site-packages/xarray/core/combine.py", line 634, in _combine_single_variable_hypercube
    concatenated = _combine_nd(
  File "/home/lee1043/.conda/envs/pcmdi_metrics_dev_20230418/lib/python3.9/site-packages/xarray/core/combine.py", line 235, in _combine_nd
    combined_ids = _combine_all_along_first_dim(
  File "/home/lee1043/.conda/envs/pcmdi_metrics_dev_20230418/lib/python3.9/site-packages/xarray/core/combine.py", line 270, in _combine_all_along_first_dim
    new_combined_ids[new_id] = _combine_1d(
  File "/home/lee1043/.conda/envs/pcmdi_metrics_dev_20230418/lib/python3.9/site-packages/xarray/core/combine.py", line 293, in _combine_1d
    combined = concat(
  File "/home/lee1043/.conda/envs/pcmdi_metrics_dev_20230418/lib/python3.9/site-packages/xarray/core/concat.py", line 249, in concat
    return _dataset_concat(
  File "/home/lee1043/.conda/envs/pcmdi_metrics_dev_20230418/lib/python3.9/site-packages/xarray/core/concat.py", line 505, in _dataset_concat
    concat_over, equals, concat_dim_lengths = _calc_concat_over(
  File "/home/lee1043/.conda/envs/pcmdi_metrics_dev_20230418/lib/python3.9/site-packages/xarray/core/concat.py", line 406, in _calc_concat_over
    process_subset_opt(coords, "coords")
  File "/home/lee1043/.conda/envs/pcmdi_metrics_dev_20230418/lib/python3.9/site-packages/xarray/core/concat.py", line 321, in process_subset_opt
    raise ValueError(
ValueError: Cannot specify both coords='different' and compat='override'.

msahn commented 1 year ago

@lee1043 This is raw data downloaded from CREATE-IP. search_url='https://esgf-node.llnl.gov/esg-search/wget/?distrib=false&dataset_id=CREATE-IP.reanalysis.NASA-GMAO.GEOS-5.MERRA.atmos.day.v20200611|esgf.nccs.nasa.gov'

FYI, other reanalysis data downloaded from CREATE-IP (CFSR, ERA-Interim, JRA-55, MERRA2) are working well with xcdat.open. This error occurs only with MERRA among them.

pochedls commented 1 year ago

@lee1043 – this arose again for @bonfils2 in #470. Do the workarounds in that thread work (if so, maybe we can close this thread as a duplicate and continue the dialogue on that issue)?

lee1043 commented 1 year ago

Thank you @pochedls for the workaround. I found that both of these commands work:

ds = xcdat.open_mfdataset('/work/ahn6/obs/MERRA/pr/pr_day_reanalysis_MERRA_*.nc', data_vars="all")
ds = xcdat.open_mfdataset('/work/ahn6/obs/MERRA/pr/pr_day_reanalysis_MERRA_*.nc', drop_variables="lon_bnds")

Another option suggested by @pochedls yields the same error:

ds = xcdat.open_mfdataset('/work/ahn6/obs/MERRA/pr/pr_day_reanalysis_MERRA_*.nc', coords="all")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/lee1043/.conda/envs/xcdat_dev_20230424/lib/python3.10/site-packages/xcdat/dataset.py", line 216, in open_mfdataset
    ds = xr.open_mfdataset(
  File "/home/lee1043/.conda/envs/xcdat_dev_20230424/lib/python3.10/site-packages/xarray/backends/api.py", line 1011, in open_mfdataset
    combined = combine_by_coords(
  File "/home/lee1043/.conda/envs/xcdat_dev_20230424/lib/python3.10/site-packages/xarray/core/combine.py", line 976, in combine_by_coords
    concatenated = _combine_single_variable_hypercube(
  File "/home/lee1043/.conda/envs/xcdat_dev_20230424/lib/python3.10/site-packages/xarray/core/combine.py", line 634, in _combine_single_variable_hypercube
    concatenated = _combine_nd(
  File "/home/lee1043/.conda/envs/xcdat_dev_20230424/lib/python3.10/site-packages/xarray/core/combine.py", line 235, in _combine_nd
    combined_ids = _combine_all_along_first_dim(
  File "/home/lee1043/.conda/envs/xcdat_dev_20230424/lib/python3.10/site-packages/xarray/core/combine.py", line 270, in _combine_all_along_first_dim
    new_combined_ids[new_id] = _combine_1d(
  File "/home/lee1043/.conda/envs/xcdat_dev_20230424/lib/python3.10/site-packages/xarray/core/combine.py", line 293, in _combine_1d
    combined = concat(
  File "/home/lee1043/.conda/envs/xcdat_dev_20230424/lib/python3.10/site-packages/xarray/core/concat.py", line 249, in concat
    return _dataset_concat(
  File "/home/lee1043/.conda/envs/xcdat_dev_20230424/lib/python3.10/site-packages/xarray/core/concat.py", line 521, in _dataset_concat
    merged_vars, merged_indexes = merge_collected(
  File "/home/lee1043/.conda/envs/xcdat_dev_20230424/lib/python3.10/site-packages/xarray/core/merge.py", line 291, in merge_collected
    merged_vars[name] = unique_variable(
  File "/home/lee1043/.conda/envs/xcdat_dev_20230424/lib/python3.10/site-packages/xarray/core/merge.py", line 145, in unique_variable
    raise MergeError(
xarray.core.merge.MergeError: conflicting values for variable 'lon_bnds' on objects to be combined. You can skip this check by specifying compat='override'.

I am closing this issue to continue the dialogue on https://github.com/xCDAT/xcdat/issues/470.

lee1043 commented 1 year ago

The command below works. Thanks, @pochedls.

ds = xcdat.open_mfdataset('/work/ahn6/obs/MERRA/pr/pr_day_reanalysis_MERRA_*.nc', coords="minimal", compat='override')
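
For readers without access to the MERRA files, the behavior in this thread can be sketched with two small in-memory datasets. This is a minimal sketch, not the actual data: the helper `make_file`, the coordinate values, and the 1e-3 offset in `lon_bnds` are made up for illustration. It calls `xr.concat` directly with `data_vars="minimal"`, on the assumption that this mirrors what `xcdat.open_mfdataset` passes through to xarray by default.

```python
import numpy as np
import xarray as xr


def make_file(times, bnds_shift):
    """Build a tiny synthetic stand-in for one file (hypothetical values)."""
    lon = np.array([0.0, 1.0, 2.0])
    # lon_bnds has no time dimension; bnds_shift makes it differ between files.
    lon_bnds = np.stack([lon - 0.5, lon + 0.5], axis=1) + bnds_shift
    return xr.Dataset(
        data_vars={
            "pr": (("time", "lon"), np.zeros((len(times), lon.size))),
            "lon_bnds": (("lon", "bnds"), lon_bnds),
        },
        coords={"time": times, "lon": lon},
    )


first = make_file([0, 1], bnds_shift=0.0)
second = make_file([2, 3], bnds_shift=1e-3)  # slightly different bounds

# With data_vars="minimal", lon_bnds is not concatenated along time, so the
# two copies must be merged, and their conflicting values raise MergeError.
try:
    xr.concat(
        [first, second], dim="time", data_vars="minimal",
        coords="minimal", compat="no_conflicts", join="outer",
    )
    error_name = None
except xr.MergeError as err:
    error_name = type(err).__name__

# coords="minimal" together with compat="override" skips the comparison and
# keeps lon_bnds from the first dataset.
combined = xr.concat(
    [first, second], dim="time", data_vars="minimal",
    coords="minimal", compat="override", join="outer",
)
```

With `compat="override"`, `lon_bnds` is taken from the first dataset without any comparison, so this is only appropriate when the differences between files are known to be negligible.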

tomvothecoder commented 1 year ago

@lee1043 Are these files available on /p/user_pub? I would like to try these options out while I update the docs in #473

tomvothecoder commented 1 year ago

I am going to omit this option from #473 because we don't handle the extra concatenated time dimension in our APIs:

ds = xcdat.open_mfdataset('/work/ahn6/obs/MERRA/pr/pr_day_reanalysis_MERRA_*.nc', data_vars="all")
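
The extra time dimension mentioned here can be illustrated with synthetic data. The sketch below is an assumption-laden stand-in for the real files: `make_file` and all values are hypothetical, and `xr.concat` is called directly rather than going through `xcdat.open_mfdataset`. It shows that `data_vars="all"` avoids the merge conflict by stacking `lon_bnds` along time, which is consistent with the `lon_bnds (time, lon, bnds)` shape in the `xr.open_mfdataset` output earlier in this thread.

```python
import numpy as np
import xarray as xr


def make_file(times, bnds_shift):
    """Build a tiny synthetic stand-in for one file (hypothetical values)."""
    lon = np.array([0.0, 1.0, 2.0])
    lon_bnds = np.stack([lon - 0.5, lon + 0.5], axis=1) + bnds_shift
    return xr.Dataset(
        data_vars={
            "pr": (("time", "lon"), np.zeros((len(times), lon.size))),
            "lon_bnds": (("lon", "bnds"), lon_bnds),
        },
        coords={"time": times, "lon": lon},
    )


first = make_file([0, 1], bnds_shift=0.0)
second = make_file([2, 3], bnds_shift=1e-3)

# data_vars="all" concatenates every data variable along time, including
# lon_bnds, so no merge (and no conflict check) is needed for it.
stacked = xr.concat(
    [first, second], dim="time", data_vars="all",
    coords="minimal", compat="no_conflicts", join="outer",
)

# lon_bnds now carries a time dimension it did not have in either input.
bnds_sizes = dict(stacked["lon_bnds"].sizes)
```

Each time step keeps the bounds of the file it came from, so downstream code that expects two-dimensional bounds has to handle the added dimension.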

lee1043 commented 1 year ago

@tomvothecoder thanks for following up. The data is not in /p/user_pub at the moment, but I am working on finding a good place for it and obtaining permission to move it there. I will let you know when the data is available in /p/user_pub.

tomvothecoder commented 1 year ago

@lee1043 No problem. I found some other datasets in old scripts to reproduce this issue so you don't need to add these files to /p/user_pub anymore!

lee1043 commented 1 year ago

I just added the data to /p/user_pub/PCMDIobs/obs4MIPs_input/create-ip/MERRA2/pr.