Closed: arfriedman closed this issue 4 months ago
Thanks for reporting this. In trying to see if something obvious was wrong, I ended up having a similar (but different) issue.

```shell
# grab a random file
wget http://esgf-data.ucar.edu/thredds/fileServer/esg_dataroot/CMIP6/CMIP/NCAR/CESM2-FV2/historical/r1i1p1f1/Amon/pr/gn/v20191120/pr_Amon_CESM2-FV2_historical_r1i1p1f1_gn_185001-189912.nc
```

I get an issue with the resample command:

```python
# ESGF file
fn = 'pr_Amon_CESM2-FV2_historical_r1i1p1f1_gn_185001-189912.nc'
# or, locally, /p/css03/esgf_publish/CMIP6/CMIP/NCAR/CESM2-FV2/historical/r1i1p1f1/Amon/pr/gn/v20191120/pr_Amon_CESM2-FV2_historical_r1i1p1f1_gn_185001-189912.nc
ds = xc.open_dataset(fn)
pr_3M = ds.pr.resample(time="3ME").sum()
```

```
ValueError: Invalid frequency string provided
```

but the following resample string works (and results in a different error message):

```python
pr_3M = ds.resample(time="3M").sum()
pr_3M_avg = pr_3M.spatial.average('pr')
```

```
ValueError: More than one grid cell spans prime meridian.
```

@arfriedman Can you provide us with a public link to the dataset so we can take a look at it? It looks like the precip variable is either missing the CF "axis" or "standard_name" attributes or they are incorrectly set.
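The "3M" vs. "3ME" confusion comes from pandas, not xcdat: the month-end alias was renamed from "M" to "ME" in pandas 2.2, so which string `resample` accepts depends on the installed pandas version. A small version-agnostic sketch (the helper name is mine, not an xcdat or xarray API) that picks a valid alias using pandas' offset parser:

```python
import pandas as pd

def month_end_alias(n: int = 3) -> str:
    """Return a month-end frequency alias valid for the installed pandas.

    Hypothetical helper, not part of xcdat/xarray: tries the newer "ME"
    alias (pandas >= 2.2) first, then falls back to the legacy "M".
    """
    for alias in (f"{n}ME", f"{n}M"):
        try:
            pd.tseries.frequencies.to_offset(alias)  # raises ValueError if invalid
            return alias
        except ValueError:
            continue
    raise ValueError("no valid month-end alias found")
```

With pandas 2.1.4 (as in the environment reported below) this should return "3M"; with pandas >= 2.2 it returns "3ME".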
@pochedls I tried your code but it didn't work for me. The code below works for resampling, but I also get the ValueError for spatial averaging. I resample using "3M" and not "3ME".

```python
import xcdat as xc

fn = "/p/css03/esgf_publish/CMIP6/CMIP/NCAR/CESM2-FV2/historical/r1i1p1f1/Amon/pr/gn/v20191120/pr_Amon_CESM2-FV2_historical_r1i1p1f1_gn_185001-189912.nc"

# Resampling works
ds = xc.open_dataset(fn)
ds = ds.resample(time="3M").sum()

# ValueError: More than one grid cell spans prime meridian.
ds_3m_avg = ds.spatial.average("pr")
```
Thanks @tomvothecoder – I pasted in the wrong code excerpt (I also used "3M") – this is fixed now. We're getting the same error.
In looking at this again, I think the longitude bounds get messed up after resample. They have a different shape (the latitude bounds are modified also).

@arfriedman – Do your bounds also change? Is there a way to use resample while retaining the lon_bnds shape? A potential workaround in this case is to copy the original bounds to the dataset after resampling:

```python
ds_resampled['lon_bnds'] = ds_original.lon_bnds
ds_resampled['lat_bnds'] = ds_original.lat_bnds
```
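A slightly more general version of that workaround is sketched below; `restore_bounds` is a hypothetical helper (not an xcdat function) that copies back every non-time bounds variable, checked here on a tiny synthetic dataset standing in for the CMIP6 file:

```python
import numpy as np
import pandas as pd
import xarray as xr

def restore_bounds(resampled: xr.Dataset, original: xr.Dataset) -> xr.Dataset:
    """Copy every non-time *_bnds variable from `original` into `resampled`.

    Works around resample giving spatial bounds variables an unwanted
    time dimension.
    """
    out = resampled.copy()
    for name, var in original.data_vars.items():
        if name.endswith("_bnds") and "time" not in var.dims:
            out[name] = var.copy()
    return out

# Tiny synthetic dataset: 6 monthly steps, 3 longitudes with bounds
time = pd.date_range("2000-01-01", periods=6, freq="MS")
lon = np.array([0.0, 2.5, 5.0])
lon_bnds = np.column_stack([lon - 1.25, lon + 1.25])
ds = xr.Dataset(
    {
        "pr": (("time", "lon"), np.ones((6, 3))),
        "lon_bnds": (("lon", "bnds"), lon_bnds),
    },
    coords={"time": time, "lon": lon},
)

fixed = restore_bounds(ds.resample(time="3MS").sum(), ds)
assert fixed["lon_bnds"].dims == ("lon", "bnds")  # bounds shape preserved
```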
@tomvothecoder / @arfriedman – do you think this is an xarray issue / question?
```python
ds = xc.open_dataset(fn)
print(ds.lon)
```

```
<xarray.DataArray 'lon' (lon: 144)>
array([  0. ,   2.5,   5. ,   7.5,  10. , ..., 350. , 352.5, 355. , 357.5])
Coordinates:
  * lon      (lon) float64 0.0 2.5 5.0 7.5 10.0 ... 350.0 352.5 355.0 357.5
```

```python
print(ds.lon_bnds.values)
```

```
[[ -1.25   1.25]
 [  1.25   3.75]
 ...
 [356.25 358.75]]
```

```python
print(ds.lon_bnds.shape)
```

```
(144, 2)
```

```python
ds2 = ds.resample(time="3M").sum()
print(np.allclose(ds.lon, ds2.lon))
```

```
True
```

```python
print(ds2.lon_bnds.values)
```

```
[[[ -1.25   1.25]
  [  1.25   3.75]
  [  3.75   6.25]
  ...
  [351.25 353.75]
  [353.75 356.25]
  [356.25 358.75]]

 [[ -1.25   1.25]
  [  1.25   3.75]
  [  3.75   6.25]
  ...
  [351.25 353.75]
  [353.75 356.25]
  [356.25 358.75]]

 ...]
```

```python
print(ds2.lon_bnds.shape)
```

```
(201, 144, 2)
```
Thanks @tomvothecoder and @pochedls.
@tomvothecoder here's a link to the model output -- apologies for the large file size: https://unibe365-my.sharepoint.com/:u:/g/personal/andrew_friedman_unibe_ch/Ec0kAh1-AthCnUVrdH5oGvwBf2zXmzLH0kENqwPL5iqnuA?e=y1fqW4
@pochedls I confirm that spatial averaging works after applying your work-around of copying the original dataset lat and lon bounds. To me this does suggest that the issue is from xarray.
@pochedls @arfriedman `xarray.Dataset.resample()` adds the time dimension to variables that did not originally have a time dimension. There is an Xarray issue for this topic that has been open since 2018. shoyer's comment explains this undesired behavior and possible fixes that need to be implemented in the Xarray API.
If anybody is interested, reviving the thread sounds like a good idea since the last comment was from early 2022.
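The behavior is easy to reproduce without any model output. A minimal sketch (synthetic data, with variable names chosen to mirror this thread) showing resample touching a bounds variable that has no time dimension:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Two years of monthly data plus a time-independent bounds variable
time = pd.date_range("2000-01-01", periods=24, freq="MS")
lon = np.arange(0.0, 10.0, 2.5)
ds = xr.Dataset(
    {
        "pr": (("time", "lon"), np.random.rand(24, lon.size)),
        "lon_bnds": (("lon", "bnds"), np.column_stack([lon - 1.25, lon + 1.25])),
    },
    coords={"time": time, "lon": lon},
)

out = ds.resample(time="3MS").sum()
print(ds["lon_bnds"].dims)   # ('lon', 'bnds')
print(out["lon_bnds"].dims)  # a time dimension gets added by resample
```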
Before resampling -- no time dimension on "lat_bnds" and "lon_bnds":

```python
ds = xc.open_dataset(fn)
print(ds.data_vars)
"""
Data variables:
    pr         (time, lat, lon) float32 ...
    time_bnds  (time, nbnd) object ...
    lat_bnds   (lat, nbnd) float64 ...
    lon_bnds   (lon, nbnd) float64 ...
"""
```

After resampling -- a time dimension is added to "lat_bnds" and "lon_bnds", which breaks spatial averaging:

```python
ds_3m = ds.resample(time="3M").sum()
print(ds_3m.data_vars)
"""
Data variables:
    pr        (time, lat, lon) float32 2.919e-06 2.918e-06 ... 4.918e-06
    lat_bnds  (time, lat, nbnd) float64 -90.95 -89.05 -89.05 ... 89.05 90.95
    lon_bnds  (time, lon, nbnd) float64 -1.25 1.25 1.25 ... 356.2 356.2 358.8
"""
```

Use bounds from the original dataset:

```python
ds_3m["lat_bnds"] = ds.lat_bnds.copy()
ds_3m["lon_bnds"] = ds.lon_bnds.copy()
ds_3m_avg = ds_3m.spatial.average("pr")
```
Closing this issue since it is rooted in Xarray
What happened?
I am encountering an error when taking an xcdat spatial average following xarray temporal resampling.
I am working with monthly model precipitation output named pr_mon:
Taking a spatial average of this monthly variable works fine:

```python
In [8]: pr_mon_avg = pr_mon.spatial.average("precip")
```
However, after I temporally resample the data using xarray, xcdat spatial averaging returns an error. Here is an example for a three-month temporal sum:
Do you know what might account for this error?
What did you expect to happen? Are there any possible answers you came across?
No response
Minimal Complete Verifiable Example (MVCE)
No response
Relevant log output
No response
Anything else we need to know?
No response
Environment
xCDAT version 0.6.1
In [13]: xr.show_versions()

```
/home/andrew/miniconda3/envs/Nile-Laki/lib/python3.11/site-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils.
  warnings.warn("Setuptools is replacing distutils.")

INSTALLED VERSIONS
------------------
commit: None
python: 3.11.7 | packaged by conda-forge | (main, Dec 23 2023, 14:43:09) [GCC 12.3.0]
python-bits: 64
OS: Linux
OS-release: 5.15.0-86-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.3
libnetcdf: 4.9.2

xarray: 2023.12.0
pandas: 2.1.4
numpy: 1.26.2
scipy: 1.11.4
netCDF4: 1.6.5
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.6.3
nc_time_axis: None
iris: None
bottleneck: None
dask: 2023.12.1
distributed: 2023.12.1
matplotlib: 3.8.2
cartopy: 0.22.0
seaborn: None
numbagg: None
fsspec: 2023.12.2
```