Closed yantosca closed 5 years ago
Is anchor equal in the two files, or does it need to be concatenated?
They appear to be the same:
import xarray as xr
ds1 = xr.open_dataset('GCHP.SpeciesConc.20160716_1200z.nc4')
ds2 = xr.open_dataset('GCHP.AerosolMass.20160716_1200z.nc4')
print(ds1['anchor'].values - ds2['anchor'].values)
Which gives:
[[[0 0 0 0]
[0 0 0 0]
[0 0 0 0]
[0 0 0 0]]
[[0 0 0 0]
[0 0 0 0]
[0 0 0 0]
[0 0 0 0]]
[[0 0 0 0]
[0 0 0 0]
[0 0 0 0]
[0 0 0 0]]
[[0 0 0 0]
[0 0 0 0]
[0 0 0 0]
[0 0 0 0]]
[[0 0 0 0]
[0 0 0 0]
[0 0 0 0]
[0 0 0 0]]
[[0 0 0 0]
[0 0 0 0]
[0 0 0 0]
[0 0 0 0]]]
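A more compact version of the same check asks NumPy for a single yes/no answer instead of printing the whole difference array. This is only a sketch: the arrays below are stand-ins with assumed values and the same (6, 4, 4) shape as the output above. With the real files you would pass ds1['anchor'].values and ds2['anchor'].values instead.

```python
import numpy as np

# Stand-in arrays (assumed values) with the (6, 4, 4) shape seen above.
anchor1 = np.arange(96).reshape(6, 4, 4)
anchor2 = anchor1.copy()

# True only if the shapes match and every element is equal.
same = np.array_equal(anchor1, anchor2)
print(same)
```

Because this works on plain ndarrays, it avoids xarray's broadcasting machinery altogether.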
I am not 100% sure what this "anchor" variable represents. It was added when NASA updated MAPL to v1.0.0. It seems to be something to do with the cubed-sphere coordinates. But for the purposes of plotting and analyzing the data we don't need it.
If it helps, I also get this error if I just try to subtract the DataArrays from each other directly, instead of subtracting their numpy ndarray values:
import xarray as xr
ds1 = xr.open_dataset('GCHP.SpeciesConc.20160716_1200z.nc4')
ds2 = xr.open_dataset('GCHP.AerosolMass.20160716_1200z.nc4')
dr = ds1['anchor'] - ds2['anchor']
print(dr)
Which gives:
Traceback (most recent call last):
File "./run_1mo_benchmark.py", line 481, in <module>
dr = ds1['anchor'] - ds2['anchor']
File "/net/seasasfs02/srv/export/seasasfs02/share_root/ryantosca/python/geo/miniconda/envs/geo/lib/python3.6/site-packages/xarray/core/dataarray.py", line 2009, in func
if not reflexive
File "/net/seasasfs02/srv/export/seasasfs02/share_root/ryantosca/python/geo/miniconda/envs/geo/lib/python3.6/site-packages/xarray/core/variable.py", line 1767, in func
self_data, other_data, dims = _broadcast_compat_data(self, other)
File "/net/seasasfs02/srv/export/seasasfs02/share_root/ryantosca/python/geo/miniconda/envs/geo/lib/python3.6/site-packages/xarray/core/variable.py", line 2043, in _broadcast_compat_data
new_self, new_other = _broadcast_compat_variables(self, other)
File "/net/seasasfs02/srv/export/seasasfs02/share_root/ryantosca/python/geo/miniconda/envs/geo/lib/python3.6/site-packages/xarray/core/variable.py", line 2018, in _broadcast_compat_variables
dims = tuple(_unified_dims(variables))
File "/net/seasasfs02/srv/export/seasasfs02/share_root/ryantosca/python/geo/miniconda/envs/geo/lib/python3.6/site-packages/xarray/core/variable.py", line 2001, in _unified_dims
'dimensions: %r' % list(var_dims))
ValueError: broadcasting cannot handle duplicate dimensions: ['nf', 'ncontact', 'ncontact']
If two arrays are the same like this, is there a way to manually tell open_mfdataset not to broadcast them but to use the same values?
It looks like anchor has a repeated dimension name. This is not well supported in xarray. See https://github.com/pydata/xarray/issues/1378. If you don't need it, then I think it's best to drop it.
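To make the failure mode concrete, here is a simplified pure-Python illustration of the kind of check that produces the error in the traceback. The function name check_unique_dims is hypothetical, and this is a sketch of the idea rather than xarray's actual implementation.

```python
def check_unique_dims(dims):
    """Raise ValueError if any dimension name appears more than once.

    Simplified illustration only; not xarray's real code.
    """
    if len(set(dims)) < len(dims):
        raise ValueError(
            "broadcasting cannot handle duplicate dimensions: %r" % list(dims)
        )
    return tuple(dims)


# Dimension names of 'anchor' as reported in the traceback.
anchor_dims = ("nf", "ncontact", "ncontact")

try:
    check_unique_dims(anchor_dims)
    message = None
except ValueError as err:
    message = str(err)

print(message)
```

Subtracting the raw .values works because plain NumPy arrays carry no dimension names, so no such check is involved.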
Thanks again. I will implement a workaround to drop it (probably a wrapper function that calls open_mfdataset). Good to know.
You should be able to do that with the drop_variables kwarg to open_mfdataset. If that doesn't work, the preprocess kwarg is the Swiss Army Knife here: it lets you pass in a custom function that will be applied to each dataset before the datasets are combined.
You can use the drop_variables kwarg. This is passed down to open_dataset. For more general manipulation, you can use the preprocess argument.
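A minimal sketch of both suggestions follows. The helper names (drop_anchor, open_gchp_files, open_gchp_files_preprocess) and the demo variable SpeciesConc_O3 are hypothetical. It also assumes an xarray version that provides Dataset.drop_vars; older releases use Dataset.drop instead. The in-memory demo dataset gives anchor two distinct dimensions purely to keep the example simple, so it does not reproduce the repeated ncontact dimension.

```python
import numpy as np
import xarray as xr


def drop_anchor(ds):
    """Return ds without the 'anchor' variable, if it has one."""
    return ds.drop_vars("anchor", errors="ignore")


def open_gchp_files(paths):
    """Open several files as one Dataset, skipping 'anchor' on read."""
    # drop_variables is forwarded to open_dataset for each file.
    return xr.open_mfdataset(paths, drop_variables=["anchor"])


def open_gchp_files_preprocess(paths):
    """Same idea, using a preprocess function for more general cleanup."""
    # drop_anchor is applied to each dataset before they are combined.
    return xr.open_mfdataset(paths, preprocess=drop_anchor)


# Small in-memory demo of the preprocess function (assumed, simplified data).
demo = xr.Dataset(
    {
        "anchor": (("nf", "ncontact"), np.zeros((6, 4), dtype=int)),
        "SpeciesConc_O3": (("nf",), np.ones(6)),
    }
)
cleaned = drop_anchor(demo)
print(list(cleaned.data_vars))
```

With the files from this issue, the call would look like open_gchp_files(['GCHP.SpeciesConc.20160716_1200z.nc4', 'GCHP.AerosolMass.20160716_1200z.nc4']). The drop_variables route is the simpler of the two; preprocess is worth reaching for when each file needs more than a variable removed.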
@jhamman Hahaha.
Thanks, I'll check it out. Wasn't aware of drop_variables.
Closing as a duplicate of #1378
MCVE Code Sample
First download these files:
Then run this code:
Expected Output
This should load data from both files into a single xarray Dataset object and print its contents.
Problem Description
Instead, this error occurs:
It seems to get hung up on trying to merge the "anchor" variable. As a workaround, if I drop the "anchor" variable from both datasets and then use xr.open_mfdataset, then the merge works properly.
Output of xr.show_versions()