xgcm / xrft

Fourier transforms on xarray data structures
https://xrft.readthedocs.io/en/latest/
MIT License
154 stars 45 forks source link

Possible error in handling of coords when using `real_dim` #162

Closed paigem closed 3 years ago

paigem commented 3 years ago

I ran into an error when running xrft functions. The error seems to occur when the xarray DataArray has multiple coordinates that depend on the dimension that is passed to the real_dim flag. Below is (my attempt at) a minimal reproducible example that yields an error included just below.

# Create dummy data array
data = np.arange(-100,100) 

# Add coordinate time and time2, both of which are attached to dimension 'time'
coord = {'time':np.linspace(0,10,len(data)),'time2':('time',np.arange(len(data)))}

# Define the DataArray
da = xr.DataArray(data,dims='time',coords=coord)

# Compute dft
xrft.dft(da,dim='time',real_dim='time')
Value Error ``` --------------------------------------------------------------------------- ValueError Traceback (most recent call last) in ----> 1 xrft.dft(da,dim='time',real_dim='time') /srv/conda/envs/notebook/lib/python3.8/site-packages/xrft/xrft.py in dft(da, spacing_tol, dim, real_dim, shift, detrend, window, true_phase, true_amplitude, chunks_to_segments, prefix, **kwargs) 438 439 newcoords, swap_dims = _new_dims_and_coords(da, dim, k, prefix) --> 440 daft = xr.DataArray( 441 f, dims=da.dims, coords=dict([c for c in da.coords.items() if c[0] not in dim]) 442 ) /srv/conda/envs/notebook/lib/python3.8/site-packages/xarray/core/dataarray.py in __init__(self, data, coords, dims, name, attrs, indexes, fastpath) 408 data = _check_data_shape(data, coords, dims) 409 data = as_compatible_data(data) --> 410 coords, dims = _infer_coords_and_dims(data.shape, coords, dims) 411 variable = Variable(dims, data, attrs, fastpath=True) 412 indexes = dict( /srv/conda/envs/notebook/lib/python3.8/site-packages/xarray/core/dataarray.py in _infer_coords_and_dims(shape, coords, dims) 156 for d, s in zip(v.dims, v.shape): 157 if s != sizes[d]: --> 158 raise ValueError( 159 f"conflicting sizes for dimension {d!r}: " 160 f"length {sizes[d]} on the data but length {s} on " ValueError: conflicting sizes for dimension 'time': length 101 on the data but length 200 on coordinate 'time2' ```

So I think the trouble here is that in taking the FT, the dimension 'time' is converted to 'freq_time', but then xrft and xarray do not know how to handle coordinates that are attached to the 'time' dimension. The above error is easily avoided by removing the coordinate causing trouble (in the above case that would be time2). Perhaps this is the best option? If so, then a more targeted error message might be helpful for users.

I'd be happy to put in a PR for this, but want to get feedback on how to handle this first.

roxyboy commented 3 years ago

I think this is indeed a bug. I didn't have in mind the same dimension having two coordinates corresponding to it when writing the code. A PR would be appreciated :)

roxyboy commented 3 years ago

I'll probably make a new release as soon as this issue is resolved.

paigem commented 3 years ago

Ok sounds good! So just to clarify - what do you think the best course of action is here? Ask the user to delete the extra coordinates, or keep the "problem" coordinate (in my example above, keep 'time' as a dimension in addition to 'freq_time')? Or something else?

roxyboy commented 3 years ago

I doubt that after taking the FFT, any coordinate attached to the dimension upon which it was taken (i.e. time in your example) will be useful to the user but... who knows. So, if you can find a work around to keep the extra metadata, I think that would be the most versatile PR. Otherwise, I think raising an error with a proper message that "the dimension over which the FFT is taken should only have one coordinate" would be sufficient.

paigem commented 3 years ago

Will submit this PR tomorrow - was at a workshop all day and didn't have as much in-between time as I was hoping.

roxyboy commented 3 years ago

Closed via #163