Error raised when array has additional 2D coordinates

santisoler commented 2 years ago

In Harmonica we usually produce 2D grids out of scattered data whose dimensions are northing and easting. In addition, we often have an extra coordinate: the upward, which is the vertical coordinate of each point of the grid. The upward coordinate is included into the list of coordinates inside the array, but as a 2D array with the same dimensions as the grid itself.

When trying to compute the fft over this kind of arrays, xrft is raising and error because it recognizes the upward as a bad coordinate. I leave a minimal working example to reproduce the error:

import numpy as np
import xarray as xr
import xrft

# Build a sample array
# --------------------
# Generate easting and northing
easting = np.linspace(-5e3, 3e3, 91)
northing = np.linspace(4e3, 10e3, 61)

# Generate sample data and the upward coordinate
ee, nn = np.meshgrid(easting, northing)
upward = np.ones_like(ee)
data = ee ** 2 + nn ** 2

# Build the array
dims = ("northing", "easting")
da = xr.DataArray(
    data,
    coords={"easting": easting, "northing": northing, "upward": (dims, upward)},
    dims=dims,
)
print(da)

# Compute FFT
# -----------
xrft.fft(da, true_phase=True, true_amplitude=True)

<xarray.DataArray (northing: 61, easting: 91)>
array([[4.10000000e+07, 4.01190123e+07, 3.92538272e+07, ...,
        2.39649383e+07, 2.44745679e+07, 2.50000000e+07],
       [4.18100000e+07, 4.09290123e+07, 4.00638272e+07, ...,
        2.47749383e+07, 2.52845679e+07, 2.58100000e+07],
       [4.26400000e+07, 4.17590123e+07, 4.08938272e+07, ...,
        2.56049383e+07, 2.61145679e+07, 2.66400000e+07],
       ...,
       [1.21040000e+08, 1.20159012e+08, 1.19293827e+08, ...,
        1.04004938e+08, 1.04514568e+08, 1.05040000e+08],
       [1.23010000e+08, 1.22129012e+08, 1.21263827e+08, ...,
        1.05974938e+08, 1.06484568e+08, 1.07010000e+08],
       [1.25000000e+08, 1.24119012e+08, 1.23253827e+08, ...,
        1.07964938e+08, 1.08474568e+08, 1.09000000e+08]])
Coordinates:
  * easting   (easting) float64 -5e+03 -4.911e+03 -4.822e+03 ... 2.911e+03 3e+03
  * northing  (northing) float64 4e+03 4.1e+03 4.2e+03 ... 9.8e+03 9.9e+03 1e+04
    upward    (northing, easting) float64 1.0 1.0 1.0 1.0 ... 1.0 1.0 1.0 1.0
Traceback (most recent call last):
  File "/home/santi/tmp/test.py", line 27, in <module>
    xrft.fft(da, true_phase=True, true_amplitude=True)
  File "/home/santi/.miniforge3/envs/default/lib/python3.9/site-packages/xrft/xrft.py", line 392, in fft
    raise ValueError(
ValueError: The input array contains coordinate variable(s) (['upward']) whose dims include the transform dimension(s) `northing`. Please drop these coordinates (`.drop(['upward']`) before invoking xrft.

Version of xrft: v0.4.0

I think we could improve the detection of the bad coordinates in a way that it doesn't include any n-dimensional coordinate (where n > 1) so we don't need to drop them before passing them to fft.

What do you think?

santisoler commented 2 years ago

Another idea would be to apply fft over indexes only, instead of looking for coordinates.

roxyboy commented 2 years ago

Another idea would be to apply fft over indexes only, instead of looking for coordinates.

Can you elaborate on this? xrft uses the raw coordinates to inform the coordinates after the fft is taken.

santisoler commented 2 years ago

Can you elaborate on this? xrft uses the raw coordinates to inform the coordinates after the fft is taken.

Sure! Sorry for the brevity, now that I reread it sounds like I was thinking out loud.

One alternative way to select the possible coordinates along which the fft can be performed could be through the indexes. If you see the output of the previous print statement, the indexes are those coordinates that has an * before their name.

For instance, if I have an array with several coordinates:

northing and easting as indexes,
northing_km and easting_km as extra coordinates with northing and easting dims, respectively,
and the same upward coordinate.

import numpy as np
import xarray as xr

# Build a sample array
# --------------------
# Generate easting and northing, and their versions in km
easting = np.linspace(-5e3, 3e3, 91)
northing = np.linspace(4e3, 10e3, 61)
easting_km, northing_km = easting * 1e-3, northing * 1e-3

# Generate sample data and the upward coordinate
ee, nn = np.meshgrid(easting, northing)
upward = np.ones_like(ee)
data = ee ** 2 + nn ** 2

# Build the array
dims = ("northing", "easting")
da = xr.DataArray(
    data,
    coords={
        "easting": easting,
        "northing": northing,
        "northing_km": ("northing", northing_km),
        "easting_km": ("easting", easting_km),
        "upward": (dims, upward),
    },
    dims=dims,
)
print(da)

print("\nIndexes:", tuple(i for i in da.indexes))

<xarray.DataArray (northing: 61, easting: 91)>
array([[4.10000000e+07, 4.01190123e+07, 3.92538272e+07, ...,
        2.39649383e+07, 2.44745679e+07, 2.50000000e+07],
       [4.18100000e+07, 4.09290123e+07, 4.00638272e+07, ...,
        2.47749383e+07, 2.52845679e+07, 2.58100000e+07],
       [4.26400000e+07, 4.17590123e+07, 4.08938272e+07, ...,
        2.56049383e+07, 2.61145679e+07, 2.66400000e+07],
       ...,
       [1.21040000e+08, 1.20159012e+08, 1.19293827e+08, ...,
        1.04004938e+08, 1.04514568e+08, 1.05040000e+08],
       [1.23010000e+08, 1.22129012e+08, 1.21263827e+08, ...,
        1.05974938e+08, 1.06484568e+08, 1.07010000e+08],
       [1.25000000e+08, 1.24119012e+08, 1.23253827e+08, ...,
        1.07964938e+08, 1.08474568e+08, 1.09000000e+08]])
Coordinates:
  * easting      (easting) float64 -5e+03 -4.911e+03 ... 2.911e+03 3e+03
  * northing     (northing) float64 4e+03 4.1e+03 4.2e+03 ... 9.9e+03 1e+04
    northing_km  (northing) float64 4.0 4.1 4.2 4.3 4.4 ... 9.6 9.7 9.8 9.9 10.0
    easting_km   (easting) float64 -5.0 -4.911 -4.822 -4.733 ... 2.822 2.911 3.0
    upward       (northing, easting) float64 1.0 1.0 1.0 1.0 ... 1.0 1.0 1.0 1.0

Indexes: ('easting', 'northing')

If I run xrft.fft over this array, I would get instructed to drop the upward, the northing_km and the easting_km. If we instead look for the coordinates along which to perform the fft through the indexes, we avoid that and return the FT array with just freq_northing and freq_easting, ignoring the other coords.

This is an unpolished idea, so feel free to disagree and/or offer another way to solve this issue.

santisoler commented 2 years ago

If working with indexes is too complex, maybe we could just ignore these extra coordinates and apply the FFT only on the ones that are named after the dimensions of the array.

I think my main issue is with the error against bad coordinates. Is there any special reason for it to exist?

roxyboy commented 2 years ago

PR #163 was where we implemented the error detection as an easy fix to issue #162 .

roxyboy commented 2 years ago

If working with indexes is too complex, maybe we could just ignore these extra coordinates and apply the FFT only on the ones that are named after the dimensions of the array.

I personally like this idea.

santisoler commented 2 years ago

I've just noticed that pad has a similar problem: it raises a cryptic xarray error when trying to pad a grid with the extra upward coordinate. So probably a single private function should take care of the task of ignoring additional coordinates for both cases, and only a simple call to it inside fft and pad should do the trick.

xgcm / xrft

Error raised when array has additional 2D coordinates #166