serazing / xscale

Xscale a library of multi-dimensional signal processing tools using parallel computing
Apache License 2.0
25 stars 12 forks source link

Multi-dimensional linear trend #1

Open serazing opened 7 years ago

serazing commented 7 years ago

I am looking for a multi-dimensional linear fit method based on dask in order to efficiently remove linear trends before computing fft in xscale.spectral.fft.ftt.

kuchaale commented 7 years ago

@serazing Would this be suitable for your case?

serazing commented 7 years ago

@kuchaale Thanks but your solution only works because time is assigned to the first dimension. If you try to apply this function to the x dimension, it fails. One solution could be to transpose the array before computing the linear trend.

Even if one manages to do that, da.linalg.lststq cannot deal with more than two dimensions. It will fail for three dimensional arrays. Again one solution would be to stack the rightmost dimensions in one big dimension, then do the fit and finally unstack the array.

What I want to do is to make a robust and flexible function that fit a linear trend and optionally compute some confidence intervals relative to this trend. I think there is some serious work to do to achieve that since it is not implemented in numpy functions for ndarrays.

kuchaale commented 7 years ago

@serazing I did not know that da.linalg.lststq cannot deal with more than two dimensions. I ussumed that it could since scipy.signal.detrend can deal with multi-dimensinal array and uses scipy.linalg.lstsq function which I ussumed as an analogue to da.linalg.lststq.

In my X-regression project I used statsmodels project to get more sophisticated regression functions, e.g. confidence intervals or p-values. I achieved desirable results using xarray functions such as stack, groupby and apply. Woud this approach be suitable for you?

serazing commented 6 years ago

I recently added something similar using da.vstack and da.linalg.lststq for sinusoidal and polynomial fitting in xscale.signal.fitting. Feel free to check the code.

ShashankBice commented 6 years ago

Hi @serazing , Great library! I am running into trouble with the linear trend function along time axis and the code is giving an error message "'NoneType' object is not subscriptable"

My xrarray(elevation_xarray):- elevation_xarray

<xarray.DataArray 'elevation' (time: 29, lat: 556, lon: 743)> array([[[ nan, nan, ..., nan, nan], [ nan, nan, ..., nan, nan], ..., [ nan, nan, ..., nan, nan], [ nan, nan, ..., nan, nan]],

   [[      nan,       nan, ...,       nan,       nan],
    [      nan,       nan, ...,       nan,       nan],
    ...,
    [4482.1846, 4487.041 , ...,       nan,       nan],
    [      nan, 4479.762 , ...,       nan,       nan]],

   ...,

   [[      nan,       nan, ...,       nan,       nan],
    [      nan,       nan, ...,       nan,       nan],
    ...,
    [      nan,       nan, ...,       nan,       nan],
    [      nan,       nan, ...,       nan,       nan]],

   [[      nan,       nan, ...,       nan,       nan],
    [      nan,       nan, ...,       nan,       nan],
    ...,
    [      nan,       nan, ..., 6229.428 ,       nan],
    [      nan,       nan, ..., 6229.306 ,       nan]]], dtype=float32)

Coordinates:

and the function I call is:- elevation_trend=xfit.linreg(elevation_xarray,dim='time') Corresponding error message:- TypeError Traceback (most recent call last)

in () ----> 1 elevation_trend=xfit.linreg(elevation_xarray,dim='time') ~/glacierhack_2018/xscale/xscale/signal/fitting.py in linreg(array, dim, coord) 108 A Dataset containing the slope and the offset of the linear regression 109 """ --> 110 linfit = polyfit(array, dim=dim, coord=coord, deg=1) 111 offset = linfit.sel(degree=0, drop=True) 112 slope = linfit.sel(degree=1, drop=True) ~/glacierhack_2018/xscale/xscale/signal/fitting.py in polyfit(array, deg, dim, coord) 42 # + stack the other dimensions 43 array_stacked = _order_and_stack(array, dim) ---> 44 dim_chunk = array.chunks[array.get_axis_num(dim)][0] 45 46 if coord is None: TypeError: 'NoneType' object is not subscriptable
serazing commented 6 years ago

Thanks @ShashankBice for this feedback. Hum, I have got the impression thatdrop=True at lines 111 and 112 is the source of the error because you have a DataArray containing nans. I am not really sure why I decided to use the option drop=True, feel free to modify the code and test it if you have some time to do it. Otherwise, I will try to do ti myself when I get more time.

Cheers

ShashankBice commented 6 years ago

Hi @serazing , I played with the module a bit, I do not get this error if I pass a dask array instead of an xarray. This works even if I have nans with data, only thing is that in case of nans, I get an output full of nans while in case of all valid data, I get a somewhat valid result.

I was just wondering if there is an intelligent way to ignore nans by the fitter function, like suppose if I have 15 values along a dimension and one of them has nan, can I use the other 14 to compute linear regression and ignore the one data with nan ?

Thanks!

serazing commented 6 years ago

Hi @ShashankBice, Sorry for the late response. A fitting function that would deal with nan values would be really nice indeed. However, the current function is based on the least square method da.linalg.lstsq (have a look also inside the function xscale.fitting.polyval), which cannot deal with missing values or NaNs (to be checked). To fix the problem with array data containing numpy array instead of task array, we should force the creation of dask array using the method xarray.DataArray.chunk(chunks=None).