psychoinformatics-de / remodnav

Robust Eye Movement Detection for Natural Viewing

SVD did not converge in Linear Least Squares for 60Hz data with a lot of nan #52

Closed dishangti closed 6 months ago

dishangti commented 6 months ago

Hello, I hope this message finds you well. I recently attempted to apply your methodology to our 60Hz dataset. Our dataset, which incorporates data from children, unfortunately contains a significant number of NaN values.

During implementation, I encountered the following error: "LinAlgError: SVD did not converge in Linear Least Squares." Though I noticed that there are provisions for processing NaN values, I still wonder if these NaN values might be the root cause of the convergence issue.

To address this, I ensured that the number of valid data points exceeds a threshold of 5, but that did not resolve the error. Additionally, I have attached the data file for your reference.

Any insights or suggestions you could provide to overcome this challenge would be greatly appreciated. Thank you for your attention to this matter.
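A minimal sketch of why NaNs can surface this error (synthetic data, not the attached file; it follows the traceback through scipy's default `mode='interp'`, where edge samples are fit with `np.polyfit`):

```python
import numpy as np
from scipy.signal import savgol_filter

# Synthetic trace with a NaN gap in the middle, away from the edges.
x = np.sin(np.linspace(0, 4 * np.pi, 120))
x[50:60] = np.nan

# NaNs propagate through the filter's convolution: the output is NaN
# around the gap but finite elsewhere, and no error is raised.
y = savgol_filter(x, window_length=5, polyorder=2)
```

If the NaN run instead overlaps the first or last `window_length` samples (as in data that begins with NaNs), those NaNs reach `np.polyfit` in the edge-fitting step and can raise exactly this "SVD did not converge" error, so trimming or interpolating leading/trailing NaNs is worth trying.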

test.csv


{
    "name": "LinAlgError",
    "message": "SVD did not converge in Linear Least Squares",
    "stack": "---------------------------------------------------------------------------
LinAlgError                               Traceback (most recent call last)
d:\\dst\\Repos\\ASD-EyeTrack-RGB\\stats\\Untitled.ipynb Cell 8 line 2
     <a href='vscode-notebook-cell:/d%3A/dst/Repos/ASD-EyeTrack-RGB/stats/Untitled.ipynb#X10sZmlsZQ%3D%3D?line=24'>25</a> filtered_move = pd.DataFrame(eye_move, columns=['x', 'y', 't'])
     <a href='vscode-notebook-cell:/d%3A/dst/Repos/ASD-EyeTrack-RGB/stats/Untitled.ipynb#X10sZmlsZQ%3D%3D?line=25'>26</a> filtered_move.to_csv('test.csv')
---> <a href='vscode-notebook-cell:/d%3A/dst/Repos/ASD-EyeTrack-RGB/stats/Untitled.ipynb#X10sZmlsZQ%3D%3D?line=26'>27</a> eye_clf.preproc(filtered_move, savgol_length=0.05)

File d:\\ProgramData\\Anaconda3\\envs\\torch\\lib\\site-packages\\remodnav\\clf.py:863, in EyegazeClassifier.preproc(self, data, min_blink_duration, dilate_nan, median_filter_length, savgol_length, savgol_polyord, max_vel)
    859     lgr.info(
    860         'Smooth coordinates with Savitzy-Golay filter (len=%i, ord=%i)',
    861         savgol_length, savgol_polyord)
    862     for i in ('x', 'y'):
--> 863         data[i] = savgol_filter(data[i], savgol_length, savgol_polyord)
    865 # velocity calculation, exclude velocities over `max_vel`
    866 # no entry for first datapoint!
    867 velocities = self._get_velocities(data)

File ~\\AppData\\Roaming\\Python\\Python38\\site-packages\\scipy\\signal\\_savitzky_golay.py:351, in savgol_filter(x, window_length, polyorder, deriv, delta, axis, mode, cval)
    347     # Do not pad. Instead, for the elements within `window_length // 2`
    348     # of the ends of the sequence, use the polynomial that is fitted to
    349     # the last `window_length` elements.
    350     y = convolve1d(x, coeffs, axis=axis, mode=\"constant\")
--> 351     _fit_edges_polyfit(x, window_length, polyorder, deriv, delta, axis, y)
    352 else:
    353     # Any mode other than 'interp' is passed on to ndimage.convolve1d.
    354     y = convolve1d(x, coeffs, axis=axis, mode=mode, cval=cval)

File ~\\AppData\\Roaming\\Python\\Python38\\site-packages\\scipy\\signal\\_savitzky_golay.py:223, in _fit_edges_polyfit(x, window_length, polyorder, deriv, delta, axis, y)
    216 \"\"\"
    217 Use polynomial interpolation of x at the low and high ends of the axis
    218 to fill in the halflen values in y.
    219 
    220 This function just calls _fit_edge twice, once for each end of the axis.
    221 \"\"\"
    222 halflen = window_length // 2
--> 223 _fit_edge(x, 0, window_length, 0, halflen, axis,
    224           polyorder, deriv, delta, y)
    225 n = x.shape[axis]
    226 _fit_edge(x, n - window_length, n, n - halflen, n, axis,
    227           polyorder, deriv, delta, y)

File ~\\AppData\\Roaming\\Python\\Python38\\site-packages\\scipy\\signal\\_savitzky_golay.py:193, in _fit_edge(x, window_start, window_stop, interp_start, interp_stop, axis, polyorder, deriv, delta, y)
    189 xx_edge = xx_edge.reshape(xx_edge.shape[0], -1)
    191 # Fit the edges.  poly_coeffs has shape (polyorder + 1, -1),
    192 # where '-1' is the same as in xx_edge.
--> 193 poly_coeffs = np.polyfit(np.arange(0, window_stop - window_start),
    194                          xx_edge, polyorder)
    196 if deriv > 0:
    197     poly_coeffs = _polyder(poly_coeffs, deriv)

File <__array_function__ internals>:180, in polyfit(*args, **kwargs)

File d:\\ProgramData\\Anaconda3\\envs\\torch\\lib\\site-packages\\numpy\\lib\\polynomial.py:668, in polyfit(x, y, deg, rcond, full, w, cov)
    666 scale = NX.sqrt((lhs*lhs).sum(axis=0))
    667 lhs /= scale
--> 668 c, resids, rank, s = lstsq(lhs, rhs, rcond)
    669 c = (c.T/scale).T  # broadcast scale coefficients
    671 # warn on rank reduction, which indicates an ill conditioned matrix

File <__array_function__ internals>:180, in lstsq(*args, **kwargs)

File d:\\ProgramData\\Anaconda3\\envs\\torch\\lib\\site-packages\\numpy\\linalg\\linalg.py:2300, in lstsq(a, b, rcond)
   2297 if n_rhs == 0:
   2298     # lapack can't handle n_rhs = 0 - so allocate the array one larger in that axis
   2299     b = zeros(b.shape[:-2] + (m, n_rhs + 1), dtype=b.dtype)
-> 2300 x, resids, rank, s = gufunc(a, b, rcond, signature=signature, extobj=extobj)
   2301 if m == 0:
   2302     x[...] = 0

File d:\\ProgramData\\Anaconda3\\envs\\torch\\lib\\site-packages\\numpy\\linalg\\linalg.py:101, in _raise_linalgerror_lstsq(err, flag)
    100 def _raise_linalgerror_lstsq(err, flag):
--> 101     raise LinAlgError(\"SVD did not converge in Linear Least Squares\")

LinAlgError: SVD did not converge in Linear Least Squares"
}
dishangti commented 6 months ago

Here are the parameters we used for remodnav: a 10.5-inch screen (16:10, with a resolution of 2560x1600) at a viewing distance of approximately 50 cm.


```python
filtered_move = pd.read_csv('test.csv')
eye_clf = remodnav.clf.EyegazeClassifier(
    remodnav.clf.deg_per_pixel(0.2262, 0.5, 2560), 60,
    min_saccade_duration=0.1667)
eye_clf.preproc(filtered_move, savgol_length=0.05)
```
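For context, the first argument is a degrees-per-pixel conversion factor; its geometry can be sketched independently of remodnav (the formula below is re-derived from screen geometry as an assumption, not copied from clf.py, so check the project source for the exact definition):

```python
from math import atan2, degrees

def deg_per_pixel(screen_size_m, viewing_distance_m, screen_resolution_px):
    """Approximate degrees of visual angle covered by one pixel
    (half-screen visual angle divided by half the pixel count)."""
    half_angle = degrees(atan2(0.5 * screen_size_m, viewing_distance_m))
    return half_angle / (0.5 * screen_resolution_px)

# 0.2262 m wide screen, 0.5 m viewing distance, 2560 px horizontal resolution
px2deg = deg_per_pixel(0.2262, 0.5, 2560)  # roughly 0.01 deg per pixel
```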
adswa commented 6 months ago

Hey, thanks for the detailed issue and the data! I tried reproducing the error you are seeing, but things seem to work for me. Here is what I did:

In [4]: filtered_move = pd.read_csv('test(1).csv')

In [5]: filtered_move
Out[5]: 
     Unnamed: 0           x           y      t
0             0         NaN         NaN  20912
1             1         NaN         NaN  20929
2             2         NaN         NaN  20945
3             3         NaN         NaN  20962
4             4         NaN         NaN  20979
..          ...         ...         ...    ...
567         567  631.547639  873.589568  30362
568         568  637.170350  853.189800  30379
569         569  640.885495  839.832655  30395
570         570  644.620697  827.325742  30412
571         571  648.052845  817.226048  30429

[572 rows x 4 columns]

In [6]: filtered_move[['x', 'y']]
Out[6]: 
              x           y
0           NaN         NaN
1           NaN         NaN
2           NaN         NaN
3           NaN         NaN
4           NaN         NaN
..          ...         ...
567  631.547639  873.589568
568  637.170350  853.189800
569  640.885495  839.832655
570  644.620697  827.325742
571  648.052845  817.226048

[572 rows x 2 columns]

In [7]: filtered_move[['x', 'y']].to_csv('tabsep_test.tsv', sep='\t', index=False, header=False)
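Before running preproc on data like this, it can also help to quantify how much of the trace is missing; a generic pandas sketch (the tiny frame below is illustrative, mimicking the x/y/t structure of test.csv, not the real data):

```python
import numpy as np
import pandas as pd

# Illustrative frame mimicking the structure of test.csv (not the real data).
df = pd.DataFrame({
    'x': [np.nan, np.nan, 631.5, 637.2],
    'y': [np.nan, np.nan, 873.6, 853.2],
    't': [20912, 20929, 30362, 30379],
})

# Per-column fraction of NaN samples; large values suggest the signal
# may be too sparse for filtering to behave well.
nan_fraction = df[['x', 'y']].isna().mean()
```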

Could you share the software versions of numpy, scipy, and statsmodels you have installed?

Software versions I have:

```
❱ pip freeze
annexremote==1.6.0
asttokens==2.4.1
bleach==6.0.0
boto==2.49.0
certifi==2022.12.7
cffi==1.15.1
chardet==5.1.0
charset-normalizer==3.1.0
contourpy==1.0.7
coverage==7.2.4
cryptography==40.0.2
cycler==0.11.0
datalad==0.18.3
decorator==5.1.1
distro==1.8.0
docutils==0.19
executing==2.0.1
fasteners==0.18
fonttools==4.39.3
humanize==4.6.0
idna==3.4
importlib-metadata==6.6.0
iniconfig==2.0.0
ipython==8.22.1
iso8601==1.1.0
jaraco.classes==3.2.3
jedi==0.19.1
jeepney==0.8.0
keyring==23.13.1
keyrings.alt==4.2.0
kiwisolver==1.4.4
looseversion==1.1.2
markdown-it-py==2.2.0
matplotlib==3.7.1
matplotlib-inline==0.1.6
mdurl==0.1.2
more-itertools==9.1.0
msgpack==1.0.5
numpy==1.24.3
packaging==23.1
pandas==2.0.1
pandoc==2.3
parso==0.8.3
patool==1.12
patsy==0.5.3
pexpect==4.9.0
Pillow==9.5.0
pkginfo==1.9.6
platformdirs==3.5.0
pluggy==1.0.0
plumbum==1.8.1
ply==3.11
prompt-toolkit==3.0.43
ptyprocess==0.7.0
pure-eval==0.2.2
pycparser==2.21
Pygments==2.15.1
pyparsing==3.0.9
pytest==7.3.1
pytest-cov==2.5.1
python-dateutil==2.8.2
python-gitlab==3.14.0
pytz==2023.3
readme-renderer==37.3
remodnav==1.1.2
requests==2.29.0
requests-toolbelt==0.10.1
rfc3986==2.0.0
rich==13.3.5
scipy==1.10.1
SecretStorage==3.3.3
six==1.16.0
stack-data==0.6.3
statsmodels==0.13.5
tqdm==4.65.0
traitlets==5.14.1
twine==4.0.2
tzdata==2023.3
urllib3==1.26.15
wcwidth==0.2.13
webencodings==0.5.1
zipp==3.15.0
```
dishangti commented 6 months ago

Thank you very much for your detailed reply. However, it still did not work even though I followed all the steps. Here are the versions of my packages: numpy==1.23.5, scipy==1.9.1, statsmodels==0.14.0. I'll create a new environment with your software versions and try again.

dishangti commented 6 months ago

Fortunately, after upgrading these three packages I now get the same output as you do. But I also see the same warning; does it matter?

adswa commented 6 months ago

Glad that it works! The warning per se isn't necessarily a bad sign; it's likely an internal attempt to divide by zero or by NaN (which should be handled in the code). But now that you have results, you should closely investigate whether they look plausible given your data and paradigm. You could, e.g., plot the results with the show_gaze function (see https://github.com/psychoinformatics-de/remodnav/issues/28#issuecomment-913737832 for an example). Keep in mind that this algorithm was not validated for data with low sampling rates, and we as authors have no experience with such sampling rates ourselves. :) Good luck!
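To illustrate the point about the warning: numpy flags division by zero or invalid operations with a RuntimeWarning but still returns inf/NaN values that calling code can handle afterwards (generic numpy behaviour, not remodnav internals):

```python
import warnings
import numpy as np

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # 1/0 warns "divide by zero" (-> inf); 0/0 warns "invalid value" (-> nan)
    result = np.array([1.0, 0.0]) / np.array([0.0, 0.0])

# The computation still completes; the special values just need handling
# downstream, which is why the warning alone is not fatal.
```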

I'm closing this issue as resolved, but please feel free to reopen it or open a new one if you disagree. :)