psychoinformatics-de / remodnav

Robust Eye Movement Detection for Natural Viewing

Error classifying a dataset captured at 60hz #28

Closed AbdouMechraoui closed 2 years ago

AbdouMechraoui commented 2 years ago

I am trying to classify a dataset where the input is screen positions (in pixels). The classifier keeps throwing the error below; for some reason data['vel'] is empty in some of the trials I am trying to classify.

I am not sure what the cause is; the paper mentions that the algorithm wasn't tested at a 50 Hz sampling rate. The dataset I am using was captured at 60 Hz, and I had to change savgol_length to 0.16 to obtain a window size similar to the default.

I would really appreciate it if someone could give me some insight into how to get around this problem :D

Traceback (most recent call last):
  File "../classifyData.py", line 101, in <module>classifyData()
  File "classifyData.py", line 59, in classifyData() detectedEvents= classifier(preproce)
  File "..\clf.py", line 345, in __call__
    for e in self._detect_saccades(
  File "..\clf.py", line 462, in _detect_saccades
    event = self._mk_event_record(data, i, "SACC", sacc_start, sacc_end)
  File "..\clf.py", line 328, in _mk_event_record
    self._get_signal_props(data[start:end])))
  File "..\clf.py", line 262, in _get_signal_props
    pv = data['vel'].max()
  File "..\event-classification\lib\site-packages\numpy\core\_methods.py", line 40, in _amax
    return umr_maximum(a, axis, None, out, keepdims, initial, where)
ValueError: zero-size array to reduction operation maximum which has no identity

adswa commented 2 years ago

Hey, that's hard to say from afar and without experience with data at this sampling rate. Could you share a data file where this happens, and the exact command-line call you run?

AbdouMechraoui commented 2 years ago

input_Data.txt

Hi there! Thank you so much for your reply :) The call I used is:

clf = EyegazeClassifier(
    px2deg=px2deg,
    sampling_rate=sr,
    pursuit_velthresh=5.,
    noise_factor=3.0,
    lowpass_cutoff_freq=10.0,
)
p = clf.preproc(data, savgol_length=0.16)
detectedEvents = clf(p)

I managed to work around this by using the default classifier initialization, namely:

clf = EyegazeClassifier(
    px2deg=px2deg,
    sampling_rate=sr,
    pursuit_velthresh=2.,
    noise_factor=5.0,
    lowpass_cutoff_freq=4.0,
)

Now I obtain different detections. Is there a way to ascertain how good the result is? Since different initializations result in different detections, I am not sure how one could compare them in the absence of ground truth.

adswa commented 2 years ago

I'm just trying to reproduce the error to see where it comes from and whether there is something we can do about it - would you mind telling me what the value of px2deg is so that I can run your call? I have tried a bunch of guesses, but I don't run into the error with them. It looks like it's an edge case that your particular sampling rate and px2deg factor happen to hit.

I managed to work around this by using the default classifier initialization, namely:

clf = EyegazeClassifier(
    px2deg=px2deg,
    sampling_rate=sr,
    pursuit_velthresh=2.,
    noise_factor=5.0,
    lowpass_cutoff_freq=4.0,
)

Now I obtain different detections. Is there a way to ascertain how good the result is? Since different initializations result in different detections, I am not sure how one could compare them in the absence of ground truth.

I'm glad you found a working parameterization, but it's a tough question. Have you tried the visualizations that REMODNAV provides? It will plot the gaze positions and velocity traces together with a color coding of its classifications. This way, you can eyeball whether sudden jumps in coordinates are labelled as saccades, stationary sequences as fixations, and so forth. With my random guess of px2deg, the classification looked like this for me (fixations are greenish, saccades blue, and pursuits pinkish):

[attached screenshot: remodnav classification plot]

I see that you are working with the python interface directly, you can import the plotting function with

from remodnav.tests.utils import show_gaze
show_gaze(pp=p, events=detectedEvents, sampling_rate=60, coord_lim=(0, 1280),
            vel_lim=(0.001, 1000)))

(adjust the coordinate limits, sampling rate, and velocity limits as required)

adswa commented 2 years ago

Now I obtain different detections. Is there a way to ascertain how good the result is?

And in addition to eyeballing, you could aggregate summary statistics over the classified events to, e.g., check whether the durations of common events match typical ranges reported in the physiology literature, or whether you can find the main sequence relationship (Bahill, 1975) in your data.

It also makes sense to think about which types of eye movements your paradigm evokes: if it's only static image viewing, there probably shouldn't be many pursuits; if it's a fixation cross that people should foveate constantly, you'd expect mostly fixations; if there is an occasional target, you may find correspondence between experiment events and saccades, etc. A rough sketch of such a check is below.
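
For instance, something along these lines (a sketch that assumes the event records returned by the classifier expose 'label', 'start_time', 'end_time', 'amp', and 'peak_vel' fields; adjust the names to what your version actually returns):

import numpy as np

# summarize event durations per label
durations = {}
for ev in detectedEvents:
    durations.setdefault(ev['label'], []).append(ev['end_time'] - ev['start_time'])
for label, durs in durations.items():
    print('%s: n=%d, median duration=%.3f s' % (label, len(durs), np.median(durs)))

# main sequence: saccade amplitude and peak velocity should be strongly related
amp = np.array([ev['amp'] for ev in detectedEvents if ev['label'] == 'SACC'])
pv = np.array([ev['peak_vel'] for ev in detectedEvents if ev['label'] == 'SACC'])
keep = amp > 0
print('log amplitude / log peak velocity correlation: %.2f'
      % np.corrcoef(np.log(amp[keep]), np.log(pv[keep]))[0, 1])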

AbdouMechraoui commented 2 years ago

I'm just trying to reproduce the error to see where it comes from and whether there is something we can do about it - would you mind telling me what the value of px2deg is so that I can run your call? I have tried a bunch of guesses, but I don't run into the error with them. It looks like it's an edge case that your particular sampling rate and px2deg factor happen to hit.

I just realized the attached file wasn't the correct one; I edited my previous comment, so now you should be able to reproduce the error. Sorry for not providing more details: px2deg=0.025772. I got this value using the function deg_per_pixel with screen_size=55.3, viewing_distance=60, screen_resolution=1920; screen size and viewing distance are in cm.
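
For completeness, this is essentially the computation (a sketch under the usual screen-geometry assumptions; the actual deg_per_pixel helper may differ in detail):

import numpy as np

# degrees of visual angle per pixel for a centered observer:
# half the screen width subtends atan(half_width / distance) degrees,
# spread over half the horizontal resolution
screen_size = 55.3        # horizontal screen size in cm
viewing_distance = 60.0   # in cm
screen_resolution = 1920  # horizontal resolution in px

px2deg = np.degrees(np.arctan2(screen_size / 2.0, viewing_distance)) \
    / (screen_resolution / 2.0)
print(px2deg)  # ~0.0258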

Now I obtain different detections. Is there a way to ascertain how good the result is?

And in addition to eyeballing, you could aggregate summary statistics over the classified events to, e.g., check whether the durations of common events match typical ranges reported in the physiology literature, or whether you can find the main sequence relationship (Bahill, 1975) in your data.

It also makes sense to think about which types of eye movements your paradigm evokes: if it's only static image viewing, there probably shouldn't be many pursuits; if it's a fixation cross that people should foveate constantly, you'd expect mostly fixations; if there is an occasional target, you may find correspondence between experiment events and saccades, etc.

I will definitely look into this, thank you so much once again :))

adswa commented 2 years ago

I just realized the attached file wasn't the correct one; I edited my previous comment, so now you should be able to reproduce the error. Sorry for not providing more details: px2deg=0.025772, [...]

Strangely, I still can't bring the program to crash; it happily classifies the updated data file with your exact parameters, and also if I use the default parametrization.

# using px2deg of 0.025772, sr=60.0 and your updated input data
>>> clf = EyegazeClassifier(px2deg=px2deg, sampling_rate=sr, pursuit_velthresh=5., noise_factor=3.0, lowpass_cutoff_freq=10.0, )
>>> p = clf.preproc(data, savgol_length=0.16)
>>> detectedEvents = clf(p)
[works fine]
# same px2deg and sr as above
>>> clf = EyegazeClassifier(px2deg=px2deg, sampling_rate=sr)
>>> p = clf.preproc(data, savgol_length=0.16)
>>> detectedEvents = clf(p)
[also works fine]

I'm guessing wildly now: given that numpy raises the final error, maybe we're using different versions of that library? I'm using 1.21.0.

AbdouMechraoui commented 2 years ago

I was running numpy==1.21.2; I downgraded to 1.21.0, but unfortunately the issue still stands. Since the issue for me starts with clf(p), I am guessing the data is being preprocessed a bit differently on my machine, which might result in empty chunks that lead to the reported error.

I have attached p (the preproc structured array); would it be possible to feed that to clf and see if the error occurs? preproc_input_data.txt
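
In case it helps, something along these lines should read the dump back into a structured array that clf() accepts (assuming a whitespace-delimited text file with a header row naming the fields; adjust to the actual format):

import numpy as np

# read the attached dump back into a numpy structured array; names=True
# takes the field names (e.g. x, y, vel, med_vel, ...) from the header row
p = np.genfromtxt('preproc_input_data.txt', names=True)
print(p.dtype.names)

detectedEvents = clf(p)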

adswa commented 2 years ago

Great, thanks, I can reproduce it with this. At first glance, it looks like it found a velocity peak at the very end of the sample, and the saccade start and end point it generated from this are the exact same sample, probably because the recording just ends afterwards. I assume we never ran into this because we have only validated the algorithm on 1000 Hz data, where such a "collision" would be much more unlikely than with 60 Hz data. In any case, with the same start and end point for the saccade, remodnav slices the data into an empty array and tries to find the maximum velocity in that array, and that generates the error you are seeing. That's a bit suboptimal. It is certainly senseless to operate on an empty array, so we should catch this. A saccade like this should also be below the minimum saccade duration, so it would be skipped afterwards and not reported. @mih, do you have a quick thought on this - would simply catching an empty array and a log message be sufficient here?
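
Roughly what I have in mind (just a sketch with a hypothetical helper name, not the actual patch):

import logging
import numpy as np

lgr = logging.getLogger('remodnav.clf')

def signal_props_or_none(chunk):
    # hypothetical guard: an event whose start and end fall on the same
    # sample yields an empty slice; skip it with a log message instead of
    # letting numpy raise on .max() of an empty array
    if chunk.size == 0:
        lgr.warning('Skipping zero-length event candidate (start == end)')
        return None
    return dict(peak_vel=chunk['vel'].max(),
                med_vel=float(np.median(chunk['vel'])),
                avg_vel=float(chunk['vel'].mean()))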

mih commented 2 years ago

I think this sounds sensible. Thx for digging into the code!

AbdouMechraoui commented 2 years ago

Strangely, I still can't bring the program to crash; it happily classifies the updated data file with your exact parameters, and also if I use the default parametrization.

May I ask for the argument values you used with preproc()? I am facing another issue during preprocessing: I get RuntimeWarning: Mean of empty slice., which traces back to:

Traceback (most recent call last):
  File "../scratch_6.py", line 421, in <module>
    classifyEyeData()
  File "../scratch_6.py", line 286, in classifyEyeData
    detectedEvents= clf(preproce)
  File "..\clf.py", line 341, in __call__
    self.get_adaptive_saccade_velocity_velthresh(data['med_vel'])
  File "..\clf.py", line 307, in get_adaptive_saccade_velocity_velthresh
    cur_thresh, med, scale = _get_thresh(old_thresh)
  File "..\clf.py", line 298, in _get_thresh
    med = np.median(vel_uthr)

I also wanted to ask if there are any preprocessing parameters (e.g. filter_length or min blink duration) that I should adapt/change since I am using a 60 Hz dataset. I highly appreciate your insight and help :)

adswa commented 2 years ago

Quick question: can you change the parameter min_saccade_duration? It is presently at the default of 0.01 s which, at a sampling rate of 60 Hz, translates to a minimum saccade duration of less than one sample. Could you set it to, let's say, 0.02?

adswa commented 2 years ago

I also wanted to ask if there are any preprocessing parameters (e.g. filter_length or min blink duration) that I should adapt/change since I am using a 60 Hz dataset.

We internally transform any parameter given as a duration in seconds into samples by multiplying it with the sampling rate. So at 60 Hz, the default min_blink_duration of 0.02 s would allow Remodnav to classify periods of one sample or longer as a blink. At 1000 Hz, it would only classify periods of 20 samples or longer as a blink, and discard shorter events as too brief to be a blink. For any of these parameters you should check whether this makes sense to you. And we should add input validation on whether they are high enough to prevent empty slices of data; e.g., a minimum saccade duration that is shorter than one sample isn't able to discard any saccade (which I now see is the actual cause of your initial traceback, and likely also your most recent one).
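
To make the arithmetic concrete (a sketch with a hypothetical helper mirroring that conversion):

def duration_in_samples(duration_sec, sampling_rate):
    # a duration threshold only has an effect if it spans at least one sample
    return duration_sec * sampling_rate

for sr in (60.0, 1000.0):
    n = duration_in_samples(0.01, sr)  # default min_saccade_duration
    print('min_saccade_duration of 0.01 s at %g Hz = %.1f samples' % (sr, n))
# at 60 Hz this is 0.6 samples, so no saccade can ever be discarded as too short;
# at 1000 Hz it is 10 samples, which excludes implausibly brief saccades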

AbdouMechraoui commented 2 years ago

Could you set it to, let's say, 0.02?

I just did that; unfortunately, the issue still stands :/

adswa commented 2 years ago

There must be something that you and I are doing differently here :smile: Just to rule out one of the most common types of error, can you restart your Python session (or delete previous variables, or create new ones with different names) so that no variable in memory could "leak" into your attempts to fix this?

Because I can fix it with a longer minimal saccade duration.

I finally managed to reproduce the error you were seeing. Here, I am executing remodnav from the terminal with the parametrization from your Python snippets (please correct me if something is wrong). The WARNING is a local change I made for improved input validation:

$ remodnav input_Data\(1\).txt out.tsv 0.025772 60 \
 --savgol-length=0.16 --noise-factor=3 \
 --lowpass-cutoff-freq=10 --pursuit-velthresh=5 

WARNING: At the provided sampling rate of 60.0, the timeframe for the parameter 'min-saccade-duration' would be lower than a single sampling rate (0.60 samples). Consider increasing the parameter value to prevent errors.
Traceback (most recent call last):
  File "/home/adina/env/remodnav/bin/remodnav", line 11, in <module>
    load_entry_point('remodnav', 'console_scripts', 'remodnav')()
  File "/home/adina/repos/remodnav/remodnav/__init__.py", line 152, in main
    events = clf(pp, classify_isp=True, sort_events=True)
  File "/home/adina/repos/remodnav/remodnav/clf.py", line 362, in __call__
    for e in self._detect_saccades(
  File "/home/adina/repos/remodnav/remodnav/clf.py", line 479, in _detect_saccades
    event = self._mk_event_record(data, i, "SACC", sacc_start, sacc_end)
  File "/home/adina/repos/remodnav/remodnav/clf.py", line 345, in _mk_event_record
    self._get_signal_props(data[start:end])))
  File "/home/adina/repos/remodnav/remodnav/clf.py", line 276, in _get_signal_props
    pv = data['vel'].max()
  File "/home/adina/env/remodnav/lib/python3.9/site-packages/numpy/core/_methods.py", line 40, in _amax
    return umr_maximum(a, axis, None, out, keepdims, initial, where)
ValueError: zero-size array to reduction operation maximum which has no identity

I can fix it by specifying a --min-saccade-duration of 0.02:

$ remodnav input_Data\(1\).txt out.tsv 0.025772 60 \
  --savgol-length=0.16 --noise-factor=3  \
  --lowpass-cutoff-freq=10 --min-saccade-duration=0.02 \
  --pursuit-velthresh=5
[success!]

AbdouMechraoui commented 2 years ago

Because I can fix it with a longer minimal saccade duration.

Thank you so much for your input :DD I confirm this resolves the initial issue of zero-sized arrays during classification. You were absolutely right!! The second issue/warning was also resolved by setting dilate_nan=0.02 instead of the default dilate_nan=0.01.

We internally transform any parameter given as a duration in seconds into samples by multiplying it with the sampling rate. So at 60 Hz, the default min_blink_duration of 0.02 s would allow Remodnav to classify periods of one sample or longer as a blink. At 1000 Hz, it would only classify periods of 20 samples or longer as a blink, and discard shorter events as too brief to be a blink. For any of these parameters you should check whether this makes sense to you. And we should add input validation on whether they are high enough to prevent empty slices of data; e.g., a minimum saccade duration that is shorter than one sample isn't able to discard any saccade (which I now see is the actual cause of your initial traceback, and likely also your most recent one).

This makes complete sense: basically, parameter * sampling_rate >= 1 must hold, otherwise it results in empty slices.

adswa commented 2 years ago

Glad it worked :-) This issue will be closed automatically upon merging the linked PR.

KevKo1990 commented 2 years ago

Hi there, I am running into similar problems with a 60Hz dataset.

I set the following two parameters to 0.02: --dilate-nan and --min-saccade-duration; however, I receive the same error message:

Traceback (most recent call last):
  File ".../lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File ".../lib/python3.8/site-packages/joblib/_parallel_backends.py", line 595, in __call__
    return self.func(*args, **kwargs)
  File ".../lib/python3.8/site-packages/joblib/parallel.py", line 262, in __call__
    return [func(*args, **kwargs)
  File ".../lib/python3.8/site-packages/joblib/parallel.py", line 262, in <listcomp>
    return [func(*args, **kwargs)
  File "/local/home/kekoch/python_projects/lab_study/eye_tracking_analysis/label_eye_data/create_eye_tracking_event_data.py", line 293, in create_remodnav_event_files
    remodnav_df = apply_remodnav(tmp_read_filepath, tmp_store_filepath,
  File "/local/home/kekoch/python_projects/lab_study/eye_tracking_analysis/label_eye_data/create_eye_tracking_event_data.py", line 95, in apply_remodnav
    rem.main([sys.argv[0]] + variables_remodnav)
  File ".../lib/python3.8/site-packages/remodnav/__init__.py", line 152, in main
    events = clf(pp, classify_isp=True, sort_events=True)
  File ".../lib/python3.8/site-packages/remodnav/clf.py", line 348, in __call__
    for e in self._detect_saccades(
  File ".../lib/python3.8/site-packages/remodnav/clf.py", line 465, in _detect_saccades
    event = self._mk_event_record(data, i, "SACC", sacc_start, sacc_end)
  File ".../lib/python3.8/site-packages/remodnav/clf.py", line 331, in _mk_event_record
    self._get_signal_props(data[start:end])))
  File ".../lib/python3.8/site-packages/remodnav/clf.py", line 263, in _get_signal_props
    pv = data['vel'].max()
  File ".../lib/python3.8/site-packages/numpy/core/_methods.py", line 39, in _amax
    return umr_maximum(a, axis, None, out, keepdims, initial, where)
ValueError: zero-size array to reduction operation maximum which has no identity

Are there any other parameters that you recommend changing for a 60 Hz dataset to prevent this error?

KevKo1990 commented 2 years ago

It would help me as well to understand these two parameters better.

Because when I increase the --dilate-nan parameter to, e.g., 0.1 s, the algorithm can run - however, I do not really understand the effect this change has on the classification performance of the algorithm.

Thank you for the clarification!

Update:

Ok, I found out what --dilate-nan does - it extends the area around a NaN so that values shortly before and after it are ignored. Hence, with a larger value, fewer values around NaNs are classified by remodnav. However, I am still curious why the algorithm runs into the trouble from my prior post.
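
For my own understanding, the effect seems to be roughly this (a conceptual sketch, not remodnav's actual implementation):

import numpy as np

def dilate_nan_mask(x, dilate_nan, sampling_rate):
    # mark samples as unusable if they are NaN or lie within
    # dilate_nan seconds of a NaN sample
    n = int(dilate_nan * sampling_rate)
    bad = np.isnan(x)
    dilated = bad.copy()
    for shift in range(1, n + 1):
        dilated[:-shift] |= bad[shift:]   # samples shortly before a NaN
        dilated[shift:] |= bad[:-shift]   # samples shortly after a NaN
    return dilated

x = np.array([1., 2., np.nan, 4., 5., 6.])
print(dilate_nan_mask(x, dilate_nan=0.02, sampling_rate=60))
# -> [False  True  True  True False False]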

adswa commented 2 years ago

Hey, thanks for commenting. I realize we must have forgotten about the PRs that were created in response to this issue. I'm rerunning the CI on the PRs and will merge them tomorrow if everything looks alright. @mih, if you could make me a project collaborator on PyPI (my user name is adina, registered with my usual email), I could make a release tomorrow and upload it. It would be cool if you could try the new release, @KevKo1990 - it would probably not fix your error, but it should at least give a more informative message.

Are there any other parameters that you recommend changing for a 60 Hz dataset to prevent this error?

Unfortunately, as I don't have any personal experience with 60 Hz data at all, I don't have recommendations to share. But it looks like I added a few better warnings in #29 a few months ago...

mih commented 2 years ago

@adswa I invited you to the pypi project. Thx!

adswa commented 2 years ago

This was auto-closed with the merge of #29, but I have just made a release. pip install -U remodnav should get you the latest version, @KevKo1990.

adswa commented 2 years ago

oh, and of course feel free to reopen this, or open a new issue if the added input validation doesn't help to fix your problem :)