mne-tools / mne-python

MNE: Magnetoencephalography (MEG) and Electroencephalography (EEG) in Python
https://mne.tools
BSD 3-Clause "New" or "Revised" License

Researcher file breaks read_raw_eyelink #12690

Open scott-huberty opened 3 days ago

scott-huberty commented 3 days ago

Description of the problem

A researcher has reported that their file breaks read_raw_eyelink. I'm reporting the details below, so apologies in advance for the lengthy explanation:

Problem

In the specification for their ASCII format, EyeLink states that recording periods (where gaze/pupil data are actually being recorded) should always be demarcated by “START” and “END” lines, as shown below:

START
xpos ypos pupil
xpos ypos pupil
END
...
START
xpos ypos pupil
xpos ypos pupil
END
...
START
xpos ypos pupil
xpos ypos pupil
END

For us, this is important because lines in the ASCII file that occur outside these START…END recording sections are unstructured and, IMO, difficult to parse.

In other words, read_raw_eyelink looks for START events and parses lines until it hits an END event. Per EyeLink's specification, we always assume that any given START event will eventually be followed by an END event (before another START event occurs).
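To make that assumption concrete, here is a minimal sketch (not MNE's actual implementation) of a reader that opens a block at START and treats every following line as data until END. The function name `parse_recording_blocks` and the simplified sample lines are hypothetical.

```python
def parse_recording_blocks(lines):
    """Return the lines found between each START and the following END.

    Simplified sketch: once a START is seen, every line is treated as
    recording data until an END appears.
    """
    blocks = []
    current = None  # None means we are outside a recording block
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue  # ignore blank lines
        if current is None:
            if tokens[0] == "START":
                current = []  # open a new recording block
        elif tokens[0] == "END":
            blocks.append(current)  # close the current block
            current = None
        else:
            current.append(line)  # anything else is assumed to be a sample
    return blocks


well_formed = ["START", "1 2 3", "END", "MSG outside recording", "START", "4 5 6", "END"]
missing_end = [
    "START",
    "1 2 3",
    ">>>>>>> CALIBRATION (HV5,P-CR) FOR LEFT: <<<<<<<<<",
    "START",
    "4 5 6",
    "END",
]

print(parse_recording_blocks(well_formed))  # [['1 2 3'], ['4 5 6']]
print(parse_recording_blocks(missing_end))  # one block that swallowed the calibration header
```

On the well-formed input the sketch returns one list per block. With a missing END, it returns a single block in which the calibration header (and even the second START line) are treated as samples, which is the kind of failure this issue is about.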

This assumption has held up until now. In the problematic file that the researcher shared, it looks like one of the recording blocks in the file is missing an “END” event, resulting in a format like:

START
xpos ypos pupil
xpos ypos pupil
END
...
START
xpos ypos pupil
xpos ypos pupil
...
START
xpos ypos pupil
xpos ypos pupil
END

So, for the block that is missing an END event, read_raw_eyelink tries to parse lines that would normally occur outside recording blocks (specifically, lines containing information about an eye-tracking calibration), which it is not prepared to handle. This breaks the reader.
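Before deciding how the reader should recover, it may help to locate the offending block. Below is a minimal sketch, assuming a START counts as unclosed if another START or the end of the input is reached before an END. The helper `find_unclosed_starts` is hypothetical and not part of MNE.

```python
def find_unclosed_starts(lines):
    """Return 1-based line numbers of START lines that never get an END.

    A START is reported if another START, or the end of the input, is
    reached before an END line.
    """
    unclosed = []
    open_start = None  # line number of the START still waiting for its END
    for lineno, line in enumerate(lines, start=1):
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0] == "START":
            if open_start is not None:
                unclosed.append(open_start)
            open_start = lineno
        elif tokens[0] == "END":
            open_start = None
    if open_start is not None:
        unclosed.append(open_start)  # input ended inside a block
    return unclosed


# The layout from the problematic file, with sample lines simplified.
example = [
    "START", "xpos ypos pupil", "END",
    "...",
    "START", "xpos ypos pupil",
    "...",
    "START", "xpos ypos pupil", "END",
]
print(find_unclosed_starts(example))  # [5]
```

Because it only needs an iterable of strings, the same helper could be run on an open file handle (e.g. `with open(fname) as f: find_unclosed_starts(f)`), since `str.split()` discards trailing newlines.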

I'm not sure how easy it will be to make our reader robust to this case. I might try some other EyeLink ASCII readers out there to see whether they can read the file. In the meantime, I'm opening this ticket so that we have a record of it.

Steps to reproduce

# Get the link to the file from the MNE forum (linked below)

from pathlib import Path
import mne

fname = Path.home() / "path" / "to" / "downloaded" / "file"  # adjust to where the file was saved

raw = mne.io.read_raw_eyelink(fname)

Link to data

https://drive.google.com/drive/folders/15SpQuoXZlmH6ZBLcOEoc4nzEA7ewoHuK

Expected results

A Raw object

Actual results

File "/Users/teichmanna2/anaconda3/envs/occ_beh/lib/python3.9/site-packages/mne/io/eyelink/eyelink.py", line 62, in read_raw_eyelink
    raw_eyelink = RawEyelink(
  File "<decorator-gen-202>", line 12, in __init__
  File "/Users/teichmanna2/anaconda3/envs/occ_beh/lib/python3.9/site-packages/mne/io/eyelink/eyelink.py", line 107, in __init__
    eye_ch_data, info, raw_extras = _parse_eyelink_ascii(
  File "/Users/teichmanna2/anaconda3/envs/occ_beh/lib/python3.9/site-packages/mne/io/eyelink/_utils.py", line 71, in _parse_eyelink_ascii
    raw_extras["dfs"]["samples"] = _adjust_times(
  File "/Users/teichmanna2/anaconda3/envs/occ_beh/lib/python3.9/site-packages/mne/io/eyelink/_utils.py", line 509, in _adjust_times
    return pd.merge_asof(
  File "/Users/teichmanna2/anaconda3/envs/occ_beh/lib/python3.9/site-packages/pandas/core/reshape/merge.py", line 708, in merge_asof
    return op.get_result()
  File "/Users/teichmanna2/anaconda3/envs/occ_beh/lib/python3.9/site-packages/pandas/core/reshape/merge.py", line 1926, in get_result
    join_index, left_indexer, right_indexer = self._get_join_info()
  File "/Users/teichmanna2/anaconda3/envs/occ_beh/lib/python3.9/site-packages/pandas/core/reshape/merge.py", line 1151, in _get_join_info
    (left_indexer, right_indexer) = self._get_join_indexers()
  File "/Users/teichmanna2/anaconda3/envs/occ_beh/lib/python3.9/site-packages/pandas/core/reshape/merge.py", line 2239, in _get_join_indexers
    right_values = self._convert_values_for_libjoin(right_values, "right")
  File "/Users/teichmanna2/anaconda3/envs/occ_beh/lib/python3.9/site-packages/pandas/core/reshape/merge.py", line 2182, in _convert_values_for_libjoin
    raise ValueError(f"{side} keys must be sorted")
ValueError: right keys must be sorted
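The final ValueError comes from pandas: pd.merge_asof requires the `on` key of both frames to be sorted in ascending order. The snippet below reproduces the same message with toy data. It is not MNE code, and we have not verified exactly what _adjust_times passes as the right-hand frame; the assumption is that parsing non-recording lines as part of a block leaves the sample timestamps out of order.

```python
import pandas as pd

# Toy data (not from the researcher's file): the right-hand timestamps are
# out of order, standing in for samples whose times are no longer monotonic.
left = pd.DataFrame({"time": [0, 1, 2, 3]})
right = pd.DataFrame({"time": [0, 2, 1, 3], "value": [10.0, 20.0, 30.0, 40.0]})

error_message = None
try:
    pd.merge_asof(left, right, on="time")
except ValueError as err:
    error_message = str(err)
print(error_message)  # right keys must be sorted

# Sorting the right-hand frame by its key satisfies merge_asof's precondition.
merged = pd.merge_asof(left, right.sort_values("time"), on="time")
print(merged["value"].tolist())  # [10.0, 30.0, 20.0, 40.0]
```

Sorting would silence the error but not fix the underlying parse, since the out-of-order rows would still be there.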

Additional information

https://mne.discourse.group/t/mne-io-read-raw-eyelink-failure-adjust-times-sub-function-does-not-work-cant-merge/9012

larsoner commented 2 days ago

Naively, I would expect our reader to proceed line by line, looking for START and parsing until it hits a START or END line (formerly just END, but we can assume that hitting a START is like hitting an END followed by another START). In other words, it seems like it should be fairly easy to handle this case?
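A minimal sketch of this suggestion, in which a new START acts as an implicit END for any block that is still open. The function name and the simplified sample lines are hypothetical; this is not MNE's implementation.

```python
def parse_blocks_start_closes(lines):
    """Split lines into recording blocks, treating a new START as an implicit END."""
    blocks = []
    current = None  # None means we are outside a recording block
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0] == "START":
            if current is not None:
                blocks.append(current)  # implicit END for the unclosed block
            current = []
        elif tokens[0] == "END":
            if current is not None:
                blocks.append(current)
            current = None
        elif current is not None:
            current.append(line)
    if current is not None:
        blocks.append(current)  # input ended inside a recording block
    return blocks


missing_end = [
    "START",
    "1 2 3",
    ">>>>>>> CALIBRATION (HV5,P-CR) FOR LEFT: <<<<<<<<<",
    "MSG 7446696 !CAL Calibration points:",
    "START",
    "4 5 6",
    "END",
]
for block in parse_blocks_start_closes(missing_end):
    print(block)
```

The block boundary is recovered, but the calibration lines that precede the second START still end up in the first block, because nothing marks where recording actually stopped.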

scott-huberty commented 11 hours ago

I don't think that would quite work because our reader can't currently parse the lines that are written between an END block and the next START block. So if there is no END block, our reader will try to parse these lines and error out (as in the case of the aforementioned researcher).

Usually, the lines written in these non-recording blocks contain system information and/or information about a calibration sequence. EyeLink always stops recording gaze/pupil samples during a calibration sequence, so if a user kicks out of an experiment to recalibrate, an END block should occur, followed by information about the calibration.

Calibration blocks in ASCII files look like this:

>>>>>>> CALIBRATION (HV5,P-CR) FOR LEFT: <<<<<<<<<
MSG 7446696 !CAL Calibration points:  
MSG 7446696 !CAL -46.3, -67.7        -0,    400   
MSG 7446696 !CAL -48.8, -97.0        -0,  -2854   
MSG 7446696 !CAL -44.1, -38.3        -0,   3436   
MSG 7446696 !CAL -111.0, -64.9     -5990,    400   
MSG 7446696 !CAL  14.6, -61.2      5990,    400   
MSG 7446696 !CAL eye check box: (L,R,T,B)
     -124    27  -103   -32
MSG 7446696 !CAL href cal range: (L,R,T,B)
    -8985  8985 -4427  5009
MSG 7446696 !CAL Cal coeff:(X=a+bx+cy+dxx+eyy,Y=f+gx+goaly+ixx+jyy)
     -0  95.801 -7.7092  0.054973  0.022975 
   400.06 -3.604  107.47 -0.12785 -0.13718

In our user's case, an END block is missing right before they initiated a calibration. Knowing that calibrations occur outside recording blocks, one idea is to adjust this if-condition so that it also checks whether the line is the start of a calibration block. Something like:

# End the block at END or at a calibration header; the length check avoids an IndexError
if tokens[0] == "END" or (len(tokens) > 1 and tokens[1] == "CALIBRATION"):
    is_recording_block = False

This should solve the problem for the researcher, at least.
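A self-contained sketch of the proposed condition, assuming a calibration header is any line whose second whitespace-separated token is CALIBRATION, as in the example block in this comment. The helper names are hypothetical and the sample lines are simplified.

```python
def is_calibration_header(tokens):
    """True for lines like '>>>>>>> CALIBRATION (HV5,P-CR) FOR LEFT: <<<<<<<<<'."""
    # The length check avoids an IndexError on single-token lines.
    return len(tokens) > 1 and tokens[1] == "CALIBRATION"


def parse_blocks_end_or_calibration(lines):
    """Split lines into recording blocks that end at END or at a calibration header."""
    blocks = []
    current = None  # None means we are outside a recording block
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue
        if current is None:
            if tokens[0] == "START":
                current = []  # open a new recording block
        elif tokens[0] == "END" or is_calibration_header(tokens):
            blocks.append(current)  # end of recording block
            current = None
        else:
            current.append(line)
    return blocks


missing_end = [
    "START",
    "1 2 3",
    ">>>>>>> CALIBRATION (HV5,P-CR) FOR LEFT: <<<<<<<<<",
    "MSG 7446696 !CAL Calibration points:",
    "START",
    "4 5 6",
    "END",
]
print(parse_blocks_end_or_calibration(missing_end))  # [['1 2 3'], ['4 5 6']]
```

This only helps when the missing END is directly followed by a calibration; a missing END followed by other non-recording lines would still be parsed as data. Closing the block on a new START as well, as suggested earlier in the thread, could be combined with this check.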