mne-tools / mne-lsl

A framework for real-time brain signal streaming with MNE-Python.
https://mne.tools/mne-lsl
BSD 3-Clause "New" or "Revised" License

PlayerLSL bug ("Bad indexing") with preload=False and n_repeat > 1 #329

Closed. toni-neurosc closed this issue 15 hours ago.

toni-neurosc commented 1 week ago

When loading a dataset from a file and using it to start a PlayerLSL instance, there is a bug in array slicing when the dataset is not preloaded (preload=False) and looping is enabled (n_repeat > 1).

The issue arises because the function that handles raw array slicing for memory-mapped datasets (BaseRaw._read_segment) does not support negative index slicing and hits this exception at line 467 of base.py in the mne package:

            if start_file < self._first_samps[fi] or stop_file < start_file:
                raise ValueError("Bad array indexing, could be a bug")

The negative index originates at line 254 of player_lsl.py:

                start -= self._raw.times.size  # variable needed in annotations

I see two possible fixes for this; one of them is a modulo-based wrap of the index, illustrated below.
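
As a rough illustration of why a modulo-based wrap avoids the negative index (this is only the index arithmetic, not the actual player_lsl.py code):

```Python
# Illustrative only, not the mne-lsl implementation: a plain subtraction can
# leave the wrapped pointer negative, while a modulo always lands in
# [0, n_samples), which is what BaseRaw._read_segment expects for
# non-preloaded (memory-mapped) raws.
n_samples = 1001  # size of the reproduction file below

for start in (995, 1001, 1010):
    wrapped_sub = start - n_samples  # -6, 0, 9  -> can be negative
    wrapped_mod = start % n_samples  # 995, 0, 9 -> never negative
    print(start, wrapped_sub, wrapped_mod)
```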

Steps to reproduce:

1. Generate a file with 1 channel, 1001 datapoints:

    import numpy as np
    from mne import Annotations, create_info
    from mne.io import RawArray

    # these first annotations are unused: they are replaced by the ones set below
    annotations = Annotations(
        onset=[1, 2, 3],
        duration=[0.1, 0.2, 0.3],
        description=["event1", "event2", "event3"],
    )

    data = np.zeros((1, 1001))  # 1 channel, 1001 samples
    data[0, 100:200] = 1
    data[0, 500:700] = 2
    info = create_info(["signal"], 1000, "misc")
    raw = RawArray(data, info)
    annotations = Annotations(onset=[0.1, 0.5], duration=[0.1, 0.2], description=["1", "2"])
    raw.set_annotations(annotations)

    raw.save("raw.fif")


2. Start a PlayerLSL with looping:
```Python
from mne_lsl.player import PlayerLSL
import time
from mne.io import read_raw

raw = read_raw("raw.fif", preload=False)

player = PlayerLSL(
    raw,
    chunk_size=10,
    name="tutorial-annots",
    annotations=True,
    n_repeat=10,
).start()

time.sleep(10000)
```

3. Wait until the stream reaches the end of the dataset for the first time: instead of looping, the player errors out with the "Bad array indexing, could be a bug" ValueError.
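
For comparison, and as an assumption based on the issue being specific to preload=False rather than something verified in this thread: loading the same file preloaded keeps the data in memory, so the wrap-around index is handled by NumPy slicing instead of BaseRaw._read_segment and playback loops as expected.

```Python
from mne.io import read_raw

# presumed workaround at the user level: preload the data into memory
raw = read_raw("raw.fif", preload=True)
```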

CC: @timonmerk

mscheltienne commented 1 week ago

I never tried with a non-preloaded dataset, good point! The modulo approach sounds good to me. Would you be up for a PR, also adding a test with a non-preloaded dataset?

mscheltienne commented 1 week ago

And thanks a lot for the debugging investigation, the bug report was super clear!

toni-neurosc commented 1 week ago

Hi @mscheltienne, thanks for your prompt answer. Thankfully the debugging was quite straightforward: I just followed the source of the error through the class hierarchy until I hit the exception.

We found the issue because in PyNeuromodulation we use the mne-tools/mne-bids package to read files with the read_raw_bids function, which calls read_raw with the default parameter preload=False. I eventually realized that I can pass an extra_params dictionary to enable preloading, but I wanted to attempt a fix anyway.
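
For reference, a minimal sketch of that workaround (the BIDS path and entities below are made up; only the extra_params usage is the point):

```Python
from mne_bids import BIDSPath, read_raw_bids

# hypothetical dataset location and entities
bids_path = BIDSPath(root="/data/bids_dataset", subject="01", task="rest", datatype="eeg")
# extra_params is forwarded to the underlying mne.io.read_raw_* reader
raw = read_raw_bids(bids_path, extra_params={"preload": True})
```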

I'm up for opening the PR; it's just a single-character change, so it's easy enough, but for the test I need some help. What's your testing workflow? I found the file mne_lsl/test_mne.py, should I add the test there? And what should it consist of: should I generate a local dataset like in my example and check that it loops properly in a variety of conditions? I think I could manage that.

mscheltienne commented 1 week ago

The tests use pytest and live in the files starting with test_ inside tests folders. Each module has its own tests folder, for instance:

This last one is where the added test should live, as the modification only impacts the player module. Usually you will find one test file test_xxx.py per module file xxx.py, but this is not mandatory. In this case, you can append your test to test_player_lsl.py.

This testing structure is common to many scientific Python packages, including the main mne repository ;)
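
For orientation, the layout looks roughly like this (approximate, not an exhaustive listing of the repository):

```
mne_lsl/
├── conftest.py                  # shared fixtures, e.g. the fname fixture mentioned below
├── player/
│   ├── player_lsl.py
│   └── tests/
│       └── test_player_lsl.py
└── ...                          # other modules follow the same pattern
```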


Next is how to test. There are some difficulties with LSL-based tests because they must pass both locally and on the limited GitHub runner which runs the tests online. Have a look at the existing tests and you will notice that:


Besides those 2 points, this is a regular and standard pytest setup. My approach would be to use the existing fname fixture defined here: https://github.com/mne-tools/mne-lsl/blob/0b2c030d58f5dd18444e908a6d6821b595027a75/mne_lsl/conftest.py#L118-L127
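
For orientation, a rough sketch of what such a fixture could look like (purely illustrative; the real implementation is the one linked above):

```Python
import numpy as np
import pytest
from mne import create_info
from mne.io import RawArray


@pytest.fixture
def fname(tmp_path):
    """Save a short raw file whose first channel holds the sample index."""
    n_samples = 1000
    data = np.zeros((2, n_samples))
    data[0] = np.arange(n_samples)  # first channel = sample index
    info = create_info(["idx", "signal"], sfreq=1000, ch_types="misc")
    raw = RawArray(data, info)
    path = tmp_path / "raw.fif"
    raw.save(path)
    return path
```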

The actual fixture creates and saves to disk a short raw file where the first channel is populated with the sample index. You could do 2 tests:

    # imports shown for completeness; the existing test module presumably already has them
    import time
    import uuid

    from mne.io import read_raw_fif

    from mne_lsl.lsl import resolve_streams
    from mne_lsl.player import PlayerLSL as Player


    def test_xxx(fname, chunk_size, request):
        """Test xxx."""
        name = f"P_{request.node.name}"
        source_id = uuid.uuid4().hex
        streams = resolve_streams(timeout=0.1)
        assert (name, source_id) not in [
            (stream.name, stream.source_id) for stream in streams
        ]
        raw = read_raw_fif(fname, preload=False)
        with Player(raw, chunk_size=chunk_size, name=name, source_id=source_id):
            streams = resolve_streams(timeout=2)
            assert (name, source_id) in [
                (stream.name, stream.source_id) for stream in streams
            ]
            time.sleep(raw.times[-1] + 0.5)  # probably needs something else to get the duration of the 'raw'
            streams = resolve_streams(timeout=2)
            assert (name, source_id) in [
                (stream.name, stream.source_id) for stream in streams
            ]
        streams = resolve_streams(timeout=0.1)
        assert (name, source_id) not in [
            (stream.name, stream.source_id) for stream in streams
        ]

    def test_xxx(fname, chunk_size, request, close_io):
        """Test xxx."""
        # similar approach, but open an inlet and query it instead of `time.sleep`;
        # before exiting the `with Player(...)` statement, don't forget to close the inlet
        with Player(...):
            inlet = ...
            close_io()

Both of those approaches would fail on main, since the test would raise as you observed, while they should pass with your PR. Note also the existence of from mne_lsl.utils._tests import match_stream_and_raw_data; this function might be useful if you go towards the second option ;)
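
If it helps, here is a rough sketch of how that second, inlet-based variant might look; the StreamInlet calls, the test name, and the data orientation expected by match_stream_and_raw_data are assumptions to double-check against the existing tests:

```Python
import uuid

from mne.io import read_raw_fif

from mne_lsl.lsl import StreamInlet, resolve_streams
from mne_lsl.player import PlayerLSL as Player


def test_loop_non_preloaded_inlet(fname, chunk_size, request, close_io):
    """Sketch: pull data through an inlet instead of sleeping."""
    name = f"P_{request.node.name}"
    source_id = uuid.uuid4().hex
    raw = read_raw_fif(fname, preload=False)
    with Player(raw, chunk_size=chunk_size, name=name, source_id=source_id):
        streams = resolve_streams(timeout=2)
        sinfo = next(
            s for s in streams if s.name == name and s.source_id == source_id
        )
        inlet = StreamInlet(sinfo)
        inlet.open_stream(timeout=10)
        # pull a chunk; once the player wraps past the end of the file,
        # the non-preloaded slicing path under test is exercised
        data, _ = inlet.pull_chunk(timeout=2)
        # mne_lsl.utils._tests.match_stream_and_raw_data could compare `data`
        # against `raw` here; check the existing tests for the expected shape
        close_io()  # close the inlet before exiting the Player context
```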

toni-neurosc commented 6 days ago

Thanks a lot for all the details about the testing pipeline @mscheltienne, your write-up was very useful for writing the test. I pushed the changes. I'm not convinced about the need for the inlet though: it does not really give me much beyond timestamps and the raw data, which the test is not currently using but perhaps could. Right now it basically plays the role of a sleep(0.01).