mne-tools / mne-lsl

A framework for real-time brain signal streaming with MNE-Python.
https://mne.tools/mne-lsl
BSD 3-Clause "New" or "Revised" License

PlayerLSL bug ("Bad indexing") with preload=False and n_repeat > 1 #329

Closed. toni-neurosc closed this issue 15 hours ago.

toni-neurosc commented 1 week ago

When loading a dataset from a file and using it to start a PlayerLSL instance, there is a bug in array slicing when the dataset is not preloaded (preload=False) and looping is enabled (n_repeat > 1).

The issue arises because the function that handles raw array slicing for memory-mapped datasets (BaseRaw._read_segment) does not support negative index slicing and hits this exception at line 467 of base.py in the mne package:

            if start_file < self._first_samps[fi] or stop_file < start_file:
                raise ValueError("Bad array indexing, could be a bug")

The negative index originates at line 254 of player_lsl.py:

                start -= self._raw.times.size  # variable needed in annotations

I see two possible fixes for this; one of them is a modulo-based wrap of the index, illustrated below.
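
As a rough illustration of why a modulo-based wrap avoids the negative index (this is only the index arithmetic, not the actual player_lsl.py code):

```Python
# Illustrative only, not the mne-lsl implementation: a plain subtraction can
# leave the wrapped pointer negative, while a modulo always lands in
# [0, n_samples), which is what BaseRaw._read_segment expects for
# non-preloaded (memory-mapped) raws.
n_samples = 1001  # size of the reproduction file below

for start in (995, 1001, 1010):
    wrapped_sub = start - n_samples  # -6, 0, 9  -> can be negative
    wrapped_mod = start % n_samples  # 995, 0, 9 -> never negative
    print(start, wrapped_sub, wrapped_mod)
```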

Steps to reproduce:

1. Generate a file with 1 channel, 1001 datapoints:

    import numpy as np
    from mne import Annotations, create_info
    from mne.io import RawArray

    # these first annotations are unused: they are replaced by the ones set below
    annotations = Annotations(
        onset=[1, 2, 3],
        duration=[0.1, 0.2, 0.3],
        description=["event1", "event2", "event3"],
    )

    data = np.zeros((1, 1001))  # 1 channel, 1001 samples
    data[0, 100:200] = 1
    data[0, 500:700] = 2
    info = create_info(["signal"], 1000, "misc")
    raw = RawArray(data, info)
    annotations = Annotations(onset=[0.1, 0.5], duration=[0.1, 0.2], description=["1", "2"])
    raw.set_annotations(annotations)

    raw.save("raw.fif")


2. Start a PlayerLSL with looping:
```Python
from mne_lsl.player import PlayerLSL
import time
from mne.io import read_raw

raw = read_raw("raw.fif", preload=False)

player = PlayerLSL(
    raw,
    chunk_size=10,
    name="tutorial-annots",
    annotations=True,
    n_repeat=10,
).start()

time.sleep(10000)
```

3. Wait until the stream reaches the end of the dataset for the first time: instead of looping, the player errors out with the "Bad array indexing, could be a bug" ValueError.
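
For comparison, and as an assumption based on the issue being specific to preload=False rather than something verified in this thread: loading the same file preloaded keeps the data in memory, so the wrap-around index is handled by NumPy slicing instead of BaseRaw._read_segment and playback loops as expected.

```Python
from mne.io import read_raw

# presumed workaround at the user level: preload the data into memory
raw = read_raw("raw.fif", preload=True)
```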

CC: @timonmerk

mscheltienne commented 1 week ago

I never tried with a non-preloaded dataset, good point! The modulo approach sounds good to me. Would you be up for a PR, also adding a test with a non-preloaded dataset?

mscheltienne commented 1 week ago

And thanks a lot for the debugging investigation, the bug report was super clear!

toni-neurosc commented 1 week ago

Hi @mscheltienne, thanks for your prompt answer. Thankfully the debugging was quite straightforward: I just followed the source of the error through the class hierarchy until I hit the exception.

We found the issue because in PyNeuromodulation we use the mne-tools/mne-bids package to read files with the read_raw_bids function, which calls read_raw with the default parameter preload=False. I eventually realized that I can pass an extra_params dictionary to enable preloading, but I wanted to attempt a fix anyway.
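
For reference, a minimal sketch of that workaround (the BIDS path and entities below are made up; only the extra_params usage is the point):

```Python
from mne_bids import BIDSPath, read_raw_bids

# hypothetical dataset location and entities
bids_path = BIDSPath(root="/data/bids_dataset", subject="01", task="rest", datatype="eeg")
# extra_params is forwarded to the underlying mne.io.read_raw_* reader
raw = read_raw_bids(bids_path, extra_params={"preload": True})
```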

I'm up for opening the PR; it's just a single-character change, so it's easy enough, but for the test I need some help. What's your testing workflow? I found the file mne_lsl/test_mne.py, should I add the test there? And what should it consist of: should I generate a local dataset like in my example and check that it loops properly in a variety of conditions? I think I could manage that.

mscheltienne commented 1 week ago

The tests use pytest and live in the files starting with test_ inside tests folders. Each module has its own tests folder, for instance:

This last one is where the added test should live, as the modification only impacts the player module. Usually you will find one test file test_xxx.py per module file xxx.py, but this is not mandatory. In this case, you can append your test to test_player_lsl.py.

This testing structure is common to many scientific Python packages, including the main mne repository ;)
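
For orientation, the layout looks roughly like this (approximate, not an exhaustive listing of the repository):

```
mne_lsl/
├── conftest.py                  # shared fixtures, e.g. the fname fixture mentioned below
├── player/
│   ├── player_lsl.py
│   └── tests/
│       └── test_player_lsl.py
└── ...                          # other modules follow the same pattern
```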


Next is how to test. There are some difficulties with LSL-based tests because they must pass both locally and on the limited GitHub runner which runs the tests online. Have a look at the existing tests and you will notice that:


Besides those 2 points, this is a regular and standard pytest setup. My approach would be to use the existing fname fixture defined here: https://github.com/mne-tools/mne-lsl/blob/0b2c030d58f5dd18444e908a6d6821b595027a75/mne_lsl/conftest.py#L118-L127
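
For orientation, a rough sketch of what such a fixture could look like (purely illustrative; the real implementation is the one linked above):

```Python
import numpy as np
import pytest
from mne import create_info
from mne.io import RawArray


@pytest.fixture
def fname(tmp_path):
    """Save a short raw file whose first channel holds the sample index."""
    n_samples = 1000
    data = np.zeros((2, n_samples))
    data[0] = np.arange(n_samples)  # first channel = sample index
    info = create_info(["idx", "signal"], sfreq=1000, ch_types="misc")
    raw = RawArray(data, info)
    path = tmp_path / "raw.fif"
    raw.save(path)
    return path
```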

The actual fixture creates and saves to disk a short raw file where the first channel is populated with the sample index. You could do 2 tests:

    # imports shown for completeness; the existing test module presumably already has them
    import time
    import uuid

    from mne.io import read_raw_fif

    from mne_lsl.lsl import resolve_streams
    from mne_lsl.player import PlayerLSL as Player


    def test_xxx(fname, chunk_size, request):
        """Test xxx."""
        name = f"P_{request.node.name}"
        source_id = uuid.uuid4().hex
        streams = resolve_streams(timeout=0.1)
        assert (name, source_id) not in [
            (stream.name, stream.source_id) for stream in streams
        ]
        raw = read_raw_fif(fname, preload=False)
        with Player(raw, chunk_size=chunk_size, name=name, source_id=source_id):
            streams = resolve_streams(timeout=2)
            assert (name, source_id) in [
                (stream.name, stream.source_id) for stream in streams
            ]
            time.sleep(raw.times[-1] + 0.5)  # probably needs something else to get the duration of the 'raw'
            streams = resolve_streams(timeout=2)
            assert (name, source_id) in [
                (stream.name, stream.source_id) for stream in streams
            ]
        streams = resolve_streams(timeout=0.1)
        assert (name, source_id) not in [
            (stream.name, stream.source_id) for stream in streams
        ]

    def test_xxx(fname, chunk_size, request, close_io):
        """Test xxx."""
        # similar approach, but open an inlet and query it instead of `time.sleep`;
        # before exiting the `with Player(...)` statement, don't forget to close the inlet
        with Player(...):
            inlet = ...
            close_io()

Both of those approaches would fail on main, since the test would raise as you observed, while they should pass with your PR. Note also the existence of from mne_lsl.utils._tests import match_stream_and_raw_data; this function might be useful if you go towards the second option ;)
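
If it helps, here is a rough sketch of how that second, inlet-based variant might look; the StreamInlet calls, the test name, and the data orientation expected by match_stream_and_raw_data are assumptions to double-check against the existing tests:

```Python
import uuid

from mne.io import read_raw_fif

from mne_lsl.lsl import StreamInlet, resolve_streams
from mne_lsl.player import PlayerLSL as Player


def test_loop_non_preloaded_inlet(fname, chunk_size, request, close_io):
    """Sketch: pull data through an inlet instead of sleeping."""
    name = f"P_{request.node.name}"
    source_id = uuid.uuid4().hex
    raw = read_raw_fif(fname, preload=False)
    with Player(raw, chunk_size=chunk_size, name=name, source_id=source_id):
        streams = resolve_streams(timeout=2)
        sinfo = next(
            s for s in streams if s.name == name and s.source_id == source_id
        )
        inlet = StreamInlet(sinfo)
        inlet.open_stream(timeout=10)
        # pull a chunk; once the player wraps past the end of the file,
        # the non-preloaded slicing path under test is exercised
        data, _ = inlet.pull_chunk(timeout=2)
        # mne_lsl.utils._tests.match_stream_and_raw_data could compare `data`
        # against `raw` here; check the existing tests for the expected shape
        close_io()  # close the inlet before exiting the Player context
```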

toni-neurosc commented 6 days ago

Thanks a lot for all the details about the testing pipeline @mscheltienne, your write-up was very useful for writing the test. I pushed the changes. I'm not convinced about the need for the inlet though: it does not really give me much beyond timestamps and the raw data, which the test is not currently using but perhaps could. Right now it basically plays the role of a sleep(0.01).