Fix negative start index when looping around dataset

toni-neurosc commented 1 month ago

Attempted fix for issue #329

It's a single character change, I'm not sure it merits a version change, for now I'm setting up preload=True as a workaround but of course this is limited by available memory.

@mscheltienne suggested I write a test, so I will update the PR with that soon.

mscheltienne commented 1 month ago

Tears failures are unrelated :)

toni-neurosc commented 1 month ago

Hi @mscheltienne, I have added the test, I created the StreamInlet but then I realized... it's not really that useful unless I wanted to compare the datapoints directly, which is something I could do I guess. Perhaps I should delete the inlet? Or do you have an idea how to use it to improve the test?

Currently the test is monitoring the internal repeat counter of the PlayerLSL and also checking the _start_idx internal variable to see when it loops around and check that the internal repeat counter is consistent with the index loop-around. Is a bit redundant but better to be sure I guess.

@pytest.mark.slow
def test_player_n_repeat_mmapped(fname, close_io, chunk_size, request):
    """Test argument 'n_repeat' with non-preloaded raw."""
    raw = read_raw_fif("raw.fif", preload=False)
    name = f"P_{request.node.name}"
    source_id = uuid.uuid4().hex

    with Player(
        raw, chunk_size=100, n_repeat=10, name=name, source_id=source_id
    ) as player:
        streams = resolve_streams(timeout=2)
        assert (name, source_id) in [
            (stream.name, stream.source_id) for stream in streams
        ]

        inlet = _create_inlet(name, source_id)

        start_idx = player._start_idx
        repeats = 1  # PlayerLSL internal repeat counter starts at 1

        # Check up to 2 repeats
        while player._n_repeated <= 2:
            data, ts = inlet.pull_chunk()

            # Increase counter if internal start index loops back to the beginning
            if player._start_idx < start_idx:
                repeats += 1
            start_idx = player._start_idx

            # Make sure the repeat counter is incrementing correctly
            assert player._n_repeated == repeats

        close_io()

I also fixed a bug in my solution to the issue which caused the data chunk that spanned the end and start of the file to be in reverse, cause I was looping around the start when I should have been looping around the stop. Before start was a negative index and used to get the last samples, while stop was being looped to the beginning to get the first samples. Now start remains positive and needs to stay untouched it is to get the samples up to the last one, while stop needs to be modulo'ed to get looped around to the start to get the beginning of the file again.

toni-neurosc commented 1 month ago

Alright, I think those last commits are the last. I have used the inlet channel 0 as you suggested to check that when looping around we always find the last sample followed by the first one (index 0) again, and when the last index matches the last data point of the chunk, I check that the next chunk starts at 0.

I also fixed a bug when the chuck_size is a divisor of the number of timepoints, because start is incremented without checking bounds during "normal" iterations (non-looping). I tried always incrementing with modulo but this does not work because the function relies on the indexes going past the dataset length to detect loop or termination, so this is the minimal fix without changing the whole approach.

Btw, sorry for the force-push, I botched a rebase due to the Ruff formatting hook.

mscheltienne commented 1 month ago

Great, hopefully I'll have a look tomorrow :smile: Thanks a lot for the efforts!

Btw, sorry for the force-push, I botched a rebase due to the Ruff formatting hook.

No worries, it's your branch on your repo anyway, do whatever you need on it :wink:

mscheltienne commented 1 month ago

Thanks @toni-neurosc

mne-tools / mne-lsl

Fix negative start index when looping around dataset #330