mne-tools / mne-lsl

A framework for real-time brain signal streaming with MNE-Python.
https://mne.tools/mne-lsl
BSD 3-Clause "New" or "Revised" License
60 stars 29 forks source link

CI: Fix retry action #306

Closed larsoner closed 3 months ago

larsoner commented 3 months ago

Closes #301 First part of the suggestion from https://github.com/mne-tools/mne-lsl/issues/301#issuecomment-2269570706. Let's see if the default is to have errorexit on.

larsoner commented 3 months ago

Confirmed errexit on at least. Now let's see if we can get it to fail three times...

larsoner commented 3 months ago

Okay it exited with code 0 even though I passed a command that should fail, I think because of a bug in the logic about attempt count

larsoner commented 3 months ago

Okay the Debug step in https://github.com/mne-tools/mne-lsl/pull/306/commits/01971cc059ef2e1b4bef4312d9516ab9782cfbd3 seems to be working

https://github.com/mne-tools/mne-lsl/actions/runs/10253578557/job/28366526502?pr=306

larsoner commented 3 months ago

Appears to have worked on the Pytest step, too.

Command exited with code 1 after 1 attempts.

I guess maybe you also want exit code 1 in retry_error_codes not just 134,139?

larsoner commented 3 months ago

Appears to have worked as this one took two attempts:

https://github.com/mne-tools/mne-lsl/actions/runs/10253825291/job/28367329623?pr=306#step:8:5621 https://github.com/mne-tools/mne-lsl/actions/runs/10253825291/job/28367329623?pr=306#step:8:12024

mscheltienne commented 3 months ago

set -eo pipefail, set +e and set -e.. I completely missed that part, thanks! Actually, I don't want the CIs to retry on exit-code 1, only on segmentation fault and python fatal error which still occur at least once every 20/30 runs probably during the tear-up/clean-ups. If the tests fails, it should be fixable in the tests, and there is no need to wait for 3 retries to get that information ;)

larsoner commented 3 months ago

Okay look at 18c2a49 then it had one or two jobs fail with exit 1

larsoner commented 3 months ago

Like this run specifically https://github.com/mne-tools/mne-lsl/actions/runs/10253602827/job/28366606652

larsoner commented 3 months ago

Maybe those tests could use https://pypi.org/project/pytest-retry/ since they're flaky but don't default segfault?

mscheltienne commented 3 months ago

Like this run specifically https://github.com/mne-tools/mne-lsl/actions/runs/10253602827/job/28366606652

Yes, in this one, an LSL stream from other tests was not properly terminated.. Probably one of the rarest flakiness.

Anyway, the idea was to have this small action to recover from hard crashes; and eventually pytest-retry, flaky or pytest-rerunfailures for flaky test, but it might not even be needed. For instance, it would not have salvage the test run with this additional LSL stream that was not properly terminated.

mscheltienne commented 3 months ago

But this run https://github.com/mne-tools/mne-lsl/actions/runs/10253602827/job/28366606652#step:8:14480 now suggest that when an hard crash occurs, LSL streams are not properly terminated and are still lingering in the background, making the test suit fail on the next iteration when looking for a fix number of streams with resolve_streams().. I'll mark those test/part as xfail and will add a random uuid str at the end of the stream names to guarantee uniqueness.