mne-tools / mne-lsl

A framework for real-time brain signal streaming with MNE-Python.
https://mne.tools/mne-lsl
BSD 3-Clause "New" or "Revised" License
56 stars 26 forks source link

GitHub action to rerun a step upon failure with specific exit-codes is doing nothing #301

Closed mscheltienne closed 1 month ago

mscheltienne commented 1 month ago

To try to mitigate the intermittent CI failures, I wrote this small GitHub composite action which should retry a step when one of the retry_error_codes is hit. In the case of pytest runs, the action is used here:

      - name: Run pytest
        uses: ./.github/actions/retry-step
        with:
          command: pytest mne_lsl --cov=mne_lsl --cov-report=xml --cov-config=pyproject.toml -s
        env:
          MNE_LSL_LOG_LEVEL: DEBUG

Thus, the exit codes on which a retry should be triggered are the default ones, 134 and 139 (Python Fatal Error, segmentation fault). Yet, it's clearly not working. c.f. this CI run. It's actually doing nothing, with the runner being terminated prematurely.

mscheltienne commented 1 month ago

@larsoner When able, could you have a look at the composite action? Maybe you spot why this short bash script is not rerunning the command 😭

larsoner commented 1 month ago

You could for example look at

$ set -o | grep errexit
errexit         off

If it's on for you then the eval commands will exit as soon as any command hits an error (like set -e has been). So it's possible you need to do something like:

          set +e
          (eval $command)
          set -e

To be explicit, either way at the top of your command I would do something like set -eo pipefail so that if any other commands (other than the one you want to retry) in your script fail it does exit.