iiasa / ixmp

The ix modeling platform for integrated and cross-cutting scenario analysis
https://docs.messageix.org/ixmp
Apache License 2.0

Handling flaky tests #489

Closed glatterf42 closed 1 year ago

glatterf42 commented 1 year ago

For roughly the past month, I have collected data on flaky CI tests. The initial idea was to mark them as flaky, but according to pytest's docs on flaky tests, that should never be a long-term solution. Instead, tests should be (randomly) re-ordered, rewritten with more atomic assertions, or split into different groups to find the root cause of the flaky behavior and eliminate it. We will have to see when time permits this. For now, we could mark them as flaky to save us from re-running them manually. Here are the flaky tests of this repository that I have gathered so far:
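
To make concrete what marking a test as flaky amounts to, here is a minimal plain-Python sketch: the test is re-run on failure and counts as passed if any attempt passes. The names `retry_flaky` and `sometimes_fails` are hypothetical and only serve as illustration; in a real pytest suite this would more likely come from a plugin (pytest-rerunfailures is one example) than from a hand-written decorator.

```python
import functools


def retry_flaky(max_runs=3):
    """Re-run a test function until it passes or max_runs is exhausted."""

    def decorator(test_func):
        @functools.wraps(test_func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_runs + 1):
                try:
                    return test_func(*args, **kwargs)
                except AssertionError:
                    if attempt == max_runs:
                        raise  # every attempt failed: report the last failure

        return wrapper

    return decorator


attempts = []


@retry_flaky(max_runs=3)
def sometimes_fails():
    # Hypothetical stand-in for a flaky test: fails on the first two attempts
    attempts.append(1)
    assert len(attempts) >= 3


sometimes_fails()  # passes on the third attempt
```

Note that the retry only hides the first two failures; it does nothing to explain them, which is why this can only be a stopgap.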

Flaky tests

Auxiliary

Some error messages are common and so long that they would make the table even more complex than it already is. They are included here and referenced by name in the table below.

Notebook cell timeout reticulate {#notebook-cell-timeout-reticulate}

nbclient.exceptions.CellTimeoutError: A cell timed out while it was being executed, after 10 seconds.
The message was: Cell execution timed out.
Here is a preview of the cell contents:
-------------------
# Load reticulate, used to access the Python API from R
library(reticulate)

# Import ixmp and message_ix, just as in Python
ixmp <- import("ixmp")
-------------------

Notebook cell timeout import packages {#notebook-cell-timeout-import-packages}

nbclient.exceptions.CellTimeoutError: A cell timed out while it was being executed, after 10 seconds.
The message was: Cell execution timed out.
Here is a preview of the cell contents:
-------------------
# load required packages
import pandas as pd
import ixmp
-------------------

Notebook cell timeout platform {#notebook-cell-timeout-platform}

nbclient.exceptions.CellTimeoutError: A cell timed out while it was being executed, after 10 seconds.
The message was: Cell execution timed out.
Here is a preview of the cell contents:
-------------------
# launch the ix modeling platform using the local default database
mp <- ixmp$Platform()
-------------------

Runtime error DB connection {#runtime-error-db-connection}

RuntimeError: unhandled Java exception: 
Unable to obtain connection from database (jdbc:hsqldb:file:/private/var/folders/24/8k48jl6d249_n_qfxwsl6xvm0000gn/T/pytest-of-runner/pytest-0/test_multi_db_run0/mp2) for user 'ixmp': Database lock acquisition failure: lockFile: org.hsqldb.persist.LockFile@6486fbbd[file =/private/var/folders/24/8k48jl6d249_n_qfxwsl6xvm0000gn/T/pytest-of-runner/pytest-0/test_multi_db_run0/mp2.lck, exists=true, locked=false, valid=false, ] method: checkHeartbeat read: 2023-07-09 05:22:39 heartbeat - read: -862 ms.
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL State  : S1000
Error Code : -451
Message    : Database lock acquisition failure: lockFile: org.hsqldb.persist.LockFile@6486fbbd[file =/private/var/folders/24/8k48jl6d249_n_qfxwsl6xvm0000gn/T/pytest-of-runner/pytest-0/test_multi_db_run0/mp2.lck, exists=true, locked=false, valid=false, ] method: checkHeartbeat read: 2023-07-09 05:22:39 heartbeat - read: -862 ms.

DB connection cannot be closed {#db-connection-cannot-be-closed}

AssertionError: assert 'Database connection could not be closed or was already closed' in ''
 +  where '' = CaptureResult(out='', err='').out

DB connection message wrong {#db-connection-message-wrong}

assert "connected to database 'jdbc:hsqldb:mem://ixmptest' (user: ixmp)..." in ''
 +  where '' = CaptureResult(out='', err='').out

Note that windows-latest-py3.10 had this additional information once:

where '' = CaptureResult(out='', err="2023-07-03 12:58:22,174  INFO at.ac.iiasa.ixmp.Platform:182 - closed the connection to database 'jdbc:hsqldb:mem://ixmptest'\r\n2023-07-03 12:58:22,180  INFO at.ac.iiasa.ixmp.Platform:165 - Welcome to the IX modeling platform!\r\n2023-07-03 12:58:22,180  INFO at.ac.iiasa.ixmp.Platform:166 -  connected to database 'jdbc:hsqldb:mem://ixmptest' (user: ixmp)...\r\n").out

Test names are given relative to ixmp/tests. Grouping the tests by name shows commonalities between them. All notebook cell timeouts originate on macOS, while Ubuntu and Windows only struggle with DB connections. Windows in particular seems to close the DB connection either too early or not at all.

| Test name | Error message | Runners (# of occurrences if > 1) |
| --- | --- | --- |
| test_tutorials.py::test_R_transport_scenario | Notebook cell timeout reticulate | macos-latest-py3.7 (3) |
| test_tutorials.py::test_R_transport | Notebook cell timeout reticulate | macos-latest-py3.7 (5), macos-latest-py3.11 |
| test_tutorials.py::test_py_transport | Notebook cell timeout import packages | macos-latest-py3.7 |
| test_tutorials.py::test_py_transport_scenario | Notebook cell timeout import packages | macos-latest-py3.7 |
| test_tutorials.py::test_R_transport_scenario | Notebook cell timeout platform | macos-latest-py3.11 |
| test_access.py::test_check_single_model_access | ConnectionRefusedError: [Errno 61] Connection refused | macos-latest-py3.7 |
| test_integration.py::test_multi_db_run | Runtime error DB connection | macos-latest-py3.8, macos-latest-py3.10 |
| backend/test_jdbc.py::test_jvm_warn | AssertionError: ResourceWarning("unclosed file <_io.BufferedReader name='/tmp/pytest-of-runner/pytest-0/test_read_excel_big0/output.xlsx'>") assert 1 == 0 where 1 = len(WarningsRecorder(record=True)) | ubuntu-latest-py3.8 |
| backend/test_jdbc.py::test_close | DB connection cannot be closed | windows-latest-py3.7 (3), windows-latest-py3.8, windows-latest-py3.10 (2), windows-latest-py3.11 (2) |
| backend/test_jdbc.py::test_connect_message | DB connection message wrong | windows-latest-py3.7 (3), windows-latest-py3.8, windows-latest-py3.10 (2), windows-latest-py3.11 (2) |

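
The grouping described above can be checked with a short tally. The sketch below transcribes the rows of the table by hand (runners without a count are taken as one occurrence) and sums the occurrences per operating system; the short error labels and the helper `os_of` are made up for this illustration.

```python
from collections import Counter

# (test, error label, {runner: occurrences}) transcribed from the table above
ROWS = [
    ("test_R_transport_scenario", "Notebook cell timeout reticulate",
     {"macos-latest-py3.7": 3}),
    ("test_R_transport", "Notebook cell timeout reticulate",
     {"macos-latest-py3.7": 5, "macos-latest-py3.11": 1}),
    ("test_py_transport", "Notebook cell timeout import packages",
     {"macos-latest-py3.7": 1}),
    ("test_py_transport_scenario", "Notebook cell timeout import packages",
     {"macos-latest-py3.7": 1}),
    ("test_R_transport_scenario", "Notebook cell timeout platform",
     {"macos-latest-py3.11": 1}),
    ("test_check_single_model_access", "ConnectionRefusedError",
     {"macos-latest-py3.7": 1}),
    ("test_multi_db_run", "Runtime error DB connection",
     {"macos-latest-py3.8": 1, "macos-latest-py3.10": 1}),
    ("test_jvm_warn", "ResourceWarning assertion",
     {"ubuntu-latest-py3.8": 1}),
    ("test_close", "DB connection cannot be closed",
     {"windows-latest-py3.7": 3, "windows-latest-py3.8": 1,
      "windows-latest-py3.10": 2, "windows-latest-py3.11": 2}),
    ("test_connect_message", "DB connection message wrong",
     {"windows-latest-py3.7": 3, "windows-latest-py3.8": 1,
      "windows-latest-py3.10": 2, "windows-latest-py3.11": 2}),
]


def os_of(runner):
    """Return the OS part of a runner name such as 'macos-latest-py3.7'."""
    return runner.split("-")[0]


failures_by_os = Counter()
timeouts_by_os = Counter()
for _test, error, runners in ROWS:
    for runner, count in runners.items():
        failures_by_os[os_of(runner)] += count
        if error.startswith("Notebook cell timeout"):
            timeouts_by_os[os_of(runner)] += count

print(dict(failures_by_os))
print(dict(timeouts_by_os))
```

With the rows as transcribed, every notebook cell timeout falls on a macOS runner, and the Windows occurrences all come from the two backend/test_jdbc.py tests.
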
glatterf42 commented 1 year ago

For future reference: the flaky tests in backend/test_jdbc.py seem to be related to pytest's capfd fixture not capturing stdout and stderr reliably on Windows. See also https://github.com/pytest-dev/pytest/issues/10843.
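
As background on why these tests depend on capfd at all: the messages they assert on are logged by the Java side (see the at.ac.iiasa.ixmp.Platform lines quoted above), so they never pass through Python's sys.stdout and can only be seen by capturing at the file-descriptor level, which is what capfd does. The following is a rough stdlib sketch of that mechanism, not pytest's actual implementation; `capture_fd1` and `noisy` are hypothetical names, and `os.write` merely stands in for output from a non-Python runtime.

```python
import os
import sys
import tempfile


def capture_fd1(func):
    """Run func() and return everything written to file descriptor 1."""
    sys.stdout.flush()  # keep earlier buffered text out of the capture
    saved_fd = os.dup(1)  # keep a handle on the real stdout
    with tempfile.TemporaryFile(mode="w+b") as tmp:
        os.dup2(tmp.fileno(), 1)  # point fd 1 at the temporary file
        try:
            func()
            sys.stdout.flush()
        finally:
            os.dup2(saved_fd, 1)  # restore the real stdout
            os.close(saved_fd)
        tmp.seek(0)
        return tmp.read().decode()


def noisy():
    # os.write bypasses sys.stdout, much like output from a non-Python runtime
    os.write(1, b"connected to database\n")


captured = capture_fd1(noisy)
```

Any buffering or timing difference between the writer and the moment the captured file is read would show up as an empty capture, which matches the empty `CaptureResult(out='', err='')` in the failures above; whether that is the actual cause on Windows is an open question in the linked pytest issue.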