iiasa / message_ix

The integrated assessment and energy systems model MESSAGEix
https://docs.messageix.org
Apache License 2.0
119 stars 153 forks source link

Make tests robust & independent #784

Closed khaeru closed 7 months ago

khaeru commented 8 months ago

This PR is to collect changes related to #776. As we notice tests "flaking", we can adjust them on this branch, and then merge once we feel we've addressed all observed flaky tests. The PR will thus remain open for a while until we're convinced we've reached that point.

How to review

PR checklist

codecov[bot] commented 8 months ago

Codecov Report

Attention: Patch coverage is 99.53052% with 1 lines in your changes are missing coverage. Please review.

Project coverage is 95.4%. Comparing base (0fb01b4) to head (edfb57d).

Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #784 +/- ## ======================================= - Coverage 95.5% 95.4% -0.1% ======================================= Files 45 46 +1 Lines 4312 4351 +39 ======================================= + Hits 4118 4153 +35 - Misses 194 198 +4 ``` | [Files](https://app.codecov.io/gh/iiasa/message_ix/pull/784?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=iiasa) | Coverage Δ | | |---|---|---| | [message\_ix/cli.py](https://app.codecov.io/gh/iiasa/message_ix/pull/784?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=iiasa#diff-bWVzc2FnZV9peC9jbGkucHk=) | `92.7% <100.0%> (-0.2%)` | :arrow_down: | | [message\_ix/report/operator.py](https://app.codecov.io/gh/iiasa/message_ix/pull/784?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=iiasa#diff-bWVzc2FnZV9peC9yZXBvcnQvb3BlcmF0b3IucHk=) | `97.1% <ø> (ø)` | | | [message\_ix/testing/\_\_init\_\_.py](https://app.codecov.io/gh/iiasa/message_ix/pull/784?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=iiasa#diff-bWVzc2FnZV9peC90ZXN0aW5nL19faW5pdF9fLnB5) | `99.6% <100.0%> (+<0.1%)` | :arrow_up: | | [message\_ix/tests/test\_cli.py](https://app.codecov.io/gh/iiasa/message_ix/pull/784?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=iiasa#diff-bWVzc2FnZV9peC90ZXN0cy90ZXN0X2NsaS5weQ==) | `100.0% <100.0%> (ø)` | | | [message\_ix/tests/test\_core.py](https://app.codecov.io/gh/iiasa/message_ix/pull/784?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=iiasa#diff-bWVzc2FnZV9peC90ZXN0cy90ZXN0X2NvcmUucHk=) | `100.0% <100.0%> (ø)` | | | [...age\_ix/tests/test\_feature\_bound\_activity\_shares.py](https://app.codecov.io/gh/iiasa/message_ix/pull/784?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=iiasa#diff-bWVzc2FnZV9peC90ZXN0cy90ZXN0X2ZlYXR1cmVfYm91bmRfYWN0aXZpdHlfc2hhcmVzLnB5) | `100.0% <100.0%> (ø)` | | | [message\_ix/tests/test\_feature\_capacity\_factor.py](https://app.codecov.io/gh/iiasa/message_ix/pull/784?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=iiasa#diff-bWVzc2FnZV9peC90ZXN0cy90ZXN0X2ZlYXR1cmVfY2FwYWNpdHlfZmFjdG9yLnB5) | `100.0% <ø> (ø)` | | | [message\_ix/tests/test\_feature\_price\_emission.py](https://app.codecov.io/gh/iiasa/message_ix/pull/784?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=iiasa#diff-bWVzc2FnZV9peC90ZXN0cy90ZXN0X2ZlYXR1cmVfcHJpY2VfZW1pc3Npb24ucHk=) | `100.0% <100.0%> (ø)` | | | [message\_ix/tests/test\_feature\_storage.py](https://app.codecov.io/gh/iiasa/message_ix/pull/784?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=iiasa#diff-bWVzc2FnZV9peC90ZXN0cy90ZXN0X2ZlYXR1cmVfc3RvcmFnZS5weQ==) | `100.0% <ø> (ø)` | | | [message\_ix/tests/test\_feature\_temporal\_level.py](https://app.codecov.io/gh/iiasa/message_ix/pull/784?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=iiasa#diff-bWVzc2FnZV9peC90ZXN0cy90ZXN0X2ZlYXR1cmVfdGVtcG9yYWxfbGV2ZWwucHk=) | `100.0% <ø> (ø)` | | | ... and [8 more](https://app.codecov.io/gh/iiasa/message_ix/pull/784?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=iiasa) | | ... and [2 files with indirect coverage changes](https://app.codecov.io/gh/iiasa/message_ix/pull/784/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=iiasa)
glatterf42 commented 8 months ago

Thanks for opening this PR :) I don't immediately have time to address these, but here are the flaky tests I found since 11.01.2024:

khaeru commented 8 months ago

Great. We can add those to the PR checklist up top!

glatterf42 commented 8 months ago

I kept them down here because I recently had to rework a PR and found editing the description tedious since it included a lot of outdated material, so wanted to avoid that here. But they are added now.

glatterf42 commented 7 months ago

Once we address this, we should do something similar for ixmp. Today, the CI there is flaky enough to require four additional runs.

glatterf42 commented 7 months ago

We can also look at the tests we marked as flaky in https://github.com/iiasa/message_ix/issues/731 and try to sustainably 'un-flake' them here.

glatterf42 commented 7 months ago

The latest failures seem to be about these lines: https://github.com/iiasa/message_ix/blob/be4dda40691ee74c358c0cc35f0e7939dda8a42a/message_ix/tests/test_feature_bound_activity_shares.py#L75-L76 and https://github.com/iiasa/message_ix/blob/be4dda40691ee74c358c0cc35f0e7939dda8a42a/message_ix/tests/test_feature_bound_activity_shares.py#L98-L99

When adapting the tests, I had locally at one point scen.solve(quiet=True, lpmethod=2), which gave me the exact same error that we're seeing now here. However, on my local system the current version works without error.

glatterf42 commented 7 months ago

However, the error marked as "Job 1" in the PR description also still appears here. However, the other flakiness has not occurred again on the most recent runs.

glatterf42 commented 7 months ago

We still get the same errors with the latest commit as described above, but in addition, we have

FAILED message_ix/tests/test_feature_bound_activity_shares.py::test_commodity_share_lo - assert False
 +  where False = <function isclose at 0x000001C6C61DB130>(0.15384615384615385, 0.0)
 +    where <function isclose at 0x000001C6C61DB130> = np.isclose

in windows-latest-py3.10 and a tutorial error I've never seen before.

Context for my choice of 'default' lpmethod value: https://www.gams.com/latest/docs/S_CPLEX.html#CPLEXlpmethod. Though I'm beginning to think it's not actually about this parameter.

khaeru commented 7 months ago

One puzzle I ran into here: tests were failing on macOS runners due to a GAMS error at this line: https://github.com/iiasa/message_ix/blob/0fb01b485f8d054c9fe15eda23eee6c07ba2383a/message_ix/model/MESSAGE/data_load.gms#L6

The use of temporary paths and fixtures resulted in a long path %in%, such that the total length of the string passed to put_utility was more than 255 characters. This resulted in a GAMS compilation error and overall failure of the test.

The work-around was to shorten the test name.

khaeru commented 7 months ago
FAILED message_ix/tests/test_integration.py::test_run_remove_solution - assert False
where False = <function isclose at 0x106197490>(nan, 153.675)
where <function isclose at 0x106197490> = np.isclose

On: macos-latest-py3.10

This one hasn't been seen recently, but I note that we didn't make any particular change to address it. We can keep an eye open for it recurring in the future.