Closed glatterf42 closed 2 months ago
Some notes here:
*_scenario
tutorials seem to rely on the *
tutorials running first. However, this grouping might not solve the issue of these required tutorials failing in the first place.test_export_timeseries_data
is failing. It almost seems like occasionally, test_mp.export_timeseries_data()
doesn't find the data it's supposed to export. Is this test relying on other tests filling the fixture test_mp
with data?FLAKY
markers and see if we can resolve more of them.The fixes in test_integration
seem to work locally, but I'm not sure they are optimal yet. They seem to hardcode how many rows in the scenario
column of the expected dfs need to be changed. I'm not sure we want to do that, but maybe it's okay to expect these particular tests to yield the same results.
On my system and on the windows-py3.12 platform here, there is this test failure in test_base.py
:
FAILED ixmp/tests/backend/test_base.py::TestCachingBackend::test_del_ts - AssertionError: assert 0 == 1
E AssertionError: assert 0 == 1
E + where 1 = len({(133743699349232, 'par', 'd'): i j value unit\n0 seattle new-york 2.5 km\n1 seattle c...a 1.8 km\n3 san-diego new-york 2.5 km\n4 san-diego chicago 1.8 km\n5 san-diego topeka 1.4 km})
E + where {(133743699349232, 'par', 'd'): i j value unit\n0 seattle new-york 2.5 km\n1 seattle c...a 1.8 km\n3 san-diego new-york 2.5 km\n4 san-diego chicago 1.8 km\n5 san-diego topeka 1.4 km} = <ixmp.backend.jdbc.JDBCBackend object at 0x79a3a1141520>._cache
ixmp/tests/backend/test_base.py:161: AssertionError
In other words, these lines don't seem to garbage collect the cached value correctly. This seems related to #463 and #504, and the issue is resolved by running s.__del__()
for all python versions. @khaeru, if there's no specific reason to use del s
, I'd just switch to s.__del__()
.
My system is running Python 3.12 with pandas 2.2.0 and JPype1 1.5.0.
Attention: Patch coverage is 99.15254%
with 1 line
in your changes missing coverage. Please review.
Project coverage is 98.8%. Comparing base (
90cfe46
) to head (12464ed
). Report is 28 commits behind head on main.
Files with missing lines | Patch % | Lines |
---|---|---|
ixmp/tests/test_tutorials.py | 94.4% | 1 Missing :warning: |
@khaeru, if there's no specific reason to use
del s
, I'd just switch tos.__del__()
.
We can answer this by first thinking about the intended behaviour of the subject/tested code:
s
) is used, its reference count should decrease, maybe to zero.Scenario.__del__
(https://docs.python.org/3/reference/datamodel.html#object.__del__ —see the "Note:" in the grey box.)del s
or s.__del__()
manually; rather, whenever they give up all references to the Scenario, the clean-up will happen.The reason del s
appears in the test function is to trigger Python's GC earlier, before the end of the test function, so that the following assertion can be made about the post clean-up state.
The comment points out that in certain cases, the intended behaviour doesn't work (we haven't diagnosed why). So s.__del__()
forces the GC to occur and the test to pass. But because it is forced, the test is no longer checking that it will happen autonomously. We might expect that garbage collection does not work at all for a user with that combination of JPype and Python.
So to extend: if s.__del__()
were called in the test in every case, then we no longer will test that Python's built-in reference-counting/GC can or will trigger Scenario.__del__
—including with the up-to-date Python, JPype, and pandas you mention—and can't claim that this feature is working.
Thanks for the explanation, so I'll revert/drop that commit and try to figure out how del s
can call Scenario.__del__
again.
One other note: I've searched for other flaky
markers and think I resolved another one, but there's one for test_jvm_warn where I don't know what to do. The original flaky error is described in https://github.com/iiasa/ixmp/issues/489 and it just seems that this test sometimes occurs simultaneously as another test reading a big file, which results in a warning. Maybe we can suppress this specific warning, but I'm not sure what else we could do.
I'm not sure I completely agree with your analysis. The flaky function is called test_del_ts
and its docstring is """Test CachingBackend.del_ts()."""
. However, we don't explicitly call CachingBackend.del_ts()
in the test at the moment.
del_ts
is a guaranteed part of Scenario.__del__
, but as the Note box in the docs you linked above points out:
del x
doesn’t directly callx.__del__()
— the former decrements the reference count forx
by one, and the latter is only called whenx
’s reference count reaches zero.
In particular, the docs also say:
CPython implementation detail: It is possible for a reference cycle to prevent the reference count of an object from going to zero. In this case, the cycle will be later detected and deleted by the cyclic garbage collector. A common cause of reference cycles is when an exception has been caught in a local variable. The frame’s locals then reference the exception, which references its own traceback, which references the locals of all frames caught in the traceback.
Due to the precarious circumstances under which del() methods are invoked, exceptions that occur during their execution are ignored, and a warning is printed to sys.stderr instead.
So the current flaky behaviour we see only reflects exactly that: del s
doesn't always call Scenario.__del__
. With that, I'm wondering: should it? And do we want to test it here?
Especially to the latter, the answer seems to be "No" to me. With this test, according to its name and docstring, we only want to ensure that CachingBackend.del_ts
works as intended, which it does (locally at least, the flakiness is resolved by always calling backend.del_ts(s)
explicitly).
I realize that this theoretically leaves part of our code untested, the part where we call Scenario.__del__
. In practice, however, I expect this finalizer to be called multiple times, e.g. as part of this very function (just for a limited number of python versions).
In addition, the note that "all exceptions during the execution of del() methods are ignored" makes me wonder if we even need the try-clause in our Scenario.__del__
. Apparently, these exceptions would only result in warnings anyway. Or was it a conscious choice to suppress these warnings?
I'm not sure why build
couldn't be installed for macos-py3.8, but I don't think that's on us. The error in windows-py3.12 is indeed still flaky and appears to be almost exactly the same as the one above: the test calls del s
for a number of Scenario
s, but maybe calling del_ts
explicitly would be more accurate.
The tests seem to be doing okay this time around, but I'm not sure we want to keep the solution. In essence, the problem was that the JDBCBackend
has a .jindex
attribute, which is a WeakKeyDictionary
. This means that keys will be discarded automatically once no strong reference to the keys exists at runtime. But when trying to simply call backend.del_ts()
in the test_jdbc::test_del_ts
, this didn't always work. My guess is that just as before, a strong reference to some key could be stored as part of a reference cycle. That way, del s
will not lead to s.__del__()
being called (which would call backend.del_ts()
), but calling backend.del_ts()
is still not enough since we rely on all strong references vanishing to adapt backend.jindex
and we check that len(backend.jindex)
changes in our tests.
So I've now adapted backend.del_ts
to remove the ts
from jindex
after garbage collection to make sure it vanishes, even if a strong reference still exists in some reference cycle. However, this would of course remove ts
from jindex
irrespective of the reason a strong reference still exists.
However, this would of course remove
ts
fromjindex
irrespective of the reason a strong reference still exists.
Hm—in theory, backend.delts is not going to get called unless (a) there are no strong references anywhere, so TimeSeries.\_del__ is triggered, or (b) the user calls it explicitly.
(But since .jindex is WeakKeyDictionary, the presence (or absence) of ts
as a key in that should not be making any difference, right? Or maybe I have not fully thought through the implications of keys vs. values vs. both being weak in such collections. If the key has some dangling strong ref, then that prevents its removal from the dict and thus the value (the JPype/Java/ixmp_source object) also continues to sit there.)
In any case I think this is a safe change to make. Unless the user does (b), this is not going to throw away objects they are still using. I would put the added line in the reverse order, though:
self.jindex.pop(ts, None)
self.gc()
Working on #524, I noticed that the Jupyter notebook tests of macos-latest-py3.12 failed several times before eventually passing. They also failed on the first run on main
after merging the PR.
This PR was superseded (and the work started here completed) in #543.
Closes #512.
Analogous to https://github.com/iiasa/message_ix/pull/784, this PR aims at making the ixmp tests less flaky. For now, this is largely done by giving
Scenario
s andPlatform
s test-specific names.This PR reverts the mitigation from #510. If all goes surprisingly well, this won't be a problem. Either way, it should not be a problem in the end if the intended changes are complete.
How to review
PR checklist