glatterf42 closed this 8 months ago
@meksor I haven't quite figured out how best to address even the first pandas warnings that I started to look at. In particular, please check out `data/db/meta/repository.py`: the original solution is now triggering this:
```
tests/core/test_meta.py::test_run_meta[test_sqlite_mp]
  /home/fridolin/ixmp4/ixmp4/data/db/meta/repository.py:176: DeprecationWarning: DataFrameGroupBy.apply operated on the grouping columns. This behavior is deprecated, and in a future version of pandas the grouping columns will be excluded from the operation. Either pass `include_groups=False` to exclude the groupings or explicitly select the grouping columns after groupby to silence this warning.
    return df.groupby("type", group_keys=False).apply(map_value_column)
```
Until now, I haven't been able to replicate the exact behaviour of this without triggering warnings. The suggested solutions don't work for me: `include_groups=False` leads to an error because `"type"` is not available to be changed in `map_value_column()`, and the only way I found to select a group (`get_group()`) can only ever select one group, while we want to work on all groups. Maybe looping over the groups is still the preferable solution here, since the others aren't quite working either.
The other solutions all come from the same question on StackOverflow, specifically these answers:
None of these are quite there yet; I feel the one using numba might even be closest, but `numba.jit` doesn't seem to support types generic enough to hold our `value` column, I'm afraid.
Please also note that my solution for the first FutureWarning was to just not use `pd.DataFrame.fillna()` at all, since the test dataframe already contains only `np.nan` and `None`, but this might not generally be true for future use cases.
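A minimal sketch of why skipping `fillna()` works here (the series content is illustrative, not the actual test data): both markers already count as missing, and if they ever need to be unified, an explicit rebuild sidesteps the downcasting FutureWarning that `fillna()` can emit on object-dtype columns.

```python
import numpy as np
import pandas as pd

# Both np.nan and None register as missing, so a check that only asks
# "is this value set?" needs no fillna() call at all:
s = pd.Series([1.0, np.nan, None], dtype=object)
missing = s.isna()

# If the two markers ever need to be unified for a future use case, an
# explicit rebuild avoids silent downcasting entirely:
unified = pd.Series([np.nan if pd.isna(v) else v for v in s], dtype=object)
```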
We also already receive warnings for the next big change announced for pandas 3.0.0: Copy-on-Write will be the default and only mode of operation. Fortunately, there already exists a migration guide, and we can invest some time to future-proof our code now.
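Opting in early is one way to do that: the `mode.copy_on_write` option has existed since pandas 2.0, and enabling it now surfaces chained-assignment problems before the 3.0 switch (the dataframe below is illustrative only):

```python
import pandas as pd

try:
    pd.set_option("mode.copy_on_write", True)
except Exception:
    pass  # pandas >= 3.0: CoW is always on and the option may be gone

df = pd.DataFrame({"a": [1, 2]})
column = df["a"]      # under CoW this behaves like an independent copy
df.loc[0, "a"] = 99   # writing to df no longer mutates `column`
```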
plf did and keep an eye on the project in case someone picks it back up. `apply` has always been very slow. But if you find a good solution, it's probably best to use it everywhere `apply` is used. `nan`s with `None`s, but it may be best to check with @danielhuppmann? It's probably also wise to bake those requirements into a test once clarified.

Thanks for the feedback @meksor, I'll try to incorporate it further soon.
@danielhuppmann, @phackstock: I want to prioritize this PR to support Python 3.12, so we should aim for a release soon after this PR. Hopefully, pyam-iamc can then follow suit and support 3.12, which would allow message-ix-models and message_data to support it. Currently, new users are running into issues because they see that message_ix and ixmp do support 3.12, but don't see that the rest of the stack doesn't necessarily. Unfortunately, though, I have a small presentation to set up before I can focus on this.
Thanks @glatterf42, sounds good to me to support python 3.12, I'll make sure that pyam will follow suit after the ixmp4 release. It would be critical to have the pyam-ixmp4-release before end of February 22 so that users can directly query the ixmp4 ssp-extensions database via our Python tools (and the public launch of that project will be end of February).
Wrt `lazy_fixtures`, I'm actively following that discussion. Pytest might incorporate `lazy_fixtures` into its core implementation, but that might also take some time.
I've reverted the pandas update and mypy is happy after the rebase, too.
The PR now also checks python 3.12 and it's going surprisingly well. Locally, I ran into another error with `test_benchmarks_filter` (in the api layer; I'll try to replicate it tomorrow or so), but it doesn't happen here, so I don't know if we're good. I'll also check if `dateutil` and `multimethod` can be updated, but apart from these two warnings (and potentially some documentation), we seem to be almost good to go for 3.12 :)
Re: `dateutil`: This is not a direct dependency of ours, and we don't call the deprecated function anywhere in our code. Dependencies like pandas and jupyter do depend on `dateutil`, though, whose last release was 2.5 years ago. Their current master branch already contains a fix for the DeprecationWarning we are seeing, but they haven't managed to publish a new release yet, despite popular demand:
So for now, if we can live with the DeprecationWarning, I'd suggest keeping track of it with a new issue, but merging this PR anyway (if the other things are dealt with).
Re: `multimethod`: same here, this is a transitive dependency of ours. Pygments and pandera are using it, and the issue arises from an incompatibility between pandera 0.18.0 and multimethod 1.11.0. This issue has already been raised with pandera and a version pin introduced on their main branch, but that was just ten hours ago, so they haven't had time for a new release yet. We might get a notification about a new release by watching this PR:
In this case, it seems more likely that a patch release is coming soon, so we might not need to keep track with a new issue.
Finally, this is the error I receive 12 times when running `test_filter_benchmarks` locally:
```
________ ERROR at teardown of test_filter_datapoints_benchmark[filters0-test_pgsql_mp_generated] ________

request = <SubRequest 'profiled' for <Function test_filter_datapoints_benchmark[filters0-test_pgsql_mp_generated]>>

    @pytest.fixture(scope="function")
    def profiled(request):
        testname = request.node.name
        pr = cProfile.Profile()

        @contextlib.contextmanager
        def profiled():
            pr.enable()
            yield
            pr.disable()

        yield profiled
>       ps = pstats.Stats(pr)

tests/conftest.py:95:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/usr/lib/python3.12/pstats.py:115: in __init__
    self.init(arg)
/usr/lib/python3.12/pstats.py:129: in init
    self.load_stats(arg)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <pstats.Stats object at 0x7f81bd7662d0>, arg = <cProfile.Profile object at 0x7f81bdcd7b60>

    def load_stats(self, arg):
        if arg is None:
            self.stats = {}
            return
        elif isinstance(arg, str):
            with open(arg, 'rb') as f:
                self.stats = marshal.load(f)
            try:
                file_stats = os.stat(arg)
                arg = time.ctime(file_stats.st_mtime) + " " + arg
            except:  # in case this is not unix
                pass
            self.files = [arg]
        elif hasattr(arg, 'create_stats'):
            arg.create_stats()
            self.stats = arg.stats
            arg.stats = {}
        if not self.stats:
>           raise TypeError("Cannot create or construct a %r object from %r"
                % (self.__class__, arg))
E           TypeError: Cannot create or construct a <class 'pstats.Stats'> object from <cProfile.Profile object at 0x7f81bdcd7b60>

/usr/lib/python3.12/pstats.py:155: TypeError
```
Unfortunately, we will have to look into them, as they only appear successful here while being skipped. I receive the above error for every filter (0 through 5) for both `test_pgsql_mp_generated` and `test_api_pgsql_mp_generated`.
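One way to harden the fixture against this is sketched below (the `used` flag is an addition, not part of ixmp4; the structure mirrors the `profiled` fixture from `tests/conftest.py` shown in the traceback): record whether the profiler ever ran, and only build the `pstats.Stats` in teardown when it did.

```python
import contextlib
import cProfile
import pstats

def make_profiled():
    # Sketch of a hardened version of the `profiled` fixture body: the
    # `used` flag records whether the inner context manager ever ran, so
    # teardown can skip pstats.Stats() for tests that were skipped
    # before profiling started.
    pr = cProfile.Profile()
    used = False

    @contextlib.contextmanager
    def profiled():
        nonlocal used
        used = True
        pr.enable()
        try:
            yield
        finally:
            pr.disable()

    def teardown():
        # pstats.Stats() raises TypeError on a profile without collected
        # stats, which is exactly the teardown error in the traceback.
        return pstats.Stats(pr) if used else None

    return profiled, teardown
```

In the real fixture, the part before `yield profiled` stays the same and `teardown()` corresponds to the code after the yield.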
Here's some more information about the `test_benchmark_filter` tests:

- `generated_mp` is only triggered within the test function, which results in a `pytest.skip` statement. However, the `profiled` fixture is evaluated before the test function, too, and once the skip is triggered in the function, it results in the TypeError above (which is why I get two results locally for each `_pgsql_` test).
- The `pytest.skip` marker from `generated_mp` was still applied to the function, even though usually skip markers are evaluated before fixtures are run.
- We would need to set up the `_mp` fixtures (all of them) in a way that allows their `pytest.skip`s to stop other fixtures from running.

The current solution is to test once if a connection to a postgresql db can be established and skip all corresponding tests if it can't. This fix is now applied for all `pgsql_mp` fixtures. My main question for @meksor is: do we need https://github.com/iiasa/ixmp4/blob/1013429d436706453869c1846fccc299c6daa75b/tests/conftest.py#L31-L32 to safely disconnect from the db? If yes, mypy would not be happy about that, I think.
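The connectivity check can be sketched like this; the function name, host, port, and fixture wiring are assumptions for illustration, not the actual `conftest.py` code:

```python
import socket

def postgres_reachable(host: str = "localhost", port: int = 5432) -> bool:
    # One cheap probe: if a TCP connection to the server fails, every
    # pgsql-backed fixture can skip immediately instead of erroring
    # later during profiling teardown.
    try:
        with socket.create_connection((host, port), timeout=1.0):
            return True
    except OSError:
        return False

# In conftest.py this could back a session-scoped fixture, e.g.:
#
# @pytest.fixture(scope="session")
# def pgsql_available():
#     if not postgres_reachable():
#         pytest.skip("no PostgreSQL server available")
```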
Please check if we're missing stated-py3.12 support in the docs; apart from that, this might pretty much be good to go :)
Yes, we need `backend.close`! Why does mypy care?
Mypy is showing `"Backend" has no attribute "close"  [attr-defined]`. Curiously, this error is not shown e.g. here: https://github.com/iiasa/ixmp4/blob/55a4774b54cbb9c9b6b61dcdccb6b6c0b77ae5c7/tests/conftest.py#L115-L118
And when I insert a `yield mp` statement between my `mp =` line and my `mp.backend.close()` line, the mypy error goes away (even though `yield` is not allowed outside of a function). So my guess is that mypy assumes something is happening to `mp` after it's yielded, such that it does have `.backend.close()` at the time we call it. Which is probably not what we want; it should probably just recognize the function the first time around.

In other news, what do you mean by 'we need `backend.close`'? The tests seem to have been running fine even without it.
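For what it's worth, a generator-style fixture keeps setup, yield, and teardown in one function, so mypy type-checks `mp.backend.close()` against the very object it sees created. The `_Platform`/`_Backend` classes below are stand-ins for the real ixmp4 classes, assumed only for this sketch:

```python
# Stand-ins for the real ixmp4 Platform/Backend (assumptions).
class _Backend:
    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True

class _Platform:
    def __init__(self) -> None:
        self.backend = _Backend()

def platform_fixture():
    # In tests/conftest.py this function would carry @pytest.fixture.
    # Because setup and teardown share one scope, mypy checks
    # mp.backend.close() against the object created two lines above.
    mp = _Platform()
    yield mp            # the test body runs while the generator is paused
    mp.backend.close()  # teardown: release the db connection
```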
This PR should now be good to go; fixed the spelling mistake (that already existed before this PR) and updated all badges/docs in general.
Looks like the tests are currently failing because the network at https://api.dev.manager.ece.iiasa.ac.at/v1/ is unreachable.
I'm going to merge this now, please let me know if we need any cleanup and I'll set up a new PR.
While troubleshooting a message-ix-models CI failure, I noticed that I couldn't update to the latest versions of the dependencies because of outdated dependency versions in ixmp4 0.6.0. So this PR will eventually bump these/all dependencies.
At the moment, though, it still faces several warnings that need to be addressed. I'm not sure we can remove all of these warnings, though:
- `data/db/base.py:bulk_upsert_chunk()` (in the `meta` tests), as detailed here.
- `db/filters.py`, where the result of `field_info.json_schema_extra.get()` can be a multitude of types, while we only expect to deal with specific ones (though for line 189, I'm not sure whether we're expecting a dict or a list).

However, once these are clarified, I suggest releasing the next minor version.
Side note: I also want to try whether we can support python 3.12; we might need a more recent version of e.g. pluggy, but that is a transitive dependency, so I don't know how much work this might be.