iiasa / ipcc_sr15_scenario_analysis

Scenario analysis notebooks for the IPCC Special Report on Global Warming of 1.5°C
https://data.ene.iiasa.ac.at/sr15_scenario_analysis
Apache License 2.0

Regression errors thrown in notebooks on recent/updated (Dec 2021) package environment (pyam 1.0+, possibly other packages also implicated) #39

Open bmcmullin opened 2 years ago

bmcmullin commented 2 years ago

(I'm new to ipcc scenario analysis with python/notebooks: apologies if overlooking something simple/obvious, or if opening a github issue is not the preferred way of raising this kind of question...)

I am trying to run the notebook spm_sr15_statements.ipynb. At:

with open("sr15_specs.yaml", 'r') as stream:
    specs = yaml.load(stream, Loader=yaml.FullLoader)

it throws:

ConstructorError: could not determine a constructor for the tag 'tag:yaml.org,2002:python/object:pyam.run_control.RunControl'
in "sr15_specs.yaml", line 48, column 14

Relevant (?) version info:

python 3.7.10, pyam 1.2.0, pyyaml 6.0, yaml 0.2.5

I've tried randomly downgrading pyam and pyyaml but to no effect.

As a workaround, I have just commented out the entire definition of run_control from sr15_specs.yaml. That allows everything in spm_sr15_statements.ipynb to run. I have not checked the other notebooks that also use sr15_specs.yaml, so there may or may not be limitations to this workaround. Any other suggestions welcome.
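For reference, a less invasive workaround may be possible (a sketch of my own, not something from the notebooks): register a constructor for the offending tag so that FullLoader parses the run_control entry as a plain mapping instead of refusing it. The tag string is copied from the ConstructorError; the helper name is mine.

```python
import yaml

# Tag string copied verbatim from the ConstructorError message above.
TAG = "tag:yaml.org,2002:python/object:pyam.run_control.RunControl"

def construct_as_mapping(loader, node):
    # Treat the tagged node as an ordinary YAML mapping (a plain dict),
    # rather than trying to instantiate a pyam RunControl object.
    return loader.construct_mapping(node, deep=True)

# Register the constructor on FullLoader so this tag no longer raises.
yaml.FullLoader.add_constructor(TAG, construct_as_mapping)

# The notebook's original loading code should then work unchanged:
# with open("sr15_specs.yaml", "r") as stream:
#     specs = yaml.load(stream, Loader=yaml.FullLoader)
```

Whether a plain dict is an adequate substitute for the RunControl object downstream is untested; commenting the section out remains the simpler option.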

bmcmullin commented 2 years ago

Update: I see via issue #15 that sr15_specs.yaml can be (re-)built from the sr15_2.0_categories_indicators.ipynb notebook. So I thought I would try doing that and see if it resolved my original issue. But sadly, sr15_2.0_categories_indicators.ipynb also fails about halfway through (before getting to writing out sr15_specs.yaml), at:

name = 'baseline'
sr1p5.set_meta(sr1p5.meta.apply(set_baseline_reference, raw=True, axis=1), name)

which throws:

AttributeError: 'numpy.ndarray' object has no attribute 'name'

I did then just jump to the last cell (where sr15_specs.yaml gets written) and execute that anyway. Which did produce a new, slightly different, sr15_specs.yaml; but then spm_sr15_statements.ipynb still throws the same ConstructorError. Darn it.

Possibly some/all of this could be resolved if there were some "known good" package environment specification (via a conda environment.yml?) for running these notebooks?

danielhuppmann commented 2 years ago

Sorry that I didn't add a proper environment specification at the time... Seems that the last time I updated and ran the notebooks, pyam 0.5 was the stable release.

I'll try to see if I find out what the issues are with running the notebooks with a current installation.

bmcmullin commented 2 years ago

Thanks @danielhuppmann!

FWIW: on this original issue (spm_sr15_statements.ipynb throwing ConstructorError), it seems that this can be resolved by replacing:

specs = yaml.load(stream, Loader=yaml.FullLoader)

with:

specs = yaml.load(stream, Loader=yaml.Loader)

I have been trying (unsuccessfully!) to understand the differences between yaml.Loader and yaml.FullLoader that might account for this; or, more to the point, whether there is any specific downside to using yaml.Loader here. Certainly wouldn't go as far as to say this is a "fix" without more clarity on that...
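To illustrate the difference between the two loaders, here is a standalone sketch of my own, using collections.OrderedDict as a stand-in for pyam.run_control.RunControl (any tag of the form python/object:module.Class behaves the same way):

```python
import yaml

# A tagged document, analogous in spirit to the run_control entry in
# sr15_specs.yaml (OrderedDict stands in for pyam.run_control.RunControl).
doc = "!!python/object:collections.OrderedDict {}"

# FullLoader (the default suggested loader since PyYAML 5.1) refuses to
# build arbitrary Python objects, so it raises a ConstructorError here.
try:
    yaml.load(doc, Loader=yaml.FullLoader)
    print("FullLoader: loaded")
except yaml.constructor.ConstructorError as err:
    print("FullLoader raised:", type(err).__name__)

# yaml.Loader (equivalent to UnsafeLoader) will instantiate the tagged
# object, which is why it resolves the error in spm_sr15_statements.ipynb,
# at the cost of trusting the YAML file completely.
obj = yaml.load(doc, Loader=yaml.Loader)
print("Loader built a", type(obj).__name__)
```

So the downside of yaml.Loader is precisely that it will execute whatever object-construction the YAML file asks for; it is only appropriate for files you fully trust, which sr15_specs.yaml arguably is.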

bmcmullin commented 2 years ago

A long shot, but: in terms of trying to reconstruct a working package environment, if you have a current conda environment that might have worked in the past, you could try conda list --revisions? See: Conda revisions: letting you ‘rollback’ to a previous version of your environment.

(I guess what one would really like would be a way of telling conda to install packages subject to a release date constraint, set to the date when ipcc_sr15_scenario_analysis was last updated. pypi-timemachine seems to do something like that for pip, but I can't see anything directly equivalent for conda. Maybe there would be a roundabout way of doing it in pip first and then importing an environment specification into conda...)

bmcmullin commented 2 years ago

So: for sr15_2.0_categories_indicators.ipynb there are definitely problems with any pyam>=1.0, but falling back to pyam=0.13.0 helps a lot. It also usefully emits a number of deprecation warnings identifying things that do indeed fail with pyam>=1.0. Specifically, the IamDataFrame methods for specific kinds of plot (e.g. line_plot()) have been replaced with a generic plot() method plus an extra argument kind to specify the kind; and references to the IamDataFrame.scenarios() method need to be replaced with the list attribute IamDataFrame.scenario. These are flagged as API changes in the pyam 1.0.0 release notes.
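One way to paper over the rename while supporting both pyam generations is a small compatibility shim (a hedged sketch of my own, not part of the notebooks; the method names follow the pyam 1.0.0 release notes as described above):

```python
def line_plot_compat(df, **kwargs):
    """Call the line-plot routine across the pyam 1.0 API change.

    pyam < 1.0 exposed per-kind methods such as IamDataFrame.line_plot();
    pyam >= 1.0 folds these into a generic plot() method, with line plots
    as the default kind.
    """
    if hasattr(df, "line_plot"):
        return df.line_plot(**kwargs)  # pyam < 1.0
    return df.plot(**kwargs)           # pyam >= 1.0
```

An analogous hasattr() check would cover the scenarios() method versus the scenario list attribute.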

Separately, there were also several places where errors were thrown on a call of dataFrame.apply() with an argument raw=True. For example:

sr1p5.set_meta(sr1p5.meta.apply(set_baseline_reference, raw=True, axis=1), name)

throwing:

AttributeError: 'numpy.ndarray' object has no attribute 'name'

More as a guess than informed understanding, I switched to raw=False and that seems to work.

Other similar cases:

exceedance_meta = median_temperature.apply(exceedance, axis=1, raw=False,
                                       years=median_temperature.columns, threshold=1.5)

exceedance_meta = median_temperature.apply(exceedance, axis=1, raw=False,
                                       years=median_temperature.columns, threshold=2)

This may be a pandas version issue, but I have not tried to pin it down further (and maybe there is no need...).
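For what it's worth, the pandas documentation for DataFrame.apply says that with raw=True the function receives a bare NumPy ndarray rather than a Series, so Series-only attributes such as .name (the row's index label) are unavailable. A minimal standalone illustration (my own, not from the notebooks):

```python
import pandas as pd

# Toy frame with labelled rows, standing in for the sr1p5 meta table.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["x", "y"])

def label_and_value(row):
    # row.name is the index label -- but only when row is a Series.
    return f"{row.name}:{row['a']}"

# raw=False (the default): each row arrives as a pandas Series.
print(df.apply(label_and_value, axis=1).tolist())  # ['x:1', 'y:2']

# raw=True: each row arrives as a numpy ndarray, so row.name raises
# exactly the AttributeError quoted above.
try:
    df.apply(label_and_value, raw=True, axis=1)
except AttributeError as err:
    print("raw=True raised:", err)
```

So switching to raw=False is the right fix whenever the applied function uses Series attributes; raw=True is only a performance shortcut for purely numerical functions.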

That's it for now: progress of a sort. As this issue has now sprawled from its original narrow location in the spm_sr15_statements.ipynb I may also update the issue title for clarity...

danielhuppmann commented 2 years ago

Thanks for your work on resolving this issue, @bmcmullin! Very glad that the extra effort with deprecation warnings when going from the hot-development phase to the first stable release was helpful...

I started investigating the errors when running the notebooks with a current installation of pyam and all dependencies.

In particular, I don't see an issue with reverting to the yaml.Loader, as you suggest. Also, replacing

specs['run_control'] = rc

in sr15_2.0_categories_indicators.ipynb with

specs['run_control'] = rc.store

yields a cleaner yaml file (without needing a constructor).

In short, loading arbitrary Python objects from YAML is a security issue (it can be used to execute code), and the PyYAML devs (rightly) tightened the standard behavior to make it more difficult to run insecure code: FullLoader refuses to construct arbitrary Python objects, which is why it raises the ConstructorError, while yaml.Loader still allows them.

You have also correctly identified the renamed plotting methods (relatively easy to fix throughout all notebooks) and the changed behavior of apply() in pandas (more difficult to tell if this has other repercussions - doesn't look like it at first glance).

bmcmullin commented 2 years ago

OK @danielhuppmann - I've got everything running that I need anyway for the moment. Plots were rendering a bit on the small side in sr15_2.0_categories_indicators.ipynb, so I added a generic plot configuration of:

plt.rcParams["figure.figsize"] = (20, 10)

If doing a new release it might be good to just include a conda environment.yml. Alternatively (or as well?) maybe include a dump of package versions within each notebook (so that it shows up in the static release versions even before attempting to re-run). This stackoverflow thread discusses the latter option, via package session_info - albeit that package itself seems to be available only via pip. Something like this, perhaps in the last cell (so guaranteed to capture all import calls), but with a note at the top explaining where to find it:

import session_info
session_info.show()

Best wishes and thanks again for all the effort in providing and maintaining these resources. A really great support for understanding, transparency and reuse.

Xuan0211 commented 2 years ago

I am new here too, and I am not a native English speaker (I am writing this with the help of translation software). I tried commenting out the entire definition of run_control in sr15_specs.yaml, but then it throws the error KeyError: 'run_control'. I have checked my code; should I do something with run_control.py? Looking forward to any reply, and sorry if my English causes any confusion.