Test SPA - Githubissues

thomasgibon commented 6 years ago

@konstantinstadler: would you be interested in testing the SPA function I wrote for pymrio? I am trying to make it as flexible as possible (especially regarding the various structures of extensions) but not sure whether I am doing this right. In technical terms, I am trying to define what type of argument "stressor" should be, right here: https://github.com/thomasgibon/pymrio/blob/master/pymrio/core/mriosystem.py#L1864

konstantinstadler commented 6 years ago

Hi Thomas Great stuff. Thanks a lot for starting that. Sorry also for the delay, but see point 1 below:

1) What is your timeline for getting this ready? I am still on paternity leave, officially until end of August. I plan, however, to start working on pymrio again around mid of August. Would that fit your timeline?

2) At one point, @majeau-bettez and us wanted to implement a general mrio/lca toolbox for SPA/SDA. Honestly, I dont know how far this is currently. The general idea was to have a separate toolbox which than could be a requirement for pymrio/pyarda. But I think this is still some work for the future - its great that you started implementing it here. But perhaps its worthwhile to doublecheck with @majeau-bettez.

3) As of 1) I dont have spent extensive time on in yet, but works great for your "testcase". Did you try it with EXIOBASE? I wonder if speed would be the crucial point; recursive calls in python are costly. One way to improve could be to use lru_cache from functools. This does currently not work as you return a dict, but I think that could be changed. But only worthwhile if speed really is the critical point. Alternatively we could try to rewrite it into a while loop (probably a good idea in order to not run into any recursion limit in python - but this can be changed with sys.setrecursionlimit )

4) Connected to 3: Before we start tinkering and also in general we need a testcase. I think its fine to use the test_mrio for that, but you/we should have a defined result (e.g. tested against the matlab result, or manually recalculated) which we test for in /test/test_math.py. This would be a requirement before I can merge back into master.

5) Regarding the stressor spec: I think it would make more sense to have SPA as a method of the extension and not the core system. Than you get rid of the need to specify the extension and only need to specify the row of the extension.

6) Great that you follow the convention of having the math separated in io_math. Just a minor point: the methods there should have no side effects. Currently you have the filename parameter also in io_math. If you think this is really needed, only include it in the SPA main method, not the one in io_math (actually, I think its not needed, as it is just a simple to_csv which is in itself much more flexible - e.g. specify the separator, encoding, ...). Also, if 'filename' in locals(): does not work, as you define filename always in the function signature, just use if filename:

7) There are some more minor points: if M is not calculated (again, dont check if in locals() but if M is None) use the method already in iomath to calculate it; dont use print in the method, if you need it for development use logging.debug

8) Also, perhaps we need to rethink the output. Instead of a dataframe perhaps a dict of dict or an own class which allows to extract and reorder as needed. Just thinking out loud, not quite sure yet

These are just some quick notes, not all full rethought. Anyway, great work and I very much looking forward to work on that together with you (try to respond before if there are any question, but cant do much development now).

best kst

thomasgibon commented 6 years ago

Thanks Konstantin for the extensive answer!

Mid-August sounds good, there is no urgency on my side. I know you do not have time at the moment, no worries.
This is great news, I think that would prove very useful. Please tell me if I can help in any way.
Yes, I tried with EXIOBASE3, monetary version. I have not tried the hybrid version with pymrio, I need to know how to treat negative exchanges in the Z and A matrices. The system calculations end up outputting negative footprints. I know this is a separate discussion but: should the by-product exchanges be zeroed out? (which is consistent with the values in the primary production vector if it is to be used as x), or change their sign so they're all positive? (which is consistent with the principles of building the supply table).
I will try to work on a systematic test case, but first point 8 (on the format) needs to be set. Added as issue.
Great idea! That makes much more sense, only specifying the index (integer) or the row name (string). I will look into implementing that.
Added as issue.
Added as issue.
A class of its own would make more sense. A dataframe is simply super easy to work with, so I used that for now.

I see what I can do in my extra time to work on this, you will see how it progresses on this branch anyway.

Best, Thomas

PS: the SPA method I wrote is the Python translation of an old MATLAB script that I was using for THEMIS, which has originally been written by Glen Peters as the implementation of the pseudo-algorithm presented in Peters and Hertwich (2007), see Appendix A here https://www.tandfonline.com/doi/full/10.1080/09535310600653008 If I am not mistaken, I think Yasushi Kondo had also improved the original code for better performance.

konstantinstadler commented 6 years ago

Hi, Great. Just disagree / comment on 4: A test case could be done now already. Whatever we end up with in 4, there must be a way to extract the numerical values - for the beginning just compare the most prominent path and some values therein with previous results. It sounds a little tedious to set this up but trust me: it makes working on it so much easier afterwards when you dont have to open a csv file to see if their are value in their but instead can just run pytest -v ...

thomasgibon commented 6 years ago

Regarding

the stressor spec: I think it would make more sense to have SPA as a method of the extension and not the core system. Than you get rid of the need to specify the extension and only need to specify the row of the extension.

I agree with the principle, but to calculate the SPA you need A and Y (attributes of IOSystem), so I think it's a little complicated to do, because you need to pass A and Y somehow every time. Or you make A and Y attributes of the Extension objects too, which is redundant. For comparison, Extension.calc_system() requires x and L (attributes of IOSystem), but it is mostly used by IOSystem.calc_system() (via ext.calc_system(x=self.x, L=self.L)). This is OK because the result of this operation is unique. For the SPA, you want to be able to pass a single sector, or region, or the whole final demand. So, either

SPA is an IOSystem method, and you need to specify which extension/stressor you want to look into,

or

SPA is an Extension method, and you need to pass A and Y (which can be different, and also becomes redundant since you already specify sector and region). I think the first option is better for users.

thomasgibon commented 6 years ago

The SPA test seems to have passed!

➜  pymrio git:(master) pytest -v
============================================================================================= test session starts =============================================================================================
platform linux -- Python 3.6.4, pytest-3.4.1, py-1.5.2, pluggy-0.6.0 -- /home/thomas/anaconda3/bin/python
cachedir: .pytest_cache
rootdir: /home/thomas/pymrio, inifile:
plugins: remotedata-0.2.0, openfiles-0.2.0, doctestplus-0.1.2, arraydiff-0.2
collected 36 items

(...)
pymrio/tests/test_math.py::test_SPA PASSED                                                                                                                                                              [ 88%]
(...)

========================================================================================== 36 passed in 2.25 seconds ==========================================================================================

konstantinstadler commented 6 years ago

Regarding 5) above. Thats right - fully agree. So you would need to pass a string identifying the extension. You can than check if the string is in mrio.get_extensions() or in any of the extensions.name - this should cover both cases available for specifying an extension.

thomasgibon / pymrio

Test SPA #1