iiasa / message-ix-models

Tools for the MESSAGEix-GLOBIOM family of models
https://docs.messageix.org/models
Apache License 2.0

Update the nexus module #106

Open awais307 opened 1 year ago

awais307 commented 1 year ago

This PR addresses issue #93: it updates the data for the nexus module and addresses the test issues.

How to review

TBD

PR checklist

TBD

glatterf42 commented 1 year ago

Currently, the linting test is failing (among others). Please see our guide for contributing code to see what code style we expect. Please also note that black, flake8, mypy, and isort can all be configured to run automatically whenever you save your code (the details vary by editor, but we can work it out together if you want to). This helps tremendously with passing these tests.
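For reference, editor-independent configuration for these tools usually lives in pyproject.toml or setup.cfg. A minimal, hypothetical sketch (the repository's actual configuration may differ; check the contributing guide first):

```toml
# Hypothetical pyproject.toml fragment -- not this repository's actual config.
[tool.black]
line-length = 88

[tool.isort]
profile = "black"  # keeps isort's import style compatible with black

[tool.mypy]
ignore_missing_imports = true

# Note: flake8 does not read pyproject.toml by default; its settings
# typically go in setup.cfg or a .flake8 file instead.
```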

glatterf42 commented 1 year ago

The current issue with the other tests seems to be that they still rely on code in message_data (which they shouldn't, as explained before, since all code here should be open source):

message_ix_models/tests/model/water/test_nexus.py:5: in &lt;module&gt;
    from message_data.model.water import build
E   ModuleNotFoundError: No module named 'message_data'

glatterf42 commented 1 year ago

Also, to make sure: how large are the data files that you are updating here? If they are too large (and there are lots of them), it might be better to upload them to Zenodo or a similar service (see the second point in the comment by @khaeru).

khaeru commented 1 year ago
glatterf42 commented 9 months ago

The tests are mostly working, despite what it looks like here. There are some general issues however that we need to address to fully see their potential:

adrivinca commented 7 months ago

@awais307 can you please give an update on what your commits did, whether anything is still missing, and how you expect others to review? Thanks!

glatterf42 commented 7 months ago

@awais307, @adrivinca: the tests are almost working, you're on a good path there! Almost all errors seem to originate from the same problem: you are not using a 'real' Context object. As pointed out above, using one is preferable, since it makes the tests easier to understand and maintain, and since you won't have to rewrite your actual code to deal with both Contexts and dicts. With the latest commit, I adapted the test in tests/model/water/test_cooling.py to show you what I mean. We have an object called test_context, and we can use it just like that in our test functions. Please take a look and try to adapt your other tests in a similar way; I suspect it will solve most of the remaining errors :)

If you don't want to write the same setup of the test_context every time, you could introduce another fixture like it, e.g. in tests/model/water/__init__.py: define a function that takes test_context, does whatever setup all your tests share, and returns the modified test_context. You can then apply the pytest decorator @pytest.fixture(scope="function") to make this function work identically to test_context, and use it in all your other tests as well :)
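The fixture pattern described here can be sketched roughly as follows; the name `water_context` and the setup values are illustrative, not actual code from the test suite, and `test_context` is assumed to be the fixture the message_ix_models test suite provides:

```python
# Illustrative sketch of a shared fixture; `water_context` and the setup
# values are hypothetical.
import pytest


def setup_water_context(ctx):
    """Apply the setup that all water tests share (example values only)."""
    ctx.regions = "R11"
    ctx.time = "year"
    return ctx


@pytest.fixture(scope="function")
def water_context(test_context):
    # Requesting `test_context` by name makes pytest inject that fixture
    return setup_water_context(test_context)
```

Any test that declares `water_context` as a parameter then receives the pre-configured context, exactly as with `test_context` itself.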

One test error might remain, though, and that is this one:

___________________________________________ test_map_yv_ya_lt ___________________________________________

    def test_map_yv_ya_lt():
        periods = (2010, 2020, 2030, 2040)
        lt = 20
        ya = 2020

        expected = pd.DataFrame(
            {"year_vtg": [2010, 2020, 2020, 2030], "year_act": [2020, 2020, 2030, 2040]}
        )

        result = map_yv_ya_lt(periods, lt, ya)

>       pd.testing.assert_frame_equal(result, expected)
E       AssertionError: DataFrame are different
E       
E       DataFrame shape mismatch
E       [left]:  (8, 2)
E       [right]: (4, 2)

message_ix_models/tests/model/water/test_utils.py:124: AssertionError

The shape of the compared DataFrames won't be fixed by the different context, I assume. Please check that expected is really what we expect; e.g., year_vtg contains 2020 twice, is that on purpose?

One final note: I removed a few markers that told ruff to ignore functions for being too complex. Please take another look at those functions and try to reduce their complexity! In most cases, it is enough to take a part of the function that contains a lot of if ... elif ... else ... and move it to a new function that takes some input data and returns different output depending on the condition.
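The refactoring pattern suggested here can be illustrated with a generic sketch (none of these names or numbers come from the PR):

```python
# Generic illustration of extracting an if/elif chain into a helper
# function; the names and factors here are made up.
def adjust_value(sector: str, value: float) -> float:
    """Branching logic moved out of the (formerly complex) main function."""
    if sector == "urban":
        return value * 1.2
    elif sector == "rural":
        return value * 0.8
    else:
        return value


def process(rows):
    # The main function now only orchestrates, keeping its complexity low
    return [adjust_value(sector, value) for sector, value in rows]
```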

Please let me know if you need help with any of the above :)

glatterf42 commented 4 months ago

Hi @adrivinca and @awais307, I've taken another look at this PR. I started by addressing all code quality issues: applying some formatting and reducing code complexity. The latter worked quite well in parts because you already had all the puzzle pieces present: e.g. in water_for_ppl.py, your immensely complex cool_tech() function was in part only complex because you defined various helper functions inside that same function. So by moving the definitions outside, the code quickly became less complex. Unfortunately, the same approach did not work as well for infrastructure.py. While moving things out of the main function also resolved the complexity issue there, there's still much room for improvement. Almost all steps in the main function that work with the input DataFrame seem to be very similar except for tiny variations, so according to the DRY principle (Don't Repeat Yourself), this code is most likely generalisable.

Next, I took a look at the failing tests. For some of them, I was able to fix them myself, though even in those cases, there are some open questions. Please check e.g. the TODOs in test_map_yv_ya_lt(). For some cases, trying to use the test_context or otherwise fixing the tests led to problems to such a degree that I didn't know how to fix them. Please search for FIXME to find descriptions of these errors with probable causes and determine how you want to fix them.

Finally, your reporting.py contains these lines: https://github.com/iiasa/message-ix-models/blob/abcd6144752a742e9191cd4d51be3bd47de020e4/message_ix_models/model/water/reporting.py#L14-L36

run_old_reporting() is part of your main workflow (via report_full()), so legacy_reporting is actually required to be there. My version of mypy is already marking this as an error because legacy_reporting really cannot be None, so your code still requires message_data to be present, although no code in this repo should. Please find a solution for that, e.g. by not using legacy_reporting at all and instead using the reporting system that message_ix_models provides. At the very least, you should add a clause/flag to report_full() so that run_old_reporting() is only run if message_data is present. For this, you could reuse the HAS_MESSAGE_DATA flag that already exists in the code: https://github.com/iiasa/message-ix-models/blob/abcd6144752a742e9191cd4d51be3bd47de020e4/message_ix_models/util/common.py#L21
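The suggested guard could look roughly like this. The flag computation below mirrors what HAS_MESSAGE_DATA does (the real code would import it from message_ix_models.util.common instead), and the report_full() body is a simplified stand-in:

```python
# Sketch only: `report_full` is simplified, and the real code would import
# HAS_MESSAGE_DATA from message_ix_models.util.common.
import importlib.util
import logging

log = logging.getLogger(__name__)

# True only if the closed-source message_data package is importable
HAS_MESSAGE_DATA = importlib.util.find_spec("message_data") is not None


def report_full(scenario):
    """Run the open reporting always; the legacy reporting only if possible."""
    results = {"open_reporting": True}  # placeholder for the real reporting
    if HAS_MESSAGE_DATA:
        results["legacy"] = True  # would call run_old_reporting(scenario)
    else:
        log.warning("message_data not installed; skipping run_old_reporting()")
    return results
```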

adrivinca commented 3 months ago

Thanks a lot @glatterf42 for spending time on this and for the thorough overview. I am going through the tests and the code; here are the main changes I made:

Some FIXMEs are related to making a more general configuration for the module, and maybe also a testing config. I would like to work on that for the SSP update, where I need to change the code anyway and, for instance, add the SSP config. Therefore, I left some FIXMEs for the future and would merge with those FIXME notes in place for now; see the comments in test_irrigation and test_water_supply.

Other points are open questions to me as well, and I would like @awais307 to please step in and fix them. In test_water_supply:

Currently, the test function checks for a match against this df:

   Region  value  year  time  Unnamed: 0  BCU_name
0  test_Region      1  2020  year           0  test_BCU

while df_sw inside read_water_availability() looks like this just before it breaks:

     Region     years        value
0  test_BCU    Region  test_Region
1  test_BCU     value            1
2  test_BCU      year         2020
3  test_BCU      time         year
4  test_BCU  BCU_name     test_BCU

I think these functions need to be tested with actual existing data.

Unfortunately, I am not familiar with the patch and mocking approach. @awais307, have you tried the tests you wrote? When running pytest in the folder, they all break. Please also have a look at the annotation issue in test_utils.test_add_commodity_and_level() and test_utils.test_read_config.
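For reference, the patch/mock approach replaces a data-reading function with a fake for the duration of one test. A minimal, self-contained illustration (load_data and summarize are made-up stand-ins, not the real water-module functions):

```python
# Minimal illustration of unittest.mock.patch; all names here are made up.
from unittest.mock import patch

import pandas as pd


def load_data() -> pd.DataFrame:
    # stands in for a function that reads a file from disk
    return pd.read_csv("availability.csv")


def summarize() -> float:
    return load_data()["value"].sum()


def test_summarize():
    fake = pd.DataFrame({"value": [1, 2, 3]})
    # While the `with` block is active, load_data() returns `fake`
    # instead of touching the file system
    with patch(f"{__name__}.load_data", return_value=fake):
        assert summarize() == 6
```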

glatterf42 commented 3 months ago

@awais307 I rebased this PR on top of the latest main branch to keep the linear commit history and used this opportunity to make two functions less complex. You should now be able to use the latest commit here, make changes and merge to main without 'out-of-date' issues; please let me know if you can't.

glatterf42 commented 3 months ago

@awais307, I'm trying to keep the commit history linear by using git rebase main. A linear history means it will be very easy to understand how and why which changes were introduced and the changes will integrate with the rest of the code base neatly. So if you see that this branch is out-of-date with the main branch, I'd appreciate if you could also use git rebase instead of git merge :) If you don't feel comfortable using git rebase, please check out e.g. this tutorial.

awais307 commented 2 months ago

> @awais307, I'm trying to keep the commit history linear by using git rebase main. A linear history means it will be very easy to understand how and why which changes were introduced and the changes will integrate with the rest of the code base neatly. So if you see that this branch is out-of-date with the main branch, I'd appreciate if you could also use git rebase instead of git merge :) If you don't feel comfortable using git rebase, please check out e.g. this tutorial.

okay, for some reason, my local tries to merge it whenever I fetch the changes.

glatterf42 commented 2 months ago

> > @awais307, I'm trying to keep the commit history linear by using git rebase main. A linear history means it will be very easy to understand how and why which changes were introduced and the changes will integrate with the rest of the code base neatly. So if you see that this branch is out-of-date with the main branch, I'd appreciate if you could also use git rebase instead of git merge :) If you don't feel comfortable using git rebase, please check out e.g. this tutorial.
>
> okay, for some reason, my local tries to merge it whenever I fetch the changes.

Could you please try to configure your local git setup not to do that, then? If you tell me how you interact with git locally, I might be able to help you. Do you use GitHub Desktop? GitKraken? The command line?

To get the changes from up here to your system and overwrite the changes you have locally, you might need to perform something like a "force pull", which you can do following these instructions. Please let me know if you need any help with that :)

glatterf42 commented 2 months ago

Hi @awais307 :) I took a closer look at the failing tests today and most of them should be fixed. Currently, I just see one local failure, which comes from the test_build.py::test_build which you recently added. Do you have any kind of confirmation that the water module functions should work with the bare_res scenario?

The test currently fails in the function cool_tech, which is called as part of model/water/data/__init__.py::add_data(), which in turn is part of apply_spec() as called in model/water/build.py::main(). The code complains where df[[label1, label2]] is assigned the result of an apply over df[[label1, label2, label3]]. If I adapt the lists of labels to match, however, I run into another error a few lines later: we are trying to fill None values in cooling_df with values from ref_input["technology"], but these values are all None themselves. So I suspect something is going wrong at a deeper level.

I also suspect that adding a test for the function cool_tech() will shed light on why this other test is failing.

In any case, I hope the examples for how to use the test_context that are now included in the test suite help you with figuring out how to use it for testing further cases :)

codecov[bot] commented 2 months ago

Codecov Report

Attention: Patch coverage is 51.39785% with 226 lines in your changes missing coverage. Please review.

Project coverage is 78.2%. Comparing base (c6285cf) to head (44fc0c1).

:exclamation: Current head 44fc0c1 differs from pull request most recent head 1a8a311

Please upload reports for the commit 1a8a311 to get more accurate results.

Additional details and impacted files

```diff
@@            Coverage Diff            @@
##             main    #106      +/-   ##
=========================================
+ Coverage    52.1%   78.2%   +26.0%
=========================================
  Files         141     117      -24
  Lines       11335    7310    -4025
=========================================
- Hits         5914    5717     -197
+ Misses       5421    1593    -3828
```

| Files | Coverage Δ |
|---|---|
| message_ix_models/model/water/__init__.py | 100.0% <100.0%> (ø) |
| message_ix_models/model/water/build.py | 91.4% <100.0%> (+76.4%) |
| message_ix_models/model/water/data/__init__.py | 95.0% <100.0%> (+31.8%) |
| message_ix_models/model/water/data/irrigation.py | 100.0% <100.0%> (+85.1%) |
| message_ix_models/tests/model/water/test_cooling.py | 100.0% <100.0%> (ø) |
| message_ix_models/tests/model/water/test_irrigation.py | 100.0% <100.0%> (ø) |
| message_ix_models/tests/model/water/test_utils.py | 100.0% <100.0%> (ø) |
| message_ix_models/util/scenarioinfo.py | 100.0% <ø> (ø) |
| message_ix_models/model/water/cli.py | 33.3% <85.7%> (+0.6%) |
| message_ix_models/tests/model/water/test_build.py | 96.9% <96.9%> (ø) |

... and 7 more files, plus 50 files with indirect coverage changes.

glatterf42 commented 2 months ago

I've added several new (superficial) tests that should increase the test coverage enough to make this PR mergeable :) However, there are two main notes:

  1. By superficial I mean that the tests don't check any actual resulting values, they just check that the results have roughly the correct form (e.g. column names are matching). This could be improved to actually make the tests reliable.
  2. Some of the new tests are not passing. I've marked them with FIXME and added notes about what I think is going wrong. Often, it's a question of "we want to use a Scenario to e.g. run the nexus reporting on, but for this we first have to create a Scenario properly, so that e.g. all cooling technologies can be found in the input parameter". I expect that you are much more familiar with how to do that, so I'd leave that up to you, @awais307 and @adrivinca.
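A "superficial" test in this sense might look like the following sketch; make_demand_data and its columns are hypothetical stand-ins for one of the water-module data functions:

```python
# Sketch of a structure-only ("superficial") test; the function and its
# columns are hypothetical.
import pandas as pd


def make_demand_data() -> pd.DataFrame:
    return pd.DataFrame({"node": ["R11_SAS"], "year": [2020], "value": [1.0]})


def test_make_demand_data_structure():
    df = make_demand_data()
    # Only the form of the result is checked, not the actual values
    assert set(df.columns) == {"node", "year", "value"}
    assert not df.empty
```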
awais307 commented 2 months ago

> I've added several new (superficial) tests that should increase the test coverage enough to make this PR mergeable :) However, there are two main notes:
>
>   1. By superficial I mean that the tests don't check any actual resulting values, they just check that the results have roughly the correct form (e.g. column names are matching). This could be improved to actually make the tests reliable.
>   2. Some of the new tests are not passing. I've marked them with FIXME and added notes about what I think is going wrong. Often, it's a question of "we want to use a Scenario to e.g. run the nexus reporting on, but for this we first have to create a Scenario properly, so that e.g. all cooling technologies can be found in the input parameter". I expect that you are much more familiar with how to do that, so I'd leave that up to you, @awais307 and @adrivinca.

Thanks a lot @glatterf42, the contribution and support are very much appreciated. Regarding the tests in demands, I think @adrivinca can confirm better, since he did most of the recent adjustments with the Zambia model. In principle, I would assume there is no need to read the target rate function for a country case, since those rates are meant for global basins.

adrivinca commented 1 month ago

One of the problems that Fridolin mentioned has to do with this function:

def get_basin_sizes(basin: pd.DataFrame, node: str) -> Tuple[float, float]:
    """Returns the sizes of developing and developed basins for a given node"""
    temp = basin[basin["BCU_name"] == node]
    print(temp)
    sizes = temp.pivot_table(index=["STATUS"], aggfunc="size")
    print(sizes)
    return sizes["DEV"], sizes["IND"]

I solved it by defaulting the sizes to 0 in case they are empty before returning. Also, the test function needed a different "SDG" value in the case of the ZMB model, which I changed.
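One way to implement that fix (a sketch of the idea; the actual commit may differ) is to use Series.get with a default of 0, so a missing STATUS category no longer raises a KeyError:

```python
# Sketch of the fix described above: default to 0 when a STATUS category
# is absent for the given node.
from typing import Tuple

import pandas as pd


def get_basin_sizes(basin: pd.DataFrame, node: str) -> Tuple[int, int]:
    """Return the sizes of developing (DEV) and developed (IND) basins."""
    temp = basin[basin["BCU_name"] == node]
    sizes = temp.pivot_table(index=["STATUS"], aggfunc="size")
    # Series.get() returns the default instead of raising a KeyError
    return sizes.get("DEV", 0), sizes.get("IND", 0)
```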

The other issue, in test_water_for_ppl::test_cool_tech, shows multiple problems that I adjusted:

glatterf42 commented 2 weeks ago

Hi @adrivinca, thanks for taking another look at this :)

I've tried addressing some of the issues here. As you noticed, there were some test failures that only occurred for tests that use ixmp and message_ix v3.6.0 (and older), so I've marked the function as needing a minimum version of 3.7.0. For 3.6.0 and older, our test suite forces pandas < 2.0.0, but one function in your tests uses a parameter introduced with pandas 2.1.0. This is totally fine, and if it happens again (or I forgot a function calling add_sectoral_demand), please apply the same changes that are now in https://github.com/iiasa/message-ix-models/pull/106/commits/feafa7edfd47d83eac78a15ba98263ef42686ef8.

Next, I noticed that you removed the updated assignment in model/data/water_for_ppl.py:256 that I previously changed. Without my changes, I see a pandas error because

    ref_input[["value", "level"]] = ref_input[["technology", "value", "level"]].apply(
        missing_tech, axis=1
    )

tried to assign three columns of a DataFrame to two columns (of the same DataFrame, which makes me think there should be a better way to simply update these two columns in the first place). This does not work because the number of columns on both sides of the assignment needs to match. So, assuming the DataFrame you want to apply your missing_tech function to requires the technology column and you don't want to assign this column in ref_input again, I've now changed it to this instead:

    ref_input[["value", "level"]] = ref_input[["technology", "value", "level"]].apply(
        missing_tech, axis=1
    )[["value", "level"]]

This selects only the columns we want to assign after they have been recomputed. For this to work, I also had to adapt missing_tech to return a pandas Series with correctly named entries.
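The column mechanics can be demonstrated in isolation; the missing_tech below is a simplified stand-in that just fills missing values, not the PR's real function:

```python
# Self-contained demonstration of the shape fix described above; this
# `missing_tech` is a simplified stand-in.
import pandas as pd


def missing_tech(row: pd.Series) -> pd.Series:
    # Must return a Series whose index names match the columns produced by
    # apply(), so the wanted ones can be selected afterwards
    value = row["value"] if pd.notna(row["value"]) else 0.0
    return pd.Series(
        {"technology": row["technology"], "value": value, "level": row["level"]}
    )


ref_input = pd.DataFrame(
    {"technology": ["coal_ppl"], "value": [float("nan")], "level": ["secondary"]}
)

# apply() yields three columns; selecting two of them makes the widths on
# both sides of the assignment match
ref_input[["value", "level"]] = ref_input[["technology", "value", "level"]].apply(
    missing_tech, axis=1
)[["value", "level"]]
```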

With this, test_build still runs into an error: NaN values can't be converted to int. This seems to be the result of almost all dataframes in cool_tech being empty in this test, in particular ref_input in line 264. I'm not sure why this is the case or whether it is supposed to work. So please clarify whether cool_tech should be able to handle this case of empty dataframes, or whether this is a mistake and these dataframes should never be empty. In the latter case, you could consider introducing a quick sanity check: either try...except... to raise a clear error or warning if the dataframe is empty, or an assert ... to ensure the dataframe is not empty (raising an AssertionError otherwise).
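Such a sanity check could be as small as the following sketch (the function name is hypothetical):

```python
# Hypothetical early-exit check, so an empty DataFrame fails with a clear
# message instead of a cryptic NaN-to-int conversion error later on.
import pandas as pd


def require_not_empty(df: pd.DataFrame, name: str) -> pd.DataFrame:
    if df.empty:
        raise ValueError(f"DataFrame {name!r} is empty; check the input data")
    return df
```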

Finally, we still have the error in test_report_full, which arises from the call to legacy_report. From my adaptation of the legacy reporting to message-ix-models, I remember that this error most likely comes from the default reporting trying to calculate tables (the default tables, presumably) that you don't actually want to compute. Please take a look at your local reporting configuration and ensure this branch is computing the same tables. If this doesn't help, @OFR-IIASA is the expert on message_data-style reporting and can probably quickly point you to what's wrong if you ask nicely :pray:

glatterf42 commented 2 weeks ago

For completeness' sake: there is an error about a dimensionless unit. This is completely unrelated to this PR and will be fixed upstream in pint at some point. Please disregard it for this PR :)