iiasa / climate-assessment

https://climate-assessment.readthedocs.io/en/latest
MIT License

Update and test example notebooks #6

Closed: znicholls closed this pull request 2 years ago

znicholls commented 2 years ago

Adds new example notebooks for all climate models we currently support and adds infrastructure for testing the notebooks. Note that this means we're effectively running the WG3 reproduction test twice. I would suggest removing that duplication in a follow up PR.

Depends on:

Supersedes #2
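For reference, a minimal sketch of how the notebook-testing infrastructure mentioned above could be wired up with pytest and nbconvert; the notebook directory, timeout, and test name are illustrative assumptions, not necessarily what this PR implements:

```python
# Illustrative only: run every notebook under an assumed notebooks/ directory
# and fail the test if any cell raises.
from pathlib import Path

import nbformat
import pytest
from nbconvert.preprocessors import ExecutePreprocessor

NOTEBOOK_DIR = Path(__file__).parents[1] / "notebooks"  # assumed location


@pytest.mark.parametrize(
    "notebook", sorted(NOTEBOOK_DIR.glob("*.ipynb")), ids=lambda p: p.name
)
def test_notebook_executes(notebook):
    nb = nbformat.read(str(notebook), as_version=4)
    # Execute top to bottom in the notebook's own directory; a cell execution
    # error propagates and fails the test.
    ExecutePreprocessor(timeout=1800).preprocess(
        nb, {"metadata": {"path": str(notebook.parent)}}
    )
```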

znicholls commented 2 years ago

#8 to be addressed in future

znicholls commented 2 years ago

@phackstock and @jkikstra any objections to merging?

phackstock commented 2 years ago

I just took a quick look at the files changed and I was wondering why we need .github/workflows/get-infiller-database.py. Is that not covered by the conftest fixture version of downloading the required files?

znicholls commented 2 years ago

I was trying to work out whether I could provide a user-like workflow (assuming that users don’t know, and don’t want to know, anything about tests and fixtures). I couldn’t work out how to call the fixture without running the tests, so I just duplicated the functionality. My thinking is that we would remove the WG3 tests in favour of the notebooks in a follow-up PR because they test the same thing. As a result, the duplication should only be temporary, hence acceptable?
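For illustration, the kind of standalone, user-facing helper being described might look roughly like this; `download_infiller_database` is a placeholder stand-in for whatever the conftest fixture actually does, not a real function in the package:

```python
# Hypothetical sketch of a script like get-infiller-database.py.
import os
import sys
from pathlib import Path


def download_infiller_database(user, password, out_path):
    """Placeholder for the real download logic (e.g. querying the Scenario Explorer)."""
    raise NotImplementedError("replace with the actual download call")


def main(out_path="infiller-database.csv"):  # output location is illustrative
    user = os.environ.get("SCENARIO_EXPLORER_USER")
    password = os.environ.get("SCENARIO_EXPLORER_PASSWORD")
    if not (user and password):
        sys.exit(
            "Set SCENARIO_EXPLORER_USER and SCENARIO_EXPLORER_PASSWORD "
            "to download the infiller database"
        )

    # Duplicates the fixture's behaviour because a pytest fixture cannot
    # easily be called outside of a test run.
    download_infiller_database(user, password, Path(out_path))


if __name__ == "__main__":
    main()
```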


phackstock commented 2 years ago

@znicholls not sure I can follow your explanation. Would the purpose of this be that users can run the tests locally? If so, that mechanism already exists and is explained in the skip reason. To not skip, users simply need to download the file from the Scenario Explorer and place it in the correct folder. The way .github/workflows/get-infiller-database.py currently works, it would crash because the environment variables SCENARIO_EXPLORER_USER and SCENARIO_EXPLORER_PASSWORD need to be set to enable the download. In the conftest this is deliberately not required: simply downloading the file and placing it in the correct spot causes the tests to be executed. If you really want to include get-infiller-database.py, I think a better spot would be within the core package, as .github/workflows should be reserved for yaml files that define workflows.
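To make the skip mechanism concrete, a fixture of roughly this shape (the file path and message are assumptions, not copied from the real conftest) skips the dependent tests until the file has been placed manually:

```python
# Sketch of a conftest.py fixture: no credentials needed, tests simply skip
# until the infiller database file is present. The path is illustrative.
from pathlib import Path

import pytest

INFILLER_DATABASE_FILE = Path(__file__).parent / "test-data" / "infiller-database.csv"


@pytest.fixture
def infiller_database_filepath():
    if not INFILLER_DATABASE_FILE.exists():
        pytest.skip(
            "Download the infiller database from the Scenario Explorer and "
            f"place it at {INFILLER_DATABASE_FILE} to run this test"
        )
    return INFILLER_DATABASE_FILE
```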

znicholls commented 2 years ago

@znicholls not sure I can follow your explanation

Oops, let me try again.

So, the overall idea is that we can run the notebooks as part of the CI routine. They test reproduction of what is in the WG3 database, but in a way which is easier for new users to understand and explore.

In order to reproduce the WG3 database, we need to download the infiller database. So the question is, where to put that script? My thinking was:

Does that help?

phackstock commented 2 years ago

@znicholls thanks for the explanation, now I get it.

Not sure I am the biggest fan, though, of using Jupyter notebooks for this dual purpose. I really like them for introducing users to the concepts of each model and how to run them, and for this they're doing a great job here. However, any changes to a notebook like this, whose primary purpose is playing around with and getting a feel for the software package, could break the tests. Additionally, in the notebooks there are many things that are done for illustrative purposes, like printing out intermediate results or plotting. To me a test should be as single-focus as possible. In my opinion, any changes introduced to a test should only be done to fix a test in case a deliberate code change broke it. In short, I would vote for keeping the tests as is, keeping the notebooks as is, and reverting the GitHub Actions workflows to the way they were before. What do you think? Also @jkikstra, what's your take on this?

znicholls commented 2 years ago

Thanks for the thoughts, very helpful and good to understand. Let me start by saying that I’m happy to keep the WG3 reproduction tests as they are (even if they duplicate the same test in principle, keeping them makes things conceptually clearer?).

tl;dr I think it’s great to demonstrate how to reproduce the database results in a notebook (rather than only in testing code which users might not find) and if we’re going to claim that the notebook reproduces the results, we should test it.

---

Re users changing notebooks and breaking the tests: that’s ok, no? We would never merge such changes to the notebooks back in, so if the CI is broken, that doesn’t matter? Or am I missing something here?

I think about notebooks as documentation, and I’m quite a big fan of doctests. So I think it’s important to test the notebooks, irrespective of what they do (nothing frustrates me more than trying out an example only to find it’s broken).
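(As a generic illustration of the doctest point, not code from this repository: the example in the docstring is executed by the test runner, so it cannot silently break.)

```python
def co2_to_c(mass_co2):
    """Convert a mass of CO2 to the equivalent mass of carbon.

    >>> round(co2_to_c(44.009), 3)
    12.011
    """
    return mass_co2 * 12.011 / 44.009


if __name__ == "__main__":
    import doctest

    doctest.testmod()
```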

Then the question is what the notebooks should demonstrate. In this PR I try to demonstrate two things in the notebooks: 1) how to reproduce the WG3 database and 2) how to run your own custom scenario.

For the first application, I think it’s a great thing to have. If someone can pull the notebook off the shelf and reproduce the database results, that’s a great starting point. If we’re going to make a claim that our notebook can do such a reproduction, it’s important to make sure that the claim is true. So that’s why I’d vote to a) keep it and b) test it.

For the second application, I also think it’s a great thing to have. Users can see how they’d run their own scenario (and what format their data needs to have). I haven’t tested it because I didn’t want to preference one climate model over any other (but that might be an overly careful approach). We could test it by just making sure it runs (or even making sure that the output it produces is stable if we wanted, but that may add confusing stuff for users so might be best avoided).

I would be very uncomfortable keeping things as they are for two reasons. The first is that nothing is tested, so we could easily break our notebooks without realising (never a good look). The second is that only FaIR is demonstrated. We should have examples of how to run all three climate models.

Hope that helps


phackstock commented 2 years ago

@znicholls thanks for the detailed response. I see your points and I think we agree on all the important ones. Illustrating how to replicate and customize running the climate assessment pipeline is important and, as part of that, testing that the results line up is important as well. The only thing I'm slightly hesitant about is having it as part of a GitHub action, although I agree with your point that the examples we show must work, and that the best way of ensuring that is to test them directly rather than build some type of mock-up and test that. So ultimately I'm with you on that as well.

The only thing I'd request as a change is to move the get_infiller_download_link function into the utils module. There should really only be GitHub action yaml files in the .github/workflows folder and this way the test can also import it from the utils so we have less code duplication.
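The requested refactor would look roughly like this; the module path and signature are assumptions based on this thread, not the package's actual API:

```python
# climate_assessment/utils.py (sketch; real signature may differ)
def get_infiller_download_link(user, password):
    """Return the download URL for the infiller database.

    Placeholder body: the real implementation talks to the Scenario Explorer.
    """
    raise NotImplementedError


# The standalone download script and the tests can then share one implementation:
#     from climate_assessment.utils import get_infiller_download_link
```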

znicholls commented 2 years ago

The only thing I'd request as a change is to move the get_infiller_download_link function into the utils module. There should really only be GitHub action yaml files in the .github/workflows folder and this way the test can also import it from the utils so we have less code duplication.

Ok will do (maybe tonight, maybe Thursday after scenario forum), thanks for the discussion!

znicholls commented 2 years ago

This will fail until pyam database is back up

znicholls commented 2 years ago

Great thanks

Philip Hackstock approved this pull request:

Looks good to me, thanks for all the updates @znicholls. Will merge as soon as the tests run through if that's ok with you.
