openjournals / joss-reviews

Reviews for the Journal of Open Source Software
Creative Commons Zero v1.0 Universal
701 stars 36 forks source link

[REVIEW]: BoARIO: A Python package implementing the ARIO indirect economic cost model #6547

Closed editorialbot closed 2 months ago

editorialbot commented 5 months ago

Submitting author: !--author-handle-->@spjuhel<!--end-author-handle-- (Samuel Juhel) Repository: https://github.com/spjuhel/BoARIO Branch with paper.md (empty if default branch): main Version: v0.5.10 Editor: !--editor-->@crvernon<!--end-editor-- Reviewers: @mwt, @potterzot Archive: 10.5281/zenodo.11580697

Status

status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/71386aa01a292ecff8bafe273b077701"><img src="https://joss.theoj.org/papers/71386aa01a292ecff8bafe273b077701/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/71386aa01a292ecff8bafe273b077701/status.svg)](https://joss.theoj.org/papers/71386aa01a292ecff8bafe273b077701)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@mwt & @potterzot, your review will be checklist based. Each of you will have a separate checklist that you should update when carrying out your review. First of all you need to run this command in a separate comment to create the checklist:

@editorialbot generate my checklist

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @crvernon know.

Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest

Checklists

📝 Checklist for @mwt

📝 Checklist for @potterzot

editorialbot commented 5 months ago

Hello humans, I'm @editorialbot, a robot that can help you with some common editorial tasks.

For a list of things I can do to help you, just type:

@editorialbot commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@editorialbot generate pdf
editorialbot commented 5 months ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1111/jiec.12715 is OK
- 10.1021/es300171x is OK
- 10.2139/ssrn.4101276 is OK
- 10.1093/reep/rez004 is OK
- 10.5334/jors.251 is OK
- 10.1029/2020ef001616 is OK
- 10.5281/zenodo.8383171 is OK
- 10.1038/s41562-020-0896-8 is OK
- 10.1007/s10584-010-9979-2 is OK
- 10.1016/j.jedc.2011.10.001 is OK
- 10.1007/s10584-010-9978-3 is OK
- 10.1111/j.1539-6924.2008.01046.x is OK
- 10.1111/risa.12090 is OK
- 10.1007/s12665-011-1078-9 is OK
- 10.1007/s11069-013-0788-6 is OK
- 10.1111/risa.12300 is OK
- 10.1029/2018ef000839 is OK
- 10.1038/s41893-020-00646-7 is OK
- 10.1080/19475705.2018.1489312 is OK
- 10.31223/x5qd6b is OK
- 10.1038/s41558-018-0173-2 is OK
- 10.1088/1748-9326/ab3306 is OK
- 10.1093/bioinformatics/bts480 is OK
- 10.2139/ssrn.3285818 is OK

MISSING DOIs

- No DOI given, and none found for title: OECD Inter-Country Input-Output Database
- No DOI given, and none found for title: NumPy

INVALID DOIs

- None
editorialbot commented 5 months ago

Software report:

github.com/AlDanial/cloc v 1.90  T=0.14 s (869.9 files/s, 280925.4 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
HTML                            30           5288             94          18663
Python                          19            866           1263           4383
SVG                              7              0              0           1718
CSS                             11            237             86           1183
JavaScript                      12            161            246           1026
TeX                              4             51              4            675
reStructuredText                21            609            646            657
Markdown                         5             48              0            196
YAML                             5             11             20             99
TOML                             2             11              2             88
JSON                             1              0              0             36
DOS Batch                        1              8              1             26
make                             1              5              7             16
-------------------------------------------------------------------------------
SUM:                           119           7295           2369          28766
-------------------------------------------------------------------------------

Commit count by author:

   391  Samuel Juhel
     7  sjuhel
     2  Alessio Ciullo
editorialbot commented 5 months ago

Paper file info:

📄 Wordcount for paper.md is 1111

✅ The paper includes a Statement of need section

editorialbot commented 5 months ago

License info:

🟡 License found: GNU General Public License v3.0 (Check here for OSI approval)

mwt commented 5 months ago

Review checklist for @mwt

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

editorialbot commented 5 months ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

potterzot commented 5 months ago

Review checklist for @potterzot

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

crvernon commented 5 months ago

👋 @spjuhel, @mwt, and @potterzot - This is the review thread for the paper. All of our communications will happen here from now on.

Please read the "Reviewer instructions & questions" in the first comment above.

Both reviewers have checklists at the top of this thread (in that first comment) with the JOSS requirements. As you go over the submission, please check any items that you feel have been satisfied. There are also links to the JOSS reviewer guidelines.

The JOSS review is different from most other journals. Our goal is to work with the authors to help them meet our criteria instead of merely passing judgment on the submission. As such, the reviewers are encouraged to submit issues and pull requests on the software repository. When doing so, please mention https://github.com/openjournals/joss-reviews/issues/6547 so that a link is created to this thread (and I can keep an eye on what is happening). Please also feel free to comment and ask questions on this thread. In my experience, it is better to post comments/questions/suggestions as you come across them instead of waiting until you've reviewed the entire package.

We aim for the review process to be completed within about 4-6 weeks but please make a start well ahead of this as JOSS reviews are by their nature iterative and any early feedback you may be able to provide to the author will be very helpful in meeting this schedule.

crvernon commented 5 months ago

👋 @spjuhel, @mwt, and @potterzot - Just checking in to see how the review is going. Let me know if you have any questions!

mwt commented 4 months ago

@spjuhel Hi, I am going through this right now and have identified some issues.

  1. I do not think that the paper explains the state of the field. That is, it does not outline what other software, if any, exists to solve this problem. Otherwise, I think the paper is good.
  2. The documentation is incomplete. For example, EventKapitalRebuild.from_scalar_regions_sectors(), which is used in the example, is undocumented outside of the automatically generated api reference. It seems to be similar to EventKapitalRebuild.from_series(), but it is also not a drop-in replacement.
  3. Some documentation appears to be wrong or confusing. For example, the description of impact_sectoral_distrib says

    A vector of equal size to the list of sectors affected, stating the share of the impact each industry should receive. Defaults to None.

    but the input given in the example is the string "gdp", which is not a share, and a list of actual shares like [0.5, 0.5] results in a KeyError.

For the second point, I think it would be ideal if the quickstart guide contained all of the information that was needed to understand and execute the quickstart example. This means that we should get a rundown of the events api in the quickstart guide with just enough information to do the example. The rest of the docs can go into more details about the options and alternative ways to specify events.

Smaller points:

spjuhel commented 4 months ago

@mwt Hi, thank you a lot for your remarks,

Indeed, the documentation suffered from several problems.

I did some redesigning, and it should be much better now (Both API and User guide), no huge change for the User guide, but some restructuring. The rework on the API focused on the event module to bring more clarity to the API, and I also switched to a less cluttered and more tree oriented documentation.

In parallel, I should have addressed all your specific points regarding the documentation. See below for more details if you want.

I will push these as a minor patch on the main branch today or tomorrow, alongside an addition in the paper.md regarding the state of the field. Am I correct in thinking this should go into the State of need section?


  1. The documentation is incomplete. For example, EventKapitalRebuild.from_scalar_regions_sectors(), which is used in the example, is undocumented outside of the automatically generated api reference. It seems to be similar to EventKapitalRebuild.from_series(), but it is also not a drop-in replacement.

I think this was due to the function being defined only for the Event main class, the documentation now shows in the subclasses as well, and I have added a lot of information on each subclass specificities notably around the differences in instantiation. Hopefully this is clearer now.

  1. Some documentation appears to be wrong or confusing. For example, the description of impact_sectoral_distrib says

    A vector of equal size to the list of sectors affected, stating the share of the impact each industry should receive. Defaults to None.

but the input given in the example is the string "gdp", which is not a share, and a list of actual shares like [0.5, 0.5] results in a KeyError.

I fixed the documentation. I did not have a KeyError on my side, it is possible this was fixed at some point, but I'm unsure... Could you share a minimal example and the version used?

  1. In the quickstart example, it is not necessary to import pandas for the plots because it is already loaded inside your package.

Yes indeed! I removed this.

  1. I believe that the sector key manufactoring is supposed to be manufacturing, but I may be wrong.

Sadly this is a typo from the pymrio package, I will probably suggest a PR on their repo at some point.

  1. Automatically generated docs are not formatted correctly for arguments. It would be good if the function arguments were nicely formatted instead of running together. This "args" section is also not being detected as the same as the "parameters" section, and so the inputs are enumerated twice.

Yes, I think this came from automated docstring generation I tested at some point and I did not realize it was not in the same format. This should be fixed in the minor patch.

spjuhel commented 4 months ago

@editorialbot generate pdf

editorialbot commented 4 months ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

crvernon commented 4 months ago

👋 @spjuhel, @mwt, and @potterzot - Just checking in to see how the review is going. Could you provide a brief update to your status here in this thread?

mwt commented 4 months ago

@spjuhel, this is a big improvement. I really like the new documentation.

I was running version 0.5.7 and am now on 0.5.9. I no longer get a key error, but the function still fails to converge when I use numerical shares instead of gdp. Perhaps this was fixed though in a version that isn't yet on pypi? Here is my example. For me, there are NaN for every step after 1.

# import pymrio for the test MRIO
import pymrio

# import the different classes
from boario.simulation import Simulation  # Simulation wraps the model
from boario.extended_models import ARIOPsiModel  # The core of the model

from boario.event import EventKapitalRebuild  # A class defining a shock on capital

# Load the IOSystem from pymrio
mrio = pymrio.load_test().calc_all()

# Instantiate the model and the simulation
model = ARIOPsiModel(mrio)
sim = Simulation(model,n_temporal_units_to_sim=730)

# Instantiate an event.
ev = EventKapitalRebuild.from_scalar_regions_sectors(
  impact=500000,
  regions=["reg1"],
  sectors=["manufactoring", "mining"],
  impact_sectoral_distrib = [0.5, 0.5],
  rebuilding_sectors={"construction": 0.55,"manufactoring": 0.45},
  rebuilding_factor=1.0,
  rebuild_tau=90,
)

# Add the event to the simulation
sim.add_event(ev)

# Launch the simulation
sim.loop(progress=False)

# You should be able to generate a dataframe of
# the production with the following line
df = sim.production_realised
# This allows to normalize production at its initial level
df = df / df.loc[0]

df.loc[:, ("reg1", slice(None))].plot()

I think you addressed everything else. The "See Also" links in the function documentation that lead to tutorials is a nice touch.

I ran your tests. When I use test_events.py, I get the following error:

test_events.py::test_EventKapitalRebuild_incorrect_impact[empty_np]
 boario/event.py:497: DeprecationWarning: The truth value of an empty array is ambiguous. Returning False, but in future this will result in an error. Use `array.size > 0` to check that an array is not empty.
    if impact <= 0:

I think you should raise an error in this case? That would cause the test to fail. Do you know why the impact has size zero in this test case?

mwt commented 4 months ago

@crvernon said:

Just checking in to see how the review is going. Could you provide a brief update to your status here in this thread?

I'm debugging. My current thought is that the functions were designed to work with a particular set of inputs and is very well-tested with those inputs. I'm trying to understand if there are other inputs and edge cases where it fails.

spjuhel commented 4 months ago

@mwt Thanks for noticing these issues,

For the example, I realize that the shock is just too great on the mining sector with this setup, and that make the model "crash", but thus it should at least raise a proper warning. And a functioning example would be more appropriate.

I think you should raise an error in this case? That would cause the test to fail. Do you know why the impact has size zero in this test case?

I realize the test/code is not very well-designed, I don't have a clear method for instantiating from a numpy array, and in the test this case (empty_np, where input is an empty numpy array) goes to the from_scalar_regions_sectors() which assumes the impact is scalar, hence the warning. The test does raise an error afterwards, as it should, but that could be made more proper and clearer indeed.

I'm trying to understand if there are other inputs and edge cases where it fails.

That would be so helpful, I did look out for these, but testing is not my forte and there's a lot of room for improvements here.


I actually plan (long term) to highly simplify the event module, such that:

But these require to redesign technical parts of all modules, so I'm not sure if they should be considered in this review? (Any remarks/suggestions on this are most welcome!!)

potterzot commented 4 months ago

👋 @spjuhel, @mwt, and @potterzot - Just checking in to see how the review is going. Could you provide a brief update to your status here in this thread?

Hello, I haven't gotten far beyond cloning and installing the package but should have time this Friday (May 3).

crvernon commented 3 months ago

@potterzot just following up to see when you will have time to start your review? Thanks!

potterzot commented 3 months ago

@crvernon my apologies, I will have it posted by Tuesday evening.

crvernon commented 3 months ago

Thanks @potterzot!

potterzot commented 3 months ago

I was able to install and reproduce the example models as well as successfully run tests. No issues there. I do have several suggestions on the documentation and for the paper. I've also made a PR with some small grammar fixes. I've checked off all of the items in the review checklist, but I think the package would benefit from addressing some of the comments below before publication. Most of these are very minor.

This is beyond the scope of the current paper, but given the goal of the project stated in both the README and the paper, it would be useful to have more in depth case studies that demonstrate both how to conduct various real-world analysis of impacts and how to modify the base assumptions and data used in the model. I think this would substantially help with adoption of this software for use in impact modeling by other researchers and impact modelers. A few well-designed examples would both highlight how users should think about measuring impacts of different kinds of shocks as well as how to modify the model parameters to reflect changes. In addition, an example of how to take an IO table and create a pymrio object from it would allow a user who doesn't have access to a prebuild IO model but does have access to their country's make and use tables to build their own underlying model. I know I've wanted something like this to exist for quite some time. In the US there are some efforts to provide an open source IO model to "democratize" the use of impact modeling in policy advocacy (for example, see Tapestry).

There are some integrations and events that are referenced in the documentation and paper but that aren't currently working. For example, one of the three possible impact events (EventArbitraryProd) is currently not available because of a critical bug. The paper implies a fully working model that has extensive documentation and case studies, but there aren't case studies or multiple types of shocks described or linked in the documentation or the paper. The paper also states that integration to various other modeling platforms and workflows (e.g. "shock on demand, shock on production, shock on both, shock involving reconstruction or not, etc" suggests shock possibilities that are not described in the documentation. Mostly this could be addressed by toning down the promise of boario a bit. I make a few specific suggestions in the comments below.

I hope the following comments are helpful. Some of the comments are very minor. Probably they do not all need to be addressed, but I think they all would improve the overall sense of "completeness" of the software and the documentation.

Documentation comments

Paper comments

spjuhel commented 3 months ago

Thanks a lot for all these precious comments,

it would be useful to have more in depth case studies that demonstrate both how to conduct various real-world analysis of impacts and how to modify the base assumptions and data used in the model. I think this would substantially help with adoption of this software for use in impact modeling by other researchers and impact modelers. A few well-designed examples would both highlight how users should think about measuring impacts of different kinds of shocks as well as how to modify the model parameters to reflect changes.

I totally agree. I intend to extend the documentation with more detailed examples based on my papers, but these are still in the reviewing process, or even unfinished. Another thing I would like to do is to present examples reproducing previous studies / well-known disasters in the indirect impact literature (such as Hurricane Katrina or the 2001 floods in Thailand). This however requires a substantial amount of work, which I'm not at liberty to do at the moment.

In addition, an example of how to take an IO table and create a pymrio object from it would allow a user who doesn't have access to a prebuild IO model but does have access to their country's make and use tables to build their own underlying model.

I agree this would be a nice addition, but I feel like this is something that should be part of the pymrio documentation (and already does to some extent).

There are some integrations and events that are referenced in the documentation and paper but that aren't currently working. For example, one of the three possible impact events (EventArbitraryProd) is currently not available because of a critical bug. The paper implies a fully working model that has extensive documentation and case studies, but there aren't case studies or multiple types of shocks described or linked in the documentation or the paper. The paper also states that integration to various other modeling platforms and workflows (e.g. "shock on demand, shock on production, shock on both, shock involving reconstruction or not, etc" suggests shock possibilities that are not described in the documentation. Mostly this could be addressed by toning down the promise of boario a bit. I make a few specific suggestions in the comments below.

Yes, I might have been too ambitious when writing some parts of the paper. I discovered only during this review that EventArbitraryProd was not working properly. I am working on a new major version which will include a fix for this, but I also took this opportunity to completely rework the way events are handled (as mentioned before). I have rephrased the corresponding part to make it more consistent with the current implementation.

I just pushed these changed to the develop branch, but I still want to look into what @mwt raised and recheck that everything works properly in the examples before merging to main and publishing another release.

mwt commented 3 months ago

I agree this would be a nice addition, but I feel like this is something that should be part of the pymrio documentation (and already does to some extent).

I think it could still make sense to provide the steps in your own tutorial. Though, linking it directly might be better value for effort.

spjuhel commented 3 months ago

Here are my latest changes on develop:

Does that correctly answer all your comments/remarks @potterzot @mwt ?

potterzot commented 2 months ago

@spjuhel @crvernon I am happy to recommend publication at this point. My review should be checked off.

Just FYI I'll be offline from tomorrow morning until June 10th.

mwt commented 2 months ago

I approve as well!

crvernon commented 2 months ago

@editorialbot generate pdf

editorialbot commented 2 months ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

crvernon commented 2 months ago

@editorialbot check references

editorialbot commented 2 months ago

I'm sorry human, I don't understand that. You can see what commands I support by typing:

@editorialbot commands

crvernon commented 2 months ago

@editorialbot check references

editorialbot commented 2 months ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1111/jiec.12715 is OK
- 10.1021/es300171x is OK
- 10.2139/ssrn.4101276 is OK
- 10.1093/reep/rez004 is OK
- 10.5334/jors.251 is OK
- 10.1029/2020ef001616 is OK
- 10.5281/zenodo.8383171 is OK
- 10.1038/s41562-020-0896-8 is OK
- 10.1007/s10584-010-9979-2 is OK
- 10.1016/j.jedc.2011.10.001 is OK
- 10.1007/s10584-010-9978-3 is OK
- 10.1111/j.1539-6924.2008.01046.x is OK
- 10.1111/risa.12090 is OK
- 10.1007/s12665-011-1078-9 is OK
- 10.1007/s11069-013-0788-6 is OK
- 10.1111/risa.12300 is OK
- 10.1029/2018ef000839 is OK
- 10.1038/s41893-020-00646-7 is OK
- 10.1080/19475705.2018.1489312 is OK
- 10.31223/x5qd6b is OK
- 10.1038/s41558-018-0173-2 is OK
- 10.1088/1748-9326/ab3306 is OK
- 10.1093/bioinformatics/bts480 is OK
- 10.2139/ssrn.3285818 is OK
- 10.1080/09535314.2016.1232701 is OK
- 10.1038/s41893-020-00649-4 is OK
- 10.1016/j.jedc.2017.08.001 is OK
- 10.1038/s41893-020-0523-8 is OK
- 10.1007/s10584-018-2293-0 is OK

MISSING DOIs

- No DOI given, and none found for title: OECD Inter-Country Input-Output Database
- No DOI given, and none found for title: NumPy

INVALID DOIs

- None
crvernon commented 2 months ago

👋 @spjuhel - please make the following corrections in your paper. After you do so, I will take another review pass.

spjuhel commented 2 months ago

👋 @crvernon,

I noticed I forgot to merge my develop branch into main, hence I think you reviewed a paper version slightly not up to date. Apologies for that. (Changes are minimal, though).

I should have addressed all your comments, and I did the merge on main, so the paper is now up-to-date.

crvernon commented 2 months ago

@editorialbot set main as branch

editorialbot commented 2 months ago

Done! branch is now main

crvernon commented 2 months ago

@editorialbot generate pdf

editorialbot commented 2 months ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

crvernon commented 2 months ago

@editorialbot check references

editorialbot commented 2 months ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1111/jiec.12715 is OK
- 10.1021/es300171x is OK
- 10.2139/ssrn.4101276 is OK
- 10.1093/reep/rez004 is OK
- 10.5334/jors.251 is OK
- 10.1029/2020ef001616 is OK
- 10.5281/zenodo.8383171 is OK
- 10.1038/s41562-020-0896-8 is OK
- 10.1007/s10584-010-9979-2 is OK
- 10.1016/j.jedc.2011.10.001 is OK
- 10.1007/s10584-010-9978-3 is OK
- 10.1111/j.1539-6924.2008.01046.x is OK
- 10.1111/risa.12090 is OK
- 10.1007/s12665-011-1078-9 is OK
- 10.1007/s11069-013-0788-6 is OK
- 10.1111/risa.12300 is OK
- 10.1029/2018ef000839 is OK
- 10.1038/s41893-020-00646-7 is OK
- 10.1080/19475705.2018.1489312 is OK
- 10.31223/x5qd6b is OK
- 10.1038/s41558-018-0173-2 is OK
- 10.1088/1748-9326/ab3306 is OK
- 10.1093/bioinformatics/bts480 is OK
- 10.2139/ssrn.3285818 is OK
- 10.1080/09535314.2016.1232701 is OK
- 10.1038/s41893-020-00649-4 is OK
- 10.1016/j.jedc.2017.08.001 is OK
- 10.1038/s41893-020-0523-8 is OK
- 10.1007/s10584-018-2293-0 is OK

MISSING DOIs

- No DOI given, and none found for title: OECD Inter-Country Input-Output Database
- No DOI given, and none found for title: NumPy

INVALID DOIs

- None
crvernon commented 2 months ago

Thanks @spjuhel. Please correct the following errors in the paper:

Please make sure all of the references are formatted correctly and no "nil" values exists. Thank you!

spjuhel commented 2 months ago

@editorialbot generate pdf

editorialbot commented 2 months ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

spjuhel commented 2 months ago

Forgot to say, This last version should include the corrections related to your comments.

(And thanks a lot for your detailed reviewing effort)

crvernon commented 2 months ago

👋 @spjuhel - we are almost there! Next is just setting up the archive for your new release.

We want to make sure the archival has the correct metadata that JOSS requires. This includes a title that matches the paper title and a correct author list.

So here is what we have left to do:

I can then move forward with accepting the submission.

spjuhel commented 2 months ago

I think this should be good: https://doi.org/10.5281/zenodo.11580697

crvernon commented 2 months ago

@editorialbot set v0.5.10 as version

editorialbot commented 2 months ago

Done! version is now v0.5.10