[REVIEW]: Epimargin: A Toolkit for Epidemiological Estimation, Prediction, and Policy Evaluation

whedon commented 3 years ago

Submitting author: @satejsoman (Satej Soman) Repository: https://github.com/COVID-IWG/epimargin Version: 0.6.2 Editor: @Nikoleta-v3 Reviewer: @wxwx1993, @dilawar Archive: 10.5281/zenodo.5430698

:warning: JOSS reduced service mode :warning:

Due to the challenges of the COVID-19 pandemic, JOSS is currently operating in a "reduced service mode". You can read more about what that means in our blog post.

Status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/3353609dae03b32322ea1160826f2ad8"><img src="https://joss.theoj.org/papers/3353609dae03b32322ea1160826f2ad8/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/3353609dae03b32322ea1160826f2ad8/status.svg)](https://joss.theoj.org/papers/3353609dae03b32322ea1160826f2ad8)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@wxwx1993 & @dilawar, please carry out your review in this issue by updating the checklist below. If you cannot edit the checklist please:

Make sure you're logged in to your GitHub account
Be sure to accept the invite at this URL: https://github.com/openjournals/joss-reviews/invitations

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @Nikoleta-v3 know.

✨ Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest ✨

Review checklist for @wxwx1993

Conflict of interest

[x] I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.

Code of Conduct

[x] I confirm that I read and will adhere to the JOSS code of conduct.

General checks

[x] Repository: Is the source code for this software available at the repository url?
[x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
[x] Contribution and authorship: Has the submitting author (@satejsoman) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?
[x] Substantial scholarly effort: Does this submission meet the scope eligibility described in the JOSS guidelines

Functionality

[x] Installation: Does installation proceed as outlined in the documentation?
[x] #3502
[x] Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

[x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
[x] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
[x] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
[x] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
[x] Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
[x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

[x] Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
[x] A statement of need: Does the paper have a section titled 'Statement of Need' that clearly states what problems the software is designed to solve and who the target audience is?
[x] State of the field: Do the authors describe how this software compares to other commonly-used packages?
[x] Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
[x] References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

Review checklist for @dilawar

Conflict of interest

[x] I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.

Code of Conduct

[x] I confirm that I read and will adhere to the JOSS code of conduct.

General checks

[x] Repository: Is the source code for this software available at the repository url?
[x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
[x] Contribution and authorship: Has the submitting author (@satejsoman) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?
[x] Substantial scholarly effort: Does this submission meet the scope eligibility described in the JOSS guidelines

Functionality

[x] Installation: Does installation proceed as outlined in the documentation?
[x] Functionality: Have the functional claims of the software been confirmed?
[x] Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

[x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
[x] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
[x] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
[x] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
[x] Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
[x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

[x] Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
[x] A statement of need: Does the paper have a section titled 'Statement of Need' that clearly states what problems the software is designed to solve and who the target audience is?
[x] State of the field: Do the authors describe how this software compares to other commonly-used packages?
[x] Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
[x] References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

whedon commented 3 years ago

Hello human, I'm @whedon, a robot that can help you with some common editorial tasks. @wxwx1993, @dilawar it looks like you're currently assigned to review this paper :tada:.

:warning: JOSS reduced service mode :warning:

Due to the challenges of the COVID-19 pandemic, JOSS is currently operating in a "reduced service mode". You can read more about what that means in our blog post.

:star: Important :star:

If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/joss-reviews) repository. As a reviewer, you're probably currently watching this repository which means for GitHub's default behaviour you will receive notifications (emails) for all reviews 😿

To fix this do the following two things:

Set yourself as 'Not watching' https://github.com/openjournals/joss-reviews:

watching

You may also like to change your default settings for this watching repositories in your GitHub profile here: https://github.com/settings/notifications

notifications

For a list of things I can do to help you, just type:

@whedon commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@whedon generate pdf

whedon commented 3 years ago

Software report (experimental):

github.com/AlDanial/cloc v 1.88  T=0.28 s (476.3 files/s, 63851.5 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Python                         119           2485           1505          11985
SVG                              6              0              5           1304
Markdown                         6             87              0            274
TeX                              1              8              0             78
R                                1             10             11             77
-------------------------------------------------------------------------------
SUM:                           133           2590           1521          13718
-------------------------------------------------------------------------------

Statistical information for the repository '448da3205c0c5deb9425f255' was
gathered on 2021/07/09.
The following historical commit information, by author, was found:

Author                     Commits    Insertions      Deletions    % of changes
Caitlin Loftus                  12           517            125            1.19
Nicholas Marchio                 7           533            321            1.59
Steven Buschbach                 2           761             32            1.47
manasip5                         1            23              0            0.04
satej soman                    187         33042          18522           95.71

Below are the number of rows from each author that have survived and are still
intact in the current revision:

Author                     Rows      Stability          Age       % in comments
Caitlin Loftus              292           56.5         11.9                4.79
Steven Buschbach            724           95.1         10.7               13.40
satej soman               14959           45.3          5.1                9.00

whedon commented 3 years ago

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1101/2020.09.08.20190629 is OK
- 10.2139/ssrn.3748704 is OK
- 10.3386/w27532 is OK

MISSING DOIs

- None

INVALID DOIs

- None

whedon commented 3 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

Nikoleta-v3 commented 3 years ago

👋🏼 @satejsoman @wxwx1993, @dilawar this is the review thread for the paper. All of our communications will happen here from now on.

Both reviewers have checklists at the top of this thread with the JOSS requirements. As you go over the submission, please check any items that you feel have been satisfied. There are also links to the JOSS reviewer guidelines.

The JOSS review is different from most other journals. Our goal is to work with the authors to help them meet our criteria instead of merely passing judgment on the submission. As such, the reviewers are encouraged to submit issues and pull requests on the software repository. When doing so, please mention openjournals/joss-reviews#REVIEW_NUMBER so that a link is created to this thread (and I can keep an eye on what is happening). Please also feel free to comment and ask questions on this thread. In my experience, it is better to post comments/questions/suggestions as you come across them instead of waiting until you've reviewed the entire package. Here is an example of a review: https://github.com/openjournals/joss-reviews/issues/3202

We aim for reviews to be completed within about 2-4 weeks. Please let me know if any of you require some more time. We can also use Whedon (our bot) to set automatic reminders if you want to.

Please feel free to ping me (@Nikoleta-v3) if you have any questions/concerns 😄 👍🏻 Thank you again for agreeing to review this submission!

dilawar commented 3 years ago

Below is a partial review. It includes major issues I found during the installation and testing the software. I am (doker images of Linux: cenos8, debian:buster, and ArchLinux). The setup.py says that software is OS Independent and supports >=python3.6. I am mostly using python3.9 and python3.8.

In my fork, I've also added a Github actions based CI and a few test cases.

Functionality

[ ] Installation: Does installation proceed as outlined in the documentation?

Yes. pip3 install apimargin succeeded on a Linux machine: it downloaded a platform-independent wheel. The installation from the source failed on my machine because of pinned version in requirements.txt. I think that requirements.txt file can be removed since setup.py files also enumerate the installation dependencies.

After installation (from pypi), however, I ran into permission issues at the very first run.

from itertools import cycle

import epimargin.plots as plt
import numpy as np
import pandas as pd
from epimargin.utils import setup

(data, figs) = setup() # creates convenient directories
plt.set_theme("minimal")

resulted in,

<ipython-input-1-710a13386d8f> in <module>
      6 from epimargin.utils import setup
      7 
----> 8 (data, figs) = setup() # creates convenient directories
      9 plt.set_theme("minimal")

~/Work/FORKES/epimargin/epimargin/utils.py in setup(**kwargs)
     42     logging.basicConfig(**kwargs)
     43     logging.getLogger('flat_table').addFilter(lambda _: 0)
---> 44     return (mkdir(root / "data"), mkdir(root / "figs"))
     45 
     46 def fillna(array):

~/Work/FORKES/epimargin/epimargin/utils.py in mkdir(p, exist_ok)
     30 
     31 def mkdir(p: Path, exist_ok: bool = True) -> Path:
---> 32     p.mkdir(exist_ok = exist_ok)
     33     return p
     34 

/usr/lib/python3.9/pathlib.py in mkdir(self, mode, parents, exist_ok)
   1311         """
   1312         try:
-> 1313             self._accessor.mkdir(self, mode)
   1314         except FileNotFoundError:
   1315             if not parents or self.parent == self:

PermissionError: [Errno 13] Permission denied: '/usr/bin/data'

I solved it by modifying the cwd function (diff: https://github.com/dilawar/epimargin/commit/98535fb8921ff1b58ddff8324f83f469bc6e6671#diff-eb56eed78c526134068ae48c66f6eaaf70109da1d0fc031cc83cdd3e78285fff). May authors can review and use this commit. Open issue on repo https://github.com/COVID-IWG/epimargin/issues/116

Documentation

[x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?

Yes.

[ ] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
1. setup.py has list of dependencies which pip handles well.
2. A requirements.txt file is also given with pinned version of dependencies. User is expected use this file: pip3 install -r requiements.txt before pip3 install -e . but it did not succeed on Ubuntu/Debian and CentOS:8, For example, I did not find zip=4.1.0 (https://github.com/COVID-IWG/epimargin/issues/115) anywhere in pypi. I had to unpin few dependencies so pip can resolve an install-able version. Probably there is no need of requirements.txt file if dependencies are handled in setup.py.
[ ] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).

Yes. A simple example is given in the README file. I was able to reproduce the figure.

And a lot of examples are available in studies folder. But instructions to run them are lacking. Ideally, each folder should have a pipeline based onMakefile, bash scripts etc.. Or authors should mention the steps to reproduce the results in README file in the folder itself. A few folders do not pass the basic lint tests (issue #117).

Some failed cases on CentOS-8

1

[centos@ip-172-26-6-166 karnataka (joss_lint_errors)]$ python3 age_rt.py 
WARNING (theano.tensor.blas): Using NumPy C-API based implementation for BLAS functions.
Traceback (most recent call last):
  File "/home/centos/Work/FORKES/epimargin/studies/karnataka/age_rt.py", line 3, in <module>
    from epimargin.estimators import analytical_MPVS
  File "/home/centos/Work/FORKES/epimargin/epimargin/estimators.py", line 7, in <module>
    import pymc3 as pm
  File "/home/centos/.local/lib/python3.9/site-packages/pymc3-3.11.2-py3.9.egg/pymc3/__init__.py", line 34, in <module>
    if not semver.match(theano.__version__, ">=1.1.2"):
AttributeError: module 'semver' has no attribute 'match'

Authors should rerun their examples with python-3.9 or limit the python version in setup.py file.

[ ] Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?

No. There are no automated tests. If there are some tests deep inside the codebase, I can't find a them via pytest etc. Authors should indicate somewhere in README.md if there is a test suite and how to run it. I've created one during my review https://github.com/dilawar/epimargin in tests directory.

I tried to run the scripts available in studies folder but ran into few problems. Running a linter (flake8) exposed gaping issues that needs to be fixed (https://github.com/COVID-IWG/epimargin/issues/117). Authors also need to document the instructions to reproduce the images in studies folder accurately in the README files; or better still provide a pipeline e.g. Makefile etc to automate the run.

[ ] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Community guidelines are missing: 1 is definitely missing. 2, and 3 is not stated explicitly in the repository but available at https://adaptivecontrol.org/. Probably authors want to add them to the repository as well.

Software paper

[x] Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?

Yes. My reading of the paper that the major intention behind the project is to quantify/access the impact of the pandemic and the trade-offs of various policies that govt might implement to tackle the pandemic such as lock-down etc.

[x] A statement of need: Does the paper have a section titled 'Statement of Need' that clearly states what problems the software is designed to solve and who the target audience is?

Yes.

[ ] State of the field: Do the authors describe how this software compares to other commonly-used packages?

There is no mention of any other software packages. I don't know if there are commonly-used packages. I am aware of a few modeling studies in India itself e.g. https://cni-iisc.github.io/epidemic-simulator/docs/index.html but I am not sure if these are comparable. It would be nice to have a small table of comparisons with existing tools (if any).

satejsoman commented 3 years ago

@dilawar thank you for your preliminary review! I'm off for a couple of days but will be able to address these comments later next week.

whedon commented 3 years ago

:wave: @dilawar, please update us on how your review is going (this is an automated reminder).

whedon commented 3 years ago

:wave: @wxwx1993, please update us on how your review is going (this is an automated reminder).

satejsoman commented 3 years ago

Hi @dilawar, I've responded to/managed some of the issues you've brought up on their corresponding Github pages, and thank you for your thorough investigation. To elaborate on/respond to some of these points:

Functionality

Installation: Does installation proceed as outlined in the documentation? Yes. pip3 install apimargin succeeded on a Linux machine: it downloaded a platform-independent wheel. The installation from the source failed on my machine because of pinned version in requirements.txt. I think that requirements.txt file can be removed since setup.py files also enumerate the installation dependencies.

Generally, the setup.py and requirements.txt files have different uses. Python packaging practice indicates that setup.py have a minimal working set of requirement for end users to run out of the box, while requirements.txt contains the exact dependencies required to engage in development work on the package. (See this stackoverflow post for an explanation of the guidelines we try to follow). For this reason, I would keep both setup.py and requirements.txt.

After installation (from pypi), however, I ran into permission issues at the very first run.

We've switched to a more robust way of checking for the script directory location that should not try to write to /usr/bin based on our testing. See https://github.com/COVID-IWG/epimargin/issues/116 (closed by https://github.com/COVID-IWG/epimargin/pull/118).

Documentation

Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution. A requirements.txt file is also given with pinned version of dependencies. User is expected use this file: pip3 install -r requiements.txt before pip3 install -e . but it did not succeed on Ubuntu/Debian and CentOS:8, For example, I did not find zip=4.1.0 (Invalid dependency version zipp==4.1.0 COVID-IWG/epimargin#115) anywhere in pypi.

We have fixed the typo in requirements.txt (see https://github.com/COVID-IWG/epimargin/pull/119)

Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems). Yes. A simple example is given in the README file. I was able to reproduce the figure. And a lot of examples are available in studies folder. But instructions to run them are lacking. Ideally, each folder should have a pipeline based onMakefile, bash scripts etc.. Or authors should mention the steps to reproduce the results in README file in the folder itself. A few folders do not pass the basic lint tests (issue #117).

(quoting from the discussion in https://github.com/COVID-IWG/epimargin/issues/117): The scripts in the studies/ folder are not considered part of the core package; they are explorations and project-specific client code. I don't expect all the data scientists working with this package (some of whom have rolled off the project) to maintain these. I'd be happy to move these scripts to a separate epimargin-studies repository. At the very least, I'd request they be ignored in the code review.

I am also unable to reproduce the failure of the karnataka script on Python 3.9 but given the exploratory nature of these scripts, I don't think this should be a blocker for acceptance. I will keep digging into this, but the core workflow does work on Python 3.9.

Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified? No. There are no automated tests. If there are some tests deep inside the codebase, I can't find a them via pytest etc. Authors should indicate somewhere in README.md if there is a test suite and how to run it. I've created one during my review https://github.com/dilawar/epimargin in tests directory.

Thank you for providing us with example tests - we are in the process of merging those into the main repository. I would argue the documentation provided in the tutorial counts as "manual steps described so that the functionality of the software can be verified".

Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support Community guidelines are missing: 1 is definitely missing. 2, and 3 is not stated explicitly in the repository but available at https://adaptivecontrol.org/. Probably authors want to add them to the repository as well.

We have added all three points to the README (see https://github.com/COVID-IWG/epimargin/pull/120)

State of the field: Do the authors describe how this software compares to other commonly-used packages? There is no mention of any other software packages. I don't know if there are commonly-used packages. I am aware of a few modeling studies in India itself e.g. https://cni-iisc.github.io/epidemic-simulator/docs/index.html but I am not sure if these are comparable. It would be nice to have a small table of comparisons with existing tools (if any).

We have added discussions of packages that have functionality comparable to the submodules of epimargin (see https://github.com/COVID-IWG/epimargin/pull/121). We have discussed the advantages and design decisions of our package in the main paper but now explicitly add alternative packages that do similar things but take a more opinionated stance on the nature of the estimators provided.

Let me know what the next steps are! Again, we are very thankful for your efforts and diligence.

dilawar commented 3 years ago

I am also unable to reproduce the failure of the karnataka script on Python 3.9 but given the exploratory nature of these scripts, I don't think this should be a blocker for acceptance. I will keep digging into this, but the core workflow does work on Python 3.9.

[centos@ip-172-26-6-166 karnataka (joss_lint_errors)]$ python3 age_rt.py 
WARNING (theano.tensor.blas): Using NumPy C-API based implementation for BLAS functions.
Traceback (most recent call last):
  File "/home/centos/Work/FORKES/epimargin/studies/karnataka/age_rt.py", line 3, in <module>
    from epimargin.estimators import analytical_MPVS
  File "/home/centos/Work/FORKES/epimargin/epimargin/estimators.py", line 7, in <module>
    import pymc3 as pm
  File "/home/centos/.local/lib/python3.9/site-packages/pymc3-3.11.2-py3.9.egg/pymc3/__init__.py", line 34, in <module>
    if not semver.match(theano.__version__, ">=1.1.2"):
AttributeError: module 'semver' has no attribute 'match'

The package semver has deprecated/removed the match function (https://github.com/python-semver/python-semver/issues/236).

Opened: https://github.com/COVID-IWG/epimargin/issues/122

satejsoman commented 3 years ago

Hi @dilawar, https://github.com/COVID-IWG/epimargin/pull/129 has taken care of all the linting issues I could find with pylint and mypy. There is one temporary change to the class hierarchy that does not affect any user-facing code and it is being tracked here: https://github.com/COVID-IWG/epimargin/issues/130.

With this merged, do you think remaining checklist items are ready to be marked as resolved? (i.e.: substantial scholarly effort, functionality, example usage, functionality documentation, "manual steps described so that the functionality of the software can be verified")

@wxwx1993 - is there anything I can do to help with your review, like cutting a new release with the latest issues I've fixed and pushing it to PyPI?

dilawar commented 3 years ago

@satejsoman I'll complete my review by the weekend. You may not hear from me till this Saturday.

satejsoman commented 3 years ago

that sounds great, thanks again!

dilawar commented 3 years ago

These are the remaining points on my checklist.

[x] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).

tutorial.py seems to work fine. There is one example embedded in the README.md file. I think these are enough for this item.

Additionally, I would appreciate if docs/toy_example.py can also be fixed or removed. And all examples are added to a test pineline. If you wish to use CI pipeline I sent over a PR, enable the test pipeline in CI. I will help with the integration.

[x] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?

I could not find API docs that are browsable. pydoc3 epimargin did work so I various IDE will be able to use it. It meets the minimal requirement.

Nonetheless, I added readthedocs based flow https://github.com/COVID-IWG/epimargin/pull/132 that may be useful for the users (this is optional).

[ ] Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?

I'd insist on adding pytest or any other unit testing framework to the development flow. Also, consider adding a CI pipeline (such as Github actions, Travis, gitlab-ci etc, Gihub actions can be cherry-picked from https://github.com/COVID-IWG/epimargin/pull/131).

Nikoleta-v3 commented 3 years ago

Thank you @dilawar for such a thorough review! I have gone over the issues and the discussions both here and on the software repository. I just wanted to say that regarding:

Additionally, I would appreciate it if docs/toy_example.py can also be fixed or removed.

Agreed that: a user should be able to run all examples that are included in the repository without failures. This included the scripts under the folder studies.

The scripts in the studies/ folder are not considered part of the core package; they are explorations and project-specific client code. I don't expect all the data scientists working with this package (some of whom have rolled off the project) to maintain these. I'd be happy to move these scripts to a separate epimargin-studies repository. At the very least, I'd request they be ignored in the code review.

👍🏻 Please do move them to a separate repository or keep them as they are and make sure they are maintained. Also, if they are moved to a new repository and are not maintained please do state that in the readme.

This is mainly to help future new users to the project that will be looking to understand how the package can help their work. Thus, is mainly to ensure that the package is documented to a satisfactory level which leads me to this:

Nonetheless, I added readthedocs based flow COVID-IWG/epimargin#132 that may be useful for the users (this is optional).

Having a readthedocs is not a requirement for publication, and honestly https://epimargin.readthedocs.io/en/latest/epimargin.html in its current form does not add much. The current codebase is modular, the functions are documented and include type hints. These practices make the code documented well enough.

Nevertheless, as @dilawar demonstrated it is very easy to create a readthedocs page for the package. In the future you should look into changing the functions’ documentation style to numpy style (for example) so that the readthedocs “write” themselves https://numpy.org/doc/stable/docs/howto_document.html . If this is indeed something that interests the project in the future I would also be very happy to help with a PR.

ps if you are interested in “how to write documentation” I often recommend the Diátaxis framework https://diataxis.fr which is what I also use in my projects.

Lastly,

I'd insist on adding pytest or any other unit testing framework to the development flow. Also, consider adding a CI pipeline (such as Github actions, Travis, gitlab-ci etc, Gihub actions can be cherry-picked from COVID-IWG/epimargin#131).

Authors are strongly encouraged to include an automated test suite covering the core functionality of their software. However, documented manual steps that can be followed to objectively check the expected functionality of the software (e.g., a sample input file to assert behavior).

which I believe comes back to making sure that the scripts that are included in the repository run without failures and the output is expected. If I tried to run one of your “example scripts” as a new user and it failed I would assume that something is broken in the codebase.

And all examples are added to a test pineline. If you wish to use CI pipeline I sent over a PR, enable the test pipeline in CI. I will help with the integration

This is a great idea, but it’s again not required of you. However, @dilawar has already offered their help, and I would also be happy to assist if this is something you would be interested in learning/including in the project.

A summary:

The package is documented well enough as it is. Including scripts such as toy_example.py and others from studies can only further help the documentation of your project and they should be included.

To ensure that the package is “tested” the scripts you decide to include in repository should be able to run without failure and produce the expected output.

Optional (possible future) points regarding

Documentation:

Tweak the functions documentation style and generate a readthedocs

Testing:

plug in the example scripts to a CI
add pytest or any other unit testing framework to the development flow.

satejsoman commented 3 years ago

Hi @dilawar and @Nikoleta-v3 - thanks again.

Here are the changes that have been implemented:

the studies folder has been moved to its own repository (see: https://github.com/COVID-IWG/epimargin/pull/133)
docs/toy_example.py has been updated to refer to the latest classes/package names and runs out of the box (see: https://github.com/COVID-IWG/epimargin/pull/134)
Github Actions have been set up to run the tutorial (we cherrypicked your template, @dilawar) (see: https://github.com/COVID-IWG/epimargin/pull/135)

Here are the future changes we are tracking:

set up a more comprehensive pytest suite (see: https://github.com/COVID-IWG/epimargin/issues/136)
add the pytest suite to the CI pipeline (see: same issue as above)

We expect to onboard a developer in the next few weeks and will be able to proceed with these fairly shortly from there!

Nikoleta-v3 commented 3 years ago

👋🏻 thank you for your work @satejsoman

Github Actions have been set up to run the tutorial (we cherrypicked your template, @dilawar) (see: cherrypick GH actions from review branch COVID-IWG/epimargin#135)

Fantastic! 🎉 Note that Github Actions haven't run yet (https://github.com/COVID-IWG/epimargin/actions).

I believe it's because there hasn't been a "push" to the project since you setup the CI, and CI is only going to run after a "push". This is because of line.

If Github Actions run now you are going to get a failure because:

- name: Run tutorial
      run: |
        cd studies/tutorial/ && python tutorial.py

studies/tutorial/ does not exist anymore.

You need to move tutorial.py back in the main repository.

I recommend the following:

Run Github Actions and let them fail ❌
Move studies/tutorial/tutorial.py to doc/tutorial.py
Tweak README to say doc/tutorial instead of studies/tutorial/.
Change python.yaml to run doc/tutorial and doc/toy_example. Tests should pass ✅
Add readme in doc to explain (very briefly) the usage of the files Create install.md, tutorial.py, toy_example.
Add a brief section on your main readme exampling what it's currently being tested with the CI.

Feel free to ping if something is unclear (probably my fault for explaining things badly) 👍🏻 😄

satejsoman commented 3 years ago

No worries @Nikoleta-v3, my fault for moving too quickly 😅

I've set up the Github Actions (and the successful runs are showing up) and have added the documentation to the README.

I elected to not add the toy_example to the CI since the more important workflows are captured in the tutorial.

wxwx1993 commented 3 years ago

Hi @dilawar, COVID-IWG/epimargin#129 has taken care of all the linting issues I could find with pylint and mypy. There is one temporary change to the class hierarchy that does not affect any user-facing code and it is being tracked here: COVID-IWG/epimargin#130.

With this merged, do you think remaining checklist items are ready to be marked as resolved? (i.e.: substantial scholarly effort, functionality, example usage, functionality documentation, "manual steps described so that the functionality of the software can be verified")

@wxwx1993 - is there anything I can do to help with your review, like cutting a new release with the latest issues I've fixed and pushing it to PyPI?

Yes, that would be great! I am sorry for the late reply.

wxwx1993 commented 3 years ago

Software paper

[x] Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
[x] A statement of need: Does the paper have a section titled 'Statement of Need' that clearly states what problems the software is designed to solve and who the target audience is?
[ ] State of the field: Do the authors describe how this software compares to other commonly-used packages?

I do not find the comparison to other commonly-used packages. I think CDC has the ensemble forecast model, which might be nice to compare with (https://www.cdc.gov/coronavirus/2019-ncov/science/forecasting/mathematical-modeling.html). Also, I think it would be nice to also list the data input required for the software functions.

[x] Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
[x] References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

satejsoman commented 3 years ago

Hi @wxwx1993 - no worries! epimargin 0.6.0 is on PyPI, reflecting all the review comments addressed so far.

The CDC ensemble forecast is a different modeling approach that re-weights underlying models to provide a predictive forecast. The models we provide here (and those available in EpiEstim and EpiNow, the comparison packages mentioned in our paper) can be embedded in such an ensemble. Moreover, this package provides estimation procedures for epidemiological quantities such as Rt and does not take into account hospital capacity out of the box (though that constraint can be added in client code).

satejsoman commented 3 years ago

Hi @wxwx1993, let me know if there's anything else I can do to aid in your review

wxwx1993 commented 3 years ago

@satejsoman After I successfully installed the most updated version of epimargin

Collecting epimargin
  Downloading epimargin-0.6.0-py3-none-any.whl (38 kB)
Collecting scikit-learn
  Downloading scikit_learn-0.24.2-cp39-cp39-macosx_10_13_x86_64.whl (7.3 MB)

When I run the setup, I got the following error

---------------------------------------------------------------------------
SystemExit                                Traceback (most recent call last)
<ipython-input-24-71df71c346d5> in <module>
      7 from epimargin.utils import setup
      8 
----> 9 (data, figs) = setup()
     10 plt.set_theme("minimal")

~/opt/anaconda3/lib/python3.7/site-packages/epimargin/utils.py in setup(**kwargs)
     38         parser = argparse.ArgumentParser()
     39         parser.add_argument("--level", type=str)
---> 40         flags = parser.parse_args()
     41         kwargs["level"] = flags.level
     42     logging.basicConfig(**kwargs)

~/opt/anaconda3/lib/python3.7/argparse.py in parse_args(self, args, namespace)
   1750         if argv:
   1751             msg = _('unrecognized arguments: %s')
-> 1752             self.error(msg % ' '.join(argv))
   1753         return args
   1754 

~/opt/anaconda3/lib/python3.7/argparse.py in error(self, message)
   2499         self.print_usage(_sys.stderr)
   2500         args = {'prog': self.prog, 'message': message}
-> 2501         self.exit(2, _('%(prog)s: error: %(message)s\n') % args)

~/opt/anaconda3/lib/python3.7/argparse.py in exit(self, status, message)
   2486         if message:
   2487             self._print_message(message, _sys.stderr)
-> 2488         _sys.exit(status)
   2489 
   2490     def error(self, message):

SystemExit: 2

Could you guide me on the setup? Thanks.

wxwx1993 commented 3 years ago

Hi @wxwx1993 - no worries! epimargin 0.6.0 is on PyPI, reflecting all the review comments addressed so far.

The CDC ensemble forecast is a different modeling approach that re-weights underlying models to provide a predictive forecast. The models we provide here (and those available in EpiEstim and EpiNow, the comparison packages mentioned in our paper) can be embedded in such an ensemble. Moreover, this package provides estimation procedures for epidemiological quantities such as Rt and does not take into account hospital capacity out of the box (though that constraint can be added in client code).

I think it may be helpful to provide a list of the data input required for the software functions in the manuscript. For example, I am curious that if I want to estimate Rt, what types of data do I need (e.g., daily COVID-19 cases, what else)? By reading the manuscript, I do not think I can obtain such information. Thank you!

satejsoman commented 3 years ago

@wxwx1993 epimargin 0.6.1 has a bugfix that should allow the tutorial to work with ipython -i.

on the point of adding input data requirements to the manuscript, I prefer documenting the input timeseries and parameters of the included estimators in code (as done in the latest commit on master) rather than in the paper because the exact estimator inputs are methodological choices. For example, the analytical mean-preserving variance spread estimator can take either a timeseries of infections or deaths, and which is more appropriate is a modeling choice left to the client user.

The point we are trying to make in the paper itself is one of modularity: the end user can mix and match estimators and models as they see fit and there is nothing precluding them from using a completely different estimator (based on e.g. mobility patterns or virological rather population-level models) but plugging those estimates into a model provided out of the box. Does that make sense?

satejsoman commented 3 years ago

Hi @wxwx1993, just checking in to see if you were able to run the tutorial on version 0.6.1?

wxwx1993 commented 3 years ago

@satejsoman Yes, I can successfully run the tutorial with iPython, but I am not able to run it in Jupiter Notebook. But I think it is fine.

wxwx1993 commented 3 years ago

I think your approach to document the input time-series and parameters of the included estimators in code is perfect for me. I do not have further major comments.

satejsoman commented 3 years ago

thank you, @wxwx1993!

@Nikoleta-v3 what are the next steps in the review process from here?

dilawar commented 3 years ago

I have one unchecked item.

[ ] Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?

I guess this has been resolved by https://github.com/openjournals/joss-reviews/issues/3464#issuecomment-896383752 (also https://github.com/openjournals/joss-reviews/issues/3464#issuecomment-895890308)?

dilawar commented 3 years ago

I have one unchecked item.
* [ ]  Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
I guess this has been resolved by #3464 (comment) (also #3464 (comment))?

[x] CI is added: https://github.com/COVID-IWG/epimargin/actions/runs/1162050595 👍🏽 .
[x] Outdated studies are not part of repo anymore.
[x] Testing framework can be added in the future.

Nikoleta-v3 commented 3 years ago

Thank you @dilawar and @wxwx1993 for your comments and your time! @satejsoman I would like to have one final look over the paper before we can proceed 👍🏻

Nikoleta-v3 commented 3 years ago

@whedon generate pdf

whedon commented 3 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

Nikoleta-v3 commented 3 years ago

Thank you for your patience @satejsoman. The paper looks really good 👍🏻

Your package is a very useful tool, and it would be great to see it being used to analyse data across the globe. One question: Assume that I want to analyse data for Greece which have been downloaded from a different source and not covid19india.org. Would I have to tweak the format of my data to a "standard" one - so they can be loaded and analysed?

Two minor comments/questions:

In the readme you refer to studies/tutorial/data which is not there anymore.
This line is commented out with a question: https://github.com/COVID-IWG/epimargin/blob/fd8712ae8f2d2a6c785d046bd711f37bb3ec9d05/epimargin/estimators.py#L119. Is it meant to be here?

satejsoman commented 3 years ago

Hi @Nikoleta-v3, the estimators generally work on anything that looks like a timeseries in pandas DataFrame form, with the assumption that the index is either an object interpretable as a timestamp, or a simple integer index (I've added clarifying comments to epimargin/estimators.py). Many data sources provide data in this format, and for sources that do not, it's usually a simple transformation (e.g. our adapter for JHU's CSSE repository)

Thanks for bringing up that point! I've also fixed the references to studies/tutorial/data in the readme and have removed the unneeded comments.

Nikoleta-v3 commented 3 years ago

No further comments. Everything looks good to me 👍🏻

At this point could you:

[x] Make a new tagged release of your software, and list the version tag of the archived version here. Currently v0.5.2 is listed. I think you should make a new one given the additions and edits to the codebase.
[x] Archive the reviewed software in Zenodo or a similar service (e.g., figshare, an institutional repository)
[x] Check the archival deposit (e.g., in Zenodo) has the correct metadata. This includes the title (should match the paper title) and author list (make sure the list is correct and people who only made a small fix are not on it). You may also add the authors' ORCID.
[x] Please list the DOI of the archived version here.

satejsoman commented 3 years ago

Hi @Nikoleta-v3, I've tagged and released version 0.6.2 of epimargin, and uploaded it to Zenodo and PyPI. The DOI is: 10.5281/zenodo.5430698

Nikoleta-v3 commented 3 years ago

Thank you @satejsoman. A sanity check: Nicholas Marchio is not listed as an author in the paper.

satejsoman commented 3 years ago

I've just updated the metadata to fix that - thanks for flagging @Nikoleta-v3

Nikoleta-v3 commented 3 years ago

@whedon check references

whedon commented 3 years ago

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1101/2020.09.08.20190629 is OK
- 10.2139/ssrn.3748704 is OK
- 10.3386/w27532 is OK
- 10.21105/joss.03290 is OK
- 10.1093/aje/kwt133 is OK
- 10.12688/wellcomeopenres.16006.2 is OK

MISSING DOIs

- 10.1101/2020.06.18.20134858 may be a valid DOI for title: Practical considerations for measuring the effective reproductive number, R t

INVALID DOIs

- None

Nikoleta-v3 commented 3 years ago

@satejsoman can you fix the missing doi? Almost there!

Nikoleta-v3 commented 3 years ago

Apologies for not spotting this earlier

satejsoman commented 3 years ago

@whedon check references

whedon commented 3 years ago

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1371/journal.pcbi.1008409 is OK
- 10.1101/2020.09.08.20190629 is OK
- 10.2139/ssrn.3748704 is OK
- 10.3386/w27532 is OK
- 10.21105/joss.03290 is OK
- 10.1093/aje/kwt133 is OK
- 10.12688/wellcomeopenres.16006.2 is OK

MISSING DOIs

- None

INVALID DOIs

- None

satejsoman commented 3 years ago

@Nikoleta-v3 no woes, updated that DOI 😄

Nikoleta-v3 commented 3 years ago

@whedon generate pdf

whedon commented 3 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

openjournals / joss-reviews