openjournals / joss-reviews

Reviews for the Journal of Open Source Software
Creative Commons Zero v1.0 Universal
720 stars 38 forks source link

[REVIEW]: piecewise-regression (aka segmented regression) in Python #3859

Closed whedon closed 2 years ago

whedon commented 2 years ago

Submitting author: @chasmani (Charlie Pilgrim) Repository: https://github.com/chasmani/piecewise-regression Version: v1.2.0 Editor: @galessiorob Reviewers: @vyasr, @htjb, @Ebedthan Archive: 10.5281/zenodo.5742317

:warning: JOSS reduced service mode :warning:

Due to the challenges of the COVID-19 pandemic, JOSS is currently operating in a "reduced service mode". You can read more about what that means in our blog post.

Status

status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/b64e5e7d746efc5d91462a51b3fc5bf8"><img src="https://joss.theoj.org/papers/b64e5e7d746efc5d91462a51b3fc5bf8/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/b64e5e7d746efc5d91462a51b3fc5bf8/status.svg)](https://joss.theoj.org/papers/b64e5e7d746efc5d91462a51b3fc5bf8)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@vyasr, @lauravana and @htjb, please carry out your review in this issue by updating the checklist below. If you cannot edit the checklist please:

  1. Make sure you're logged in to your GitHub account
  2. Be sure to accept the invite at this URL: https://github.com/openjournals/joss-reviews/invitations

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @galessiorob know.

Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest

Review checklist for @vyasr

✨ Important: Please do not use the Convert to issue functionality when working through this checklist, instead, please open any new issues associated with your review in the software repository associated with the submission. ✨

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

Review checklist for @htjb

✨ Important: Please do not use the Convert to issue functionality when working through this checklist, instead, please open any new issues associated with your review in the software repository associated with the submission. ✨

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

Review checklist for @Ebedthan

✨ Important: Please do not use the Convert to issue functionality when working through this checklist, instead, please open any new issues associated with your review in the software repository associated with the submission. ✨

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

whedon commented 2 years ago

Hello human, I'm @whedon, a robot that can help you with some common editorial tasks. @vyasr, @htjb it looks like you're currently assigned to review this paper :tada:.

:warning: JOSS reduced service mode :warning:

Due to the challenges of the COVID-19 pandemic, JOSS is currently operating in a "reduced service mode". You can read more about what that means in our blog post.

:star: Important :star:

If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/joss-reviews) repository. As a reviewer, you're probably currently watching this repository which means for GitHub's default behaviour you will receive notifications (emails) for all reviews 😿

To fix this do the following two things:

  1. Set yourself as 'Not watching' https://github.com/openjournals/joss-reviews:

watching

  1. You may also like to change your default settings for this watching repositories in your GitHub profile here: https://github.com/settings/notifications

notifications

For a list of things I can do to help you, just type:

@whedon commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@whedon generate pdf
whedon commented 2 years ago

Wordcount for paper.md is 719

whedon commented 2 years ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- None

MISSING DOIs

- 10.1002/sim.1545 may be a valid DOI for title: Estimating regression models with unknown break-points
- 10.25080/majora-92bf1922-011 may be a valid DOI for title: Statsmodels: Econometric and statistical modeling with python
- 10.1111/j.0006-341x.2001.00240.x may be a valid DOI for title: Minimizing model fitting objectives that contain spurious local minima by bootstrap restarting

INVALID DOIs

- None
whedon commented 2 years ago
Software report (experimental):

github.com/AlDanial/cloc v 1.88  T=0.14 s (403.4 files/s, 170026.6 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
JavaScript                      13           2405           2470           9218
SVG                              1              0              0           2671
Python                          18            792            426           1566
HTML                             8            260             18           1555
CSS                              6            338             55           1292
reStructuredText                 6            114             27            228
Markdown                         1             22              0             53
TeX                              1              3              0             38
DOS Batch                        1              8              1             26
make                             1              4              7              9
-------------------------------------------------------------------------------
SUM:                            56           3946           3004          16656
-------------------------------------------------------------------------------

Statistical information for the repository '4f96ff6cb62bdbefc8789d97' was
gathered on 2021/10/26.
The following historical commit information, by author, was found:

Author                     Commits    Insertions      Deletions    % of changes
chasmani                        79         21965           5060          100.00

Below are the number of rows from each author that have survived and are still
intact in the current revision:

Author                     Rows      Stability          Age       % in comments
Chasmani                  16877          100.0          0.3               16.13
whedon commented 2 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

htjb commented 2 years ago

Dear @chasmani,

Thank you for the interesting submission! The software appears to fill a gap in the Python library of tools and the paper is a useful contribution to the literature. I have been able to install and run the code without any issues.

Overall the repository is well put together and the code is well written and documented. I did however have a few comments which I think would make the code more accessible to the community. I have listed these below since I do not believe they are sufficient to warrant separate Issues but I am happy to break them up into Issues on the repo if you would prefer. Also happy to discuss any of the points further if clarification is needed.

In terms of the paper

I hope you find these comments useful! As I have said overall the code is really nice and I think the additional statistical tests that have been implemented (Davies and BIC) are really useful.

Best, Harry

galessiorob commented 2 years ago

Thank you so much for your speedy review @htjb ⚡ I appreciate the thorough documentation on the feedback.

@chasmani, if you can please start by addressing the automated tests and the community guidelines so we can make progress with the actual software. I'll open a PR with some edit suggestions shortly so we can address the paper.

@vyasr please let me know if I can help with anything so you can get started with the review. Appreciate you volunteering your time!

chasmani commented 2 years ago

@htjb thank you for the review. All good points. I will aim to address them all this week.

Charlie

galessiorob commented 2 years ago

@whedon add @lauravana as reviewer

whedon commented 2 years ago

OK, @lauravana is now a reviewer

galessiorob commented 2 years ago

@whedon add @Ebedthan as reviewer

whedon commented 2 years ago

OK, @Ebedthan is now a reviewer

Ebedthan commented 2 years ago

Dear @chasmani,

Thank you for your submission which describes software that I think will be helpful for the Python community where the lack of such software was a gap. I have some comments that I think will improve the overall submission. These comments intentionally avoid duplication with comments from @htjb that I hope you will fully take into consideration. I hope you will find it helpful.

First on the repository:

Finally on the paper:

Thank you for your submission and the software 👍🏾.

Best, Anicet

chasmani commented 2 years ago

@Ebedthan thank you very much for your review and comments.

I've made changes based on your comments. Responses are in bold below:

First on the repository:

Finally on the paper:

Thanks again. I will address the other reviewer comments today as well. Charlie

galessiorob commented 2 years ago

@chasmani great job at making progress on the reviewers' feedback. Once you're done with those, I'll give it another pass myself.

@vyasr, mind updating us on your timeline to get to the review list, please? Let me know if you have any questions or are blocked in any way.

whedon commented 2 years ago

:wave: @htjb, please update us on how your review is going (this is an automated reminder).

whedon commented 2 years ago

:wave: @vyasr, please update us on how your review is going (this is an automated reminder).

vyasr commented 2 years ago

Hi all, apologies for being a bit slow here. I have been swamped but I plan to get my first round of review done either Friday or next Monday/Tuesday!

vyasr commented 2 years ago

Cool project @chasmani! I have actually wanted the ability to perform piecewise fits like this before. Have you ever looked at pwlf? At one point I attempted to use it for similar tasks during my PhD, and although I didn't end up incorporating it into production workflows I found that it did a reasonably good job. It uses a significantly different algorithm described in this paper, but I think he also considered the Muggeo paper. You might benefit from some collaboration with the author of that package.

The code quality here looks good, as do your docstrings. I have some notes in addition to my checklist above. I'll denote which ones I consider to be requirements to address in some fashion, and which ones that I'd just like to suggest as part of the review process but aren't necessary for me to approve the submission. Requirements:

Nice to have:

chasmani commented 2 years ago

Thank you @htjb for your review and comments. There were some good tips to improve the codebase. (I will address @vyasr 's comments separately).

I've addressed those and included responses in bold below. I've implemented the majority of the suggestions.

  1. The readme is well written however it might benefit from an additional section detailing the required packages. It would also be good to include a title and a table at the top of the readme including the current version number and quick links to the repo and documentation (as is done in anesthetic and globalemu). I added this. Also added some of the badges.

  2. I would also refrain from suggesting that people use sudo pip3 install in the installation section since pip3 install is typically sufficient. Done

  3. Finally you may want to add instructions to install the package from source e.g. git clone https://github.com/chasmani/piecewise-regression cd piecewise_regression python3 setup.py install --user I added this to the bottom of the readme - I don't want to add too much to the top as it may be confusing or unclear to some users

  4. Again the code is well written and particularly well documented. However, I think that it is important that the code meets certain conventions and styles that help with readability. The code should comply with the PEP-8 standard. You can check against PEP-8 using Flake8 in the terminal (e.g. flake8 piecewise_regression). Updated to PEP8 compliance in the vast majority of instances A small number of lines do not comply because it would make the code less readable e.g. using descriptive but long variable names means some lines og over the recommend length, but changing those variables to be shorter would make the code less readable

  5. There are no clear community guidelines in the repo. You should add a section in the readme about contributing to the software (again some examples include anesthetic and globalemu). Done

  6. The testing suite looks to be well constructed. It is worth adding instructions on how the user can recover the coverage of the testing suite e.g. by running

    nosetests --with-coverage --cover-erase --cover-package=piecewise_regression/ to the readme. Done

  7. The coverage of the testing suite is a little low at 67% and I would suggest writing some additional tests to increase the coverage to >90% (higher coverage = higher user confidence). Done. Increased to 97%

  8. Ideally you should also set up continuous integration as well which can be done with github actions. Although I do not believe this is a specific requirement for JOSS I would advise it. Thanks for the suggestion. I've not used Continuous Integration before and it looks great. I implemented CI to automatically run tests upon git push, for python 3.7,3.8,3.9. Currently passing. That is very useful for automatically checking that the package works across python versions. Also it automatically uploads the test coverage to codecov.

  9. You should refrain from including data directly in the testing codes as well (e.g. np.array[[100.6466801609725, 100.22604536744389, ...]). It is perfectly reasonable to save this to .txt or .npy files and upload it to the repo in a folder tests/data/ so that it can be read in by the testing scripts. Done

  10. The example use case is easy to follow and use. It might be advantageous to put this into a Jupyter notebook and host it on binder for ease of access. I don't think this is necessary, considering that the package is quite simple to use. It is more overhead for things to update following changes. And duplicating the code increases the chance that the code example in the readme and the Jupyter notebook will diverge at some point in the future, which could be confusing for users. I prefer to have one single instance of the code example, following DRY. I could generate the README code and the Jupyter notebook from a single source but that feels like over engineering. I can add this if it is a deal breaker.

In terms of the paper

  1. More details should be given regarding the example fit. For example reporting and explaining the results of the Davies hypothesis test for this example. Done.

  2. It may be advisable to merge the Example and How it works sections for readability. I'd prefer to keep these separate because I anticipate that some users will be interested in the underlying Maths and others will be put off by it. As it is, the users who are not mathematically inclined can clearly see that the How It Works section is not crucial and they can skip it.

  3. A discussion of the BIC would be nice as well. Perhaps fitting the example with max_breakpoints=5 and using the BIC to determine which is favourable. Included. I briefly explained the rationale of using the BIC, and reported the result of the preferred number of breakpoints using the same data as shown in the example figure.

  4. I would cite the relevant papers for the Davies test. Yes good point. Done.

  5. If there are any Python implementations of the method using a grid search for the breakpoints I believe these should be referenced. Done

chasmani commented 2 years ago

Thank you @vyasr for your review and comments. I've responded below:

Cool project @chasmani! I have actually wanted the ability to perform piecewise fits like this before. Have you ever looked at pwlf? At one point I attempted to use it for similar tasks during my PhD, and although I didn't end up incorporating it into production workflows I found that it did a reasonably good job. It uses a significantly different algorithm described in this paper, but I think he also considered the Muggeo paper. You might benefit from some collaboration with the author of that package.

The code quality here looks good, as do your docstrings. I have some notes in addition to my checklist above. I'll denote which ones I consider to be requirements to address in some fashion, and which ones that I'd just like to suggest as part of the review process but aren't necessary for me to approve the submission. Requirements:

  1. It looks like you addressed @htjb's suggestion to include information on installing from source, but it's missing newlines. I think your best bet since it's an rst file is to use a bash directive like .. code:: bash at the top of that block for proper formatting. Done

  2. I think the statement of need should include some more concrete examples. What are common use cases for piecewise linear fits? When are they more appropriate than curve fits (or simple linear regression)? Done

  3. I mentioned pwlf at the top of this review, I think you should at least mention its existence since it is closely related and is a Python option unlike the R packages you currently mention. It might also be nice to include a rough comparison of the different methodologies. IIRC that package uses Lagrange multipliers for local solutions, combined with a differential evolution algorithm for global optimization, but I haven't looked at it in a long time. Perhaps there are others as well, and I think the paper would benefit from a survey to see if such packages exist. Done, good call.

4a. The first sentence of "How it works" seems out of place. Done

4b. Stating that we don't need to know the theory seems like a better fit for the summary by stating something like "this package hides the mathematical complexities of constructing a piecewise fit behind a simple interface" or so. I've removed the line from "How It Works". I decided not to add anythig to the summary for conciseness, I think "easy-to-use" covers it

  1. The example presents a raw plot with no context. Please give some description of the generating model, hopefully in the context of some realistic process that it may represent. Agreed. I explained the data generation process as a pieceiwise linear model with Gaussian noise. I gave the example of observations of a physical process with noise due to sampling errors, so the noise could reasonably be expected to be Gaussian and constant. That is quite an abstract example, with the benefit that it represents the general nature of the fitting algorithm. I thought about generating data based on a more specific real process, but that could require specific caveats that would detract from the general nature of the algorithm, and I do not have deep domain expertise in a suitable area to talk authoritatively about the forms of data.

  2. We need to see a few more references for the Davies and BIC tests. Done

Nice to have:

  1. I don't think there's any need for pip3 in the install, just pip should be fine. On any modern Python installation that will alias to pip. Done

  2. Have you considered supporting conda-forge? For a pure Python package like yours it should be relatively trivial, you just need to create a feedstock. Not critical, just something to consider since it's a popular installation method these days. TO DO

  3. While your tests run fine, I would recommend switching away from nose to either raw unittest. nose has not been supported for >5 years last I checked. pytest would be a more modern alternative, but it would require more effort to restructure your tests. Done. Now uses pytest (tis was also easier to integrate with codecov.io)

  4. In addition to @htjb's recommendation of using flake8, I would also recommend adding some additional linters like black, pydocstyle, and isort to automate as much formatting as possible. pre-commit can help significantly with managing those. Thanks I'll take a look at those. I've already done the formatting for this project to be PEP8 compliant for this stage. I'll look at using some of those linters in my normal workflow from now on, so when I come to make future changes I will most likely integrate them into this library.

  5. Strongly in favor of @htjb's recommendation of adding CI testing. Done

  6. It would be nice to indicate how your package uses statsmodel internally for ordinary least squares (and potentially what you're using numpy for if it's anything more than just standard numpy ops, which could be mentioned in a single sentence). statsmodels is used to perform ordinary least squares. Added to the How It Works section. Also added details of all required packages and what they are used for in the readme. Numpy is used for normal standard ops. The most unusual use is np.heaviside, which is used to transform data prior to fitting (could easily be implemented as my own function). That is shown in the example code as well.

  7. You could embed your examples along with the code to generate them on your ReadTheDocs. If you're feeling adventurous, nbsphinx would let you create a notebook with the example and embed that directly. Done I also added a link to a Google Colab jupyter notebook with code examples

chasmani commented 2 years ago

Thank you @vyasr @Ebedthan @htjb for your reviews, and everyone else for managing the submission.

I really think that your comments have significantly improved the paper and the code, and especially the workflows around making code changes.

This is my first paper at JOSS, I'm not sure what the next step is.

Ebedthan commented 2 years ago

I think you made nice improvements to the repository and paper. For the next steps, @galessiorob will have a look at all the stuff made. Thanks, @chasmani.

htjb commented 2 years ago

@whedon generate pdf

whedon commented 2 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

htjb commented 2 years ago

Dear @chasmani,

Thank you for the response to my comments and apologies I haven't had a chance to look at this work again sooner.

It looks like you have made some good progress with the repository and the testing/coverage looks great. The changes that have been made to the paper are also really helpful. Particularly the more detailed statement of need is great and the google colab notebook will help people get to grips with the software.

I am happy with the state of the repo and paper and have ticked my checklist off. I am still a bit concerned about the PEP8 compatibility however the code is definitely readable and I wouldn't want this to delay acceptance. I will leave this up to @galessiorob but have detailed some specifics below.

When I run flake8 on the updated code I still see a number of inconsistencies with the standard. An example from data_validation.py that addresses your concerns about variable names is the following line

value_error_text = "{} must be a list of numbers with minimum length {}".format(var_name, min_length)

which can be re-written to be PEP8 compatible as

value_error_text = \
  "{} must be a list of numbers with minimum length {}".format(
    var_name, min_length)

Similarly there are a lot of imports that don't get used for example in model_selection.py and __init__.py. __init__.py is typically left empty if I recall. You should fix the white space around operators as well and make sure that you leave enough blank lines around functions. Running flake8 piecewise_regression/ --select=E,F in the terminal should not return anything. I know this probably sounds very pedantic but not only does it help with readability it also gives a well known standard of code to aim for when people want to contribute via issues and pull requests.

You can also add flake8 to the github actions.

Finally, you should probably remove the file Vyas review reply.txt in paper/ on the repo.

The package is really helpful and the repo/paper are looking great as I have said. I will be looking out for opportunities to use this in my work!

Best, Harry

galessiorob commented 2 years ago

@whedon generate pdf

whedon commented 2 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

galessiorob commented 2 years ago

Thank you @vyasr @Ebedthan @htjb for your thorough and kind reviews! We're getting very close to the finish line for this paper 🚀

@chasmani Could you please take a look at @htjb 's suggestions on their last issue, please? I don't think the PEP8 compatibility issue should be a blocker for publishing but will check with other editors. As for the formatting it'd be great if you could get to it 🎉

I created a branch where I am adding some edits, I'll open a PR once I'm done. Thanks for all the work thus far, this is such a good and useful piece of software.

chasmani commented 2 years ago

@hjtb thanks for looking at it again. I've just made it completely PEP8 compliant, I think. Running flake8 piecewise_regression/ --select=E,F now returns nothing. I'm new to PEP8 but yes I agree its a good standard to have, and I'll be using it in the future.

Thank you @galessiorob, @hjtb @vyasr and @Ebedthan for all your comments and reviews.

I think that is everything now :)

vyasr commented 2 years ago

I just went through and double-checked that all my suggestions have been implemented. I've checked all the boxes and I don't have anything further to add. Both the repo and the paper are looking good!

htjb commented 2 years ago

@hjtb thanks for looking at it again. I've just made it completely PEP8 compliant, I think. Running flake8 piecewise_regression/ --select=E,F now returns nothing. I'm new to PEP8 but yes I agree its a good standard to have, and I'll be using it in the future.

Brilliant thank you! This looks great. Congrats on a nice piece of software :smile:

galessiorob commented 2 years ago

@chasmani I put together a set of edits for the paper and the readme in this PR. Once you've reviewed them please deposit your software in Zenodo and post the DOI and most recent version here. After that, we can set the final version, generate a final proof, and have an Editor in Chief do the publishing.

@vyasr, @htjb, @Ebedthan Thank you so much for lending your expertise and reviewing this paper 🚀

chasmani commented 2 years ago

Thanks @galessiorob for your input. I merged most of your suggestions. Thank you especially for the contribution guidelines. And thank you again to the reviewers.

I just depositied the software on zenodo and made a new release on Github.

All good?

galessiorob commented 2 years ago

@whedon set v1.2.0 as version

whedon commented 2 years ago

OK. v1.2.0 is the version.

galessiorob commented 2 years ago

@whedon set 10.5281/zenodo.5742317 as archive

whedon commented 2 years ago

OK. 10.5281/zenodo.5742317 is the archive.

galessiorob commented 2 years ago

@whedon generate pdf

whedon commented 2 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

galessiorob commented 2 years ago

@chasmani tiny nit: capitalize the "P" in the title; right now it shows "piecewise"

chasmani commented 2 years ago

@galessiorob I'd prefer to keep it lowercase. "piecewise-regression" is the python package name so I wanted to use the python convention of lowercase.

This is common in JOSS, see: https://joss.theoj.org/papers/10.21105/joss.03619 https://joss.theoj.org/papers/10.21105/joss.03781 https://joss.theoj.org/papers/10.21105/joss.03358

galessiorob commented 2 years ago

Totally ok @chasmani 👍

galessiorob commented 2 years ago

@whedon recommend-accept

whedon commented 2 years ago
Attempting dry run of processing paper acceptance...
whedon commented 2 years ago

:wave: @openjournals/joss-eics, this paper is ready to be accepted and published.

Check final proof :point_right: https://github.com/openjournals/joss-papers/pull/2783

If the paper PDF and Crossref deposit XML look good in https://github.com/openjournals/joss-papers/pull/2783, then you can now move forward with accepting the submission by compiling again with the flag deposit=true e.g.

@whedon accept deposit=true
whedon commented 2 years ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- None

MISSING DOIs

- 10.1002/sim.1545 may be a valid DOI for title: Estimating regression models with unknown break-points
- 10.1890/02-0472 may be a valid DOI for title: Piecewise regression: a tool for identifying ecological thresholds
- 10.1046/j.1365-2710.2002.00430.x may be a valid DOI for title: Segmented regression analysis of interrupted time series studies in medication use research
- 10.25080/majora-92bf1922-011 may be a valid DOI for title: Statsmodels: Econometric and statistical modeling with python
- 10.1111/j.0006-341x.2001.00240.x may be a valid DOI for title: Minimizing model fitting objectives that contain spurious local minima by bootstrap restarting
- 10.2307/2335690 may be a valid DOI for title: Hypothesis testing when a nuisance parameter is present only under the alternative
- 10.1111/j.1467-9574.2012.00530.x may be a valid DOI for title: ‘All models are wrong...’: an introduction to model uncertainty

INVALID DOIs

- None
galessiorob commented 2 years ago

@chasmani an editor in chief will do the final publishing.

Congrats!

kthyng commented 2 years ago

@chasmani Hi, I'm taking over from here. Can you update the metadata in your Zenodo archive so that the title and author list exactly match your JOSS paper?

kthyng commented 2 years ago

@chasmani Regarding your paper I have a few comments: