openjournals / joss-reviews

Reviews for the Journal of Open Source Software
Creative Commons Zero v1.0 Universal
697 stars 36 forks source link

[REVIEW]: fathon: A Python package for a fast computation of detrendend fluctuation analysis and related algorithms #1828

Closed whedon closed 4 years ago

whedon commented 4 years ago

Submitting author: @stfbnc (Stefano Bianchi) Repository: https://github.com/stfbnc/fathon.git Version: v0.1.2 Editor: @mbobra Reviewer: @ali-ramadhan, @ashwinvis Archive: 10.5281/zenodo.3601637

Status

status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/01daa29253660bd3539ee0f718a52a77"><img src="https://joss.theoj.org/papers/01daa29253660bd3539ee0f718a52a77/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/01daa29253660bd3539ee0f718a52a77/status.svg)](https://joss.theoj.org/papers/01daa29253660bd3539ee0f718a52a77)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@ali-ramadhan & @ashwinvis, please carry out your review in this issue by updating the checklist below. If you cannot edit the checklist please:

  1. Make sure you're logged in to your GitHub account
  2. Be sure to accept the invite at this URL: https://github.com/openjournals/joss-reviews/invitations

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @mbobra know.

Please try and complete your review in the next two weeks

Review checklist for @ali-ramadhan

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

Review checklist for @ashwinvis

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

whedon commented 4 years ago

Hello human, I'm @whedon, a robot that can help you with some common editorial tasks. @ali-ramadhan, @ashwinvis it looks like you're currently assigned to review this paper :tada:.

:star: Important :star:

If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/joss-reviews) repository. As a reviewer, you're probably currently watching this repository which means for GitHub's default behaviour you will receive notifications (emails) for all reviews 😿

To fix this do the following two things:

  1. Set yourself as 'Not watching' https://github.com/openjournals/joss-reviews:

watching

  1. You may also like to change your default settings for this watching repositories in your GitHub profile here: https://github.com/settings/notifications

notifications

For a list of things I can do to help you, just type:

@whedon commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@whedon generate pdf
whedon commented 4 years ago
Attempting PDF compilation. Reticulating splines etc...
whedon commented 4 years ago

:point_right: Check article proof :page_facing_up: :point_left:

mbobra commented 4 years ago

👋@ali-ramadhan @ashwinvis Thank you again for volunteering to review. The review instructions are listed above. Please let me know if you have any questions!

ali-ramadhan commented 4 years ago

I like the idea of having a simple and intuitive open-source Python package offering numerous DFA algorithms and the examples look great, but I think fathon needs some work to make it more accessible to potential users.

My major comments are:

  1. fathon is difficult to install. I was actually not able to install it. It could greatly benefit from an automated package management solution.
  2. Currently documentation is provided via a single pdf file. I think generating documentation using a package like Sphinx would provide potential users with searchable documentation that gets updated alongside the code and integrates in the great docstrings that are already in the source code.
  3. Right now the burden is on the user to run manual tests to verify that fathon works, but I think this approach cannot scale as improvements are made and other people decide to contribute. I think an automated testing solution would greatly improve the situation here.

Maybe @mbobra can confirm whether these three points are necessary for publication in JOSS. I do think they would improve fathon and make the package much more accessible to potential users.

A more detailed review follows below.

I plan to come back and finish this review once I can install fathon.

Functionality

Installation

I wonder if providing a Python package installable via pip or conda would make fathon more accessible and easier to install. I'm not super familiar with either but because of the C libraries, presumably conda is the better option?

GSL seems to be a pre-requisite which is available through most Linux package managers. Installing fathon on Windows might prove difficult and it looks like you might need to install all of Cygwin just to get GSL to install fathon. I might mention this as most Windows users have no idea what Cygwin is.

I was not able to install fathon myself. I opened an issue about this: https://github.com/stfbnc/fathon/issues/1

I believe the installation instructions are unfriendly. fathon seems much harder to install for me than the vast majority of Python packages, many of them which depend on C libraries too.

Besides having to manually install or compile GSL, you also need to know where include and library files are placed on your operating system, and you need to know the "standard location of Python packages" for your particular OS and Python environment combination.

I think this is a huge barrier to usability and most Python users won't be able to install fathon.

I opened an issue to ask whether providing a pip or conda package is an option: https://github.com/stfbnc/fathon/issues/2

Functionality

I was not able to install fathon but when I do I will try to confirm this.

I'm not sure if whether four tests on Gaussian white noise are sufficient to claim that fathon fully works. That seems to be the easiest test case. I would like to explore testing other cases.

Performance

No performance claims I could find except for "It is mostly written in Cython and C in order to speed up computations" which seems reasonable.

Documentation

A statement of need

Some motivation is provided in the JOSS paper and documentation.

Installation instructions

There is a clearly-stated list of depdencies (GSL) although Cython might be missing (see https://github.com/stfbnc/fathon/issues/1). I strongly believe that fathon will benefit from having installation handled via an automated package managemnent solution such as pip or conda (see https://github.com/stfbnc/fathon/issues/2).

Examples

The examples directory contains some nice-looking examples that I'm looking forward to running and playing around with. There are also some illustrative examples in the fathon_docs.pdf.

Functionality documentation

I could not find any documentation for the API. I also think having the documentation in PDF form is detrimental to being simple and user friendly as it becomes difficult to link to them and it seems that nobody can contribute to improving the docs as the LaTeX document is not part of the repository.

I believe with documentation packages like Sphinx, documenting the API becomes easy, and it's trivial to deploy the docs to a website like ReadTheDocs.

Looking through the code, there are really good docstrings so it's a shame that they're currently hidden from the user!

Automated tests

There are no automated tests, and no continuous integration.

There is a Jupyter notebook and four Python scripts provided where the burden is on the user to run the tests.

I think it would be easy to convert the four scripts to automated tests that run on a CI platform like Travis CI. I haven't set up Travis with Python but it was painless with other languages, so I assume it should be easy to set up.

Community guidelines

I could not find any community guidelines or a contributor's guide. There should be "clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support". This can all be included in the README I believe.

I find it kind of weird that the README states "For bugs or improvements, write to fathon.package@gmail.com" when the package is on GitHub. I think the README should strongly encourage the submission of GitHub issues so that bugs and improvements can be visible to all users and the discussion can be more transparent. This also allows other people to contribute as it's not dependent on one person checking an email inbox and having the time to reply. Keeping track of issues on GitHub also leaves behind a rich history of package development that allows other contributors to take over if the original package developer no longer has the time to develop the package.

Software paper

Summary

I think the first paragraph is not accessible to a "diverse, non-specialist audience". DFA itself isn't actually described. Terms such as "fluctuation function", "Hurst exponent", and "anti-persistent" would not be understood by non-specialists I think.

I also think the mass citation of ~10 different works without context on the second sentence is not helpful to readers, especially non-specialists. I would suggest picking 2-4 works and placing them in context so that a non-specialist reader can understand how DFA was used in that particular work and why it was useful.

Statements such as "found various applications in other fields, like geophysics or economy" seem vague and not very helpful. It can be argued that the entire field of mathematics and statistics has found found various applications in other fields, like geophysics or economy.

The rest of the paper looks good to me.

State of the field

A quick Google search yielded three results for open-source Python packages that provide DFA algorithms:

It doesn't have to enter the README or JOSS paper but I'm curious how fathon is different. Do the others not provide the four algorithms provided by fathon? Are these packages worth mentioning?

A MATLAB package is mentioned in the JOSS paper which I assume is more comprehensive than these Python packages?

Quality of writing

In the first paragraph, I think you mean "economics" instead of "economy".

Otherwise I think it's fine.

References

These seem fine to me, but as mentioned above I think there may be too many.

I think "co_2" and "mauna loa" should be captialized to "CO_2" and "Mauna Loa" for the Varotsos et al. (2007) reference.

Also missing some captialization in the Janosi et al. (1999) reference. Should be "Also I think Statistical analysis of 5 s index data of the Budapest Stock Exchange".

"matlab" should be "Matlab" in the Ihlen (2012) reference.

mbobra commented 4 years ago

👋 @ashwinvis Let me know if you have any questions or need any help completing your review!

ashwinvis commented 4 years ago

@mbobra Working on it :)

mbobra commented 4 years ago

My major comments are:

  1. fathon is difficult to install. I was actually not able to install it. It could greatly benefit from an automated package management solution.

  2. Currently documentation is provided via a single pdf file. I think generating documentation using a package like Sphinx would provide potential users with searchable documentation that gets updated alongside the code and integrates in the great docstrings that are already in the source code.

  3. Right now the burden is on the user to run manual tests to verify that fathon works, but I think this approach cannot scale as improvements are made and other people decide to contribute. I think an automated testing solution would greatly improve the situation here.

Maybe @mbobra can confirm whether these three points are necessary for publication in JOSS. I do think they would improve fathon and make the package much more accessible to potential users.

Thanks for your detailed review, @ali-ramadhan. Here are my replies:

  1. According to the JOSS review guidelines, instructions for installing the software should be clear. An automated package management solution is ideal, but not required. However, the authors must provide clear installation instructions that work as well as a clear list of dependencies.

  2. According to the JOSS review guidelines, the documentation must to include four items: a statement of need, installation instructions, example usage, and API documentation. The first three are pretty clear; the fourth is a bit subjective (and up to the reviewer to decide), but the core API must be documented in a reasonable way.

  3. According to JOSS' review guidelines, the reviewer must be able to check to see if the software actually works. An automated test suite is ideal, but documented manual steps are okay too. But there must be some way for the reviewer to objectively assess whether the software works.

Does this help, @ali-ramadhan?

ali-ramadhan commented 4 years ago

Thank you @mbobra, that's helpful! Sorry it's my first review but I should have familiarized myself with the review guidelines more closely.

  1. According to the JOSS review guidelines, instructions for installing the software should be clear. An automated package management solution is ideal, but not required. However, the authors must provide clear installation instructions that work as well as a clear list of dependencies.

I would be happy with clear installation instructions that can be followed by people without having to spend too much time figuring out where GSL include/library files are located and where Python packages are usually installed.

  1. According to the JOSS review guidelines, the documentation must to include four items: a statement of need, installation instructions, example usage, and API documentation. The first three are pretty clear; the fourth is a bit subjective (and up to the reviewer to decide), but the core API must be documented in a reasonable way.

I think fathon satisfies the first three and does technically satisfy the fourth as functions in the source code are documented with good docstrings. I feel like it doesn't count as API documentation though unless the docstrings are compiled into documentation, but maybe having docstrings in the source code meets the JOSS requirements.

  1. According to JOSS' review guidelines, the reviewer must be able to check to see if the software actually works. An automated test suite is ideal, but documented manual steps are okay too. But there must be some way for the reviewer to objectively assess whether the software works.

The manual steps in the README sound good then.

mbobra commented 4 years ago

@ali-ramadhan Sounds good! Re: The 1st point - Thanks for opening an issue about the installation instructions.
Re: The 2nd point - Feel free to open an issue about the core API documentation. Re: The 3rd point - It seems like this is no longer an issue.

@stfbnc Let me know if you have any questions or need any help resolving these issues.

stfbnc commented 4 years ago

@ali-ramadhan Thanks for the comments, I'm now working on resolving the issues and on the documentation.

@mbobra Can I modify the paper as ali-ramadhan suggested, or should I wait for the second reviewer's comments?

mbobra commented 4 years ago

@stfbnc Yes, feel free to modify the paper right away! Every time you modify, you can make a comment in this thread with the command @whedon generate pdf to ask the JOSS bot Whedon to compile a PDF (so that we can all see the latest draft). This is an iterative process, so there is absolutely no issue with modifying and re-generating the paper as many times as you like.

stfbnc commented 4 years ago

@whedon generate pdf

whedon commented 4 years ago
Attempting PDF compilation. Reticulating splines etc...
whedon commented 4 years ago

:point_right: Check article proof :page_facing_up: :point_left:

stfbnc commented 4 years ago

@whedon generate pdf

whedon commented 4 years ago
Attempting PDF compilation. Reticulating splines etc...
whedon commented 4 years ago

:point_right: Check article proof :page_facing_up: :point_left:

stfbnc commented 4 years ago

@ali-ramadhan I have corrected the paper as you suggested, there is now a theoretical introduction on the algorithms covered by fathon.

For the State of the field section:

mbobra commented 4 years ago

👋 @ashwinvis I'm trying to move this along for the authors. Let me know if you want any help in completing your review! I'm genuinely happy to help.

stfbnc commented 4 years ago

@ali-ramadhan @ashwinvis Sorry for the late reply. I have updated the repository. Requirements are now specified, along with the new installation guidelines. I have also added:

stfbnc commented 4 years ago

Code coverage is by now 0%, but only because the package is written in cython and not python. I'm still trying to figure out how to make codecov consider also .pyx or .so files.

ashwinvis commented 4 years ago

@mbobra Thanks for the nudge. I was coordinating with the author to improve the installation and packaging situation https://github.com/stfbnc/fathon/issues/1#issuecomment-552462767 so that we have a version that can be reviewed.

@stfbnc Code coverage is not necessary for the paper, but if you are willing to go that extra mile, it can be fixed by following this document. You need to enable CYTHON_TRACE=1 and Cython.Coverage plugin when you run tests / CI. Do not enable linetrace by default, as it can degrade performance.

mbobra commented 4 years ago

@stfbnc Let me know your timeline/plans for this paper. You should feel absolutely free to take as much time as you need, so there's no rush or anything. I only want to know the timeline because if it is long (>1 month or so) we can add a "paused" label to this review to indicate to the Editors in Chief that they don't need to watch over this thread in the interim.

mbobra commented 4 years ago

/ooo December 16 until January 6

ooo[bot] commented 4 years ago

:+1: Marked @mbobra as OOO from Monday, December 16th 2019 to Thursday, January 17th 2019. :calendar:

stfbnc commented 4 years ago

@mbobra Actually I already have improved the installation procedure and I have added to the package what the reviewers pointed out was missing. I was waiting for @ali-ramadhan and @ashwinvis comments on the improved version of the package.

mbobra commented 4 years ago

Ah! Thank you for the clarification!

@ali-ramadhan and @ashwinvis, do @stfbnc's improvements pass your review?

ali-ramadhan commented 4 years ago

I'm very sorry for the slow reply on my part.

Thanks @stfbnc for making fathon available through pip. I was able to install it, run the examples, and the tests! Also great work on moving the documentation to Sphinx.

Also thank you @stfbnc for clarifying the difference between fathon and the other packages and revising the software paper. Just gave it another read and it's much better. The paper was helpful in understanding the fathon examples as I'm not super familiar with DFA.

The last paragraph in your software paper reads more like an introduction. I wonder if it would be better to move that paragraph to the very start of the paper (not necessary of course, just a suggestion).

I've updated the review checklist.

My only remaining concern is regarding the tests. Perhaps my ignorance of DFA is showing but I'm actually not sure how fathon is being tested. Take the first test for example:

ts1 = np.loadtxt('./tests/ts1.txt')
ts1 = fathon.toAggregated(ts1)

def test_dfa():
    pydfa = fathon.DFA(ts1)
    n1, F1 = pydfa.computeFlucVec(10, nMax=200, revSeg=True)
    H1, H_int1 = pydfa.fitFlucVec()

    assert math.isclose(H1, 0.7982194289592676)
  1. What is the data contained in ts1.txt? It looks like random numbers but maybe it's real data? Is it e.g. normally distributed?
  2. What is being tested? I assume the test is computing a Hurst exponent H1 and the test is that H1 comes out correct?
  3. Where did the magic number 0.7982194289592676 come from? Is it common to compute Hurst exponents to 16 significant figures? Why do you expect this exact value for H1? Does it come from some analytic expression or theory? Or is this value produced by fathon itself, in which case is this a regression test?

I will open an issue about this for clarification.

stfbnc commented 4 years ago

@whedon generate pdf

whedon commented 4 years ago
Attempting PDF compilation. Reticulating splines etc...
whedon commented 4 years ago

:point_right: Check article proof :page_facing_up: :point_left:

stfbnc commented 4 years ago

@ali-ramadhan

The last paragraph in your software paper reads more like an introduction. I wonder if it would be better to move that paragraph to the very start of the paper (not necessary of course, just a suggestion).

I have moved the first sentence at the beginning of the paper, and moved the remainder at the end of the theory. The paragraph contains terms that are described in the theory section, so I think is better to first have the theory.

My only remaining concern is regarding the tests.

I have answered your questions in the issue.

ali-ramadhan commented 4 years ago

Thanks for the clarification and improvements to the tests @stfbnc!

As all the tests are regression tests however, I am wondering whether that is enough to "cover the core functionality of the software" as per the review criteria.

Regression tests by definition only test that the software's behavior has not changed but by themselves do not test that the software produces the correct answer.

I wonder if it would be possible to add tests that cover the core functionality of the software. For example, if you load in a time series of an uncorrelated process (Brownian motion?) do all the methods produce a Hurst exponent of H ≈ 0.5 as expected?

I'd be curious to hear what @mbobra and @ashwinvis's thoughts are on this matter.

ashwinvis commented 4 years ago

I agree with @ali-ramadhan. I was trying to reproduce an existing tutorial made with another DFT implementation with fathon which gives approx. 0.54 as the Hurst exponent. I haven't finished it and if needed, I can make work-in-progress PR as an example.

ashwinvis commented 4 years ago

Comments

The Python package fathon appears to be a novel and efficient implementation of an analytical tool that was missing from the ecosystem. I congratulate the authorfor the contribution. As of now, the paper gives a good introduction to different algorithms associated with DFA and its variants.

Major revision:

Minor revision:

Feel free to ask if any clarifications are needed.

stfbnc commented 4 years ago

@ashwinvis

There is only a single mention describing the "State of the field" with "commonly-used packages". This can be improved with more references, links or description.

I have added some links and described the differences with fathon

Reproduce a third-party test case to verify that fathon implements the right algorithm. I have tried to attempt this myself in stfbnc/fathon/pull/5.

I have added tests to reproduce the same results obtained in the Ihlen's paper "Introduction to Multifractal Detrended Fluctuation Analysis in Matlab" (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3366552/)

Mention why GSL is a dependency, which I believe is used for fitting polynomials. Maybe also elaborate why this was preferred over numpy.polyfit.

I have added a sentence in the README (Prerequisites -> External libraries section)

Contributing guidelines (CONTRIBUTING.md) refers to tags "proposed feature", "new feature" etc. Are these supposed to be plain text / GitHub labels? If it is the latter, I did not find them.

I have added the missing labels

Update link to PyPI release for "Archive" in the manuscript (right now it shows "PENDING")

How can I update the link?

The names of the associated module by providing links to the documentation

Do you mean adding module's name and link in the paper where the specific algorithm is described?

ashwinvis commented 4 years ago

How can I update the link?

I am not sure either. Presumably in the manuscript's metadata that your wrote? The editors should know.

Do you mean adding module's name and link in the paper where the specific algorithm is described?

Yes. Something like: fathon.DFA.

mbobra commented 4 years ago

How can I update the link?

I am not sure either. Presumably in the manuscript's metadata that your wrote? The editors should know.

Once this paper passes review, the submitting author will archive their release on Zenodo to obtain a DOI. The "Archive" link will then lead to this Zenodo entry.

ali-ramadhan commented 4 years ago

Thanks for adding the six functionality tests @stfbnc. The tests are now automated on Travis CI and the tests cover functionality via functionality and regression tests, which is awesome.

Just ticked off the last item on my checklist so I think fathon should be accepted into JOSS.

mbobra commented 4 years ago

@ashwinvis I see four items left on your checklist, can you please let me know if those are still issues? Thanks!

ashwinvis commented 4 years ago

Thanks for clarification. I am satisfied with the latest version of the manuscript and I recommend the paper to be accepted, @mbobra! Cheers @stfbnc!

stfbnc commented 4 years ago

@whedon generate pdf

whedon commented 4 years ago

:point_right: Check article proof :page_facing_up: :point_left:

stfbnc commented 4 years ago

@ali-ramadhan @ashwinvis thank you very much for reviewing fathon, and for all the valuable suggestions!

I've recompiled the paper, there were a few typos and now there are also the links to the modules' documentation (as @ashwinvis suggested).

mbobra commented 4 years ago

@ali-ramadhan @ashwinvis Thank you sincerely for all of the time and effort you dedicated to this review. I truly appreciate that you took it so seriously and gave @stfbnc tons of great feedback that greatly improved the submission. Thank you so much for volunteering to review for JOSS ☀️

@stfbnc Congratulations 🎉 Thank you for working so hard on this submission and integrating all the feedback from the reviewers. I know it was a ton of work and I hope you are happy with the final outcome. We're almost there -- can you please archive your release in Zenodo, making sure the Zenodo entry has the same title and author list as the paper, and put the resultant Zenodo DOI in your README.md file? Thank you!

stfbnc commented 4 years ago

@mbobra Thank you! I'm really happy with the final outcome!

Here is the Zenodo DOI: https://zenodo.org/badge/latestdoi/214290119 It is also included in the readme.

mbobra commented 4 years ago

@whedon generate pdf

whedon commented 4 years ago

:point_right: Check article proof :page_facing_up: :point_left:

mbobra commented 4 years ago

@whedon check references