[REVIEW]: memochange: An R package for estimation procedures and tests for persistent time series

whedon commented 4 years ago

Submitting author: @KaiWenger (Kai Wenger) Repository: https://github.com/KaiWenger/memochange Version: 1.1.0 Editor: @mbobra Reviewer: @JonasMoss, @steffilazerte Archive: 10.5281/zenodo.3543826

Status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/2a38b620910689bbdb288e11dc810f06"><img src="https://joss.theoj.org/papers/2a38b620910689bbdb288e11dc810f06/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/2a38b620910689bbdb288e11dc810f06/status.svg)](https://joss.theoj.org/papers/2a38b620910689bbdb288e11dc810f06)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@JonasMoss & @steffilazerte, please carry out your review in this issue by updating the checklist below. If you cannot edit the checklist please:

Make sure you're logged in to your GitHub account
Be sure to accept the invite at this URL: https://github.com/openjournals/joss-reviews/invitations

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @mbobra know.

✨ Please try and complete your review in the next two weeks ✨

Review checklist for @JonasMoss

Conflict of interest

[x] I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.

Code of Conduct

[x] I confirm that I read and will adhere to the JOSS code of conduct.

General checks

[x] Repository: Is the source code for this software available at the repository url?
[x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
[x] Contribution and authorship: Has the submitting author (@KaiWenger) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?

Functionality

[x] Installation: Does installation proceed as outlined in the documentation?
[x] Functionality: Have the functional claims of the software been confirmed?
[x] Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

[x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
[x] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
[x] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
[x] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
[x] Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
[x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

[x] Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
[x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
[x] State of the field: Do the authors describe how this software compares to other commonly-used packages?
[x] Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
[x] References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

Review checklist for @steffilazerte

Conflict of interest

[X] I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.

Code of Conduct

[X] I confirm that I read and will adhere to the JOSS code of conduct.

General checks

[X] Repository: Is the source code for this software available at the repository url?
[X] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
[x] Contribution and authorship: Has the submitting author (@KaiWenger) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?

Functionality

[X] Installation: Does installation proceed as outlined in the documentation?
[x] Functionality: Have the functional claims of the software been confirmed?
[x] Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

[x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
[x] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
[x] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
[x] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
[x] Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
[x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

[x] Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
[x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
[x] State of the field: Do the authors describe how this software compares to other commonly-used packages?
[x] Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
[x] References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

whedon commented 4 years ago

Hello human, I'm @whedon, a robot that can help you with some common editorial tasks. @JonasMoss, @steffilazerte it looks like you're currently assigned to review this paper :tada:.

:star: Important :star:

If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/joss-reviews) repository. As a reviewer, you're probably currently watching this repository which means for GitHub's default behaviour you will receive notifications (emails) for all reviews 😿

To fix this do the following two things:

Set yourself as 'Not watching' https://github.com/openjournals/joss-reviews:

watching

You may also like to change your default settings for this watching repositories in your GitHub profile here: https://github.com/settings/notifications

notifications

For a list of things I can do to help you, just type:

@whedon commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@whedon generate pdf

whedon commented 4 years ago

Attempting PDF compilation. Reticulating splines etc...

whedon commented 4 years ago

:point_right: Check article proof :page_facing_up: :point_left:

mbobra commented 4 years ago

👋 @JonasMoss @steffilazerte Thank you for agreeing to review this submission! Whedon generated a checklist and linked a reviewer guide above -- let me know if you have any questions.

KaiWenger commented 4 years ago

@whedon generate pdf

whedon commented 4 years ago

Attempting PDF compilation. Reticulating splines etc...

whedon commented 4 years ago

:point_right: Check article proof :page_facing_up: :point_left:

steffilazerte commented 4 years ago

Hi @KaiWenger, Hi @Janis410!

I have completed my first round of reviews and have some comments/suggestions for you. To make things clear, I've included a check box where I need feedback or action from you. This R package is solid and and well documented. I found the review straightforward and the examples easy to follow. Thanks!

Authorship

[x] Could you supply me with a brief statement of author contributions? This is not strictly required by JOSS, but I find it helps me evaluate the "Contribution and authorship" requirements

Functionality I installed the CRAN version. I had to install some other libraries first, but this kind of thing is common for my OS (Linux) and was easily fixed. The installation went smoothly and I ran into no problems testing the package.

[x] I have included some suggestions for "good-practices" (KaiWenger/memochange#1)

Documentation I was thoroughly impressed by the amount of documentation that has gone into the functions, especially with respect to referencing different techniques. This is an area that is often skipped or only given lip service and the level of detail here makes the memochange package so much more usable!

[x] I have some suggestions for adding detail to the README (KaiWenger/memochange#2)
[x] I also suggest creating a vignette, adding detail to memochange.R and annotating the examples (KaiWenger/memochange#3)
[x] The package examples are tested by TravisCI, and I suggest adding unit tests and potentially test coverage (KaiWenger/memochange#3)
[x] I think explicit community guidelines would be helpful (KaiWenger/memochange#3)

Software Paper I am a biologist and R programmer familiar with statistical analyses in general, but not with time series analyses nor economics/finance. I was pleasantly surprised to find that while I found the summary complex, it was still mostly understandable (although don't ask me to explain any of the details!). However, it is a complex write up and I'm curious what @JonasMoss thought.

[x] Are there any other packages that do similar things?
[x] While well written, I do have some minor editorial comments:
- Page 1, Start of paragraph 5 - "In recent years, many researchers"
- Page 2, L1 - "adapted to the CUSUM, etc." (?)
- Page 2, L1/2 - "or Wilcoxon tests." (?)
- Page 2, L6/7 - "This may be because Wenger et al. (2019) found that ..."
- Page 2 Statement of Need: put the package name "memochange" in code font (i.e., use the ticks `)
- Page 2, Statement of Need L6/7 : "there is a wide range of possible time series to which the procedures can be applied, eg. ..."
[x] Include link to github repository
[x] Citations in brackets should go in square braces to ensure the correct formatting e.g., [see @robinson1995gaussian, ...] (I think this should remove the double brackets around citations)

Don't hesitate to let me know if anything is unclear or you would like more details.

DISCLAIMER: @mbobra I know nothing of the particular analyses being performed here. My review consists of reviewing whether the package runs into errors, has documentation, examples, and compiles correctly. I am able to reproduce the examples, but am unable to assess whether the results returned are, in fact, statistically sound/correct.

JonasMoss commented 4 years ago

Hi @KaiWenger, @Janis410, and @steffilazerte!

First of all congratulations on a well-designed package! There are a few issues but all in all, this looks like a very solid piece of software.

Note: I'm finished looking at the code but haven't read through the paper proper yet. I'll do that tomorrow and probably post a new issue on your Github then.

I second all of @steffilazerte's suggestions. The most important is the addition of automated tests https://github.com/KaiWenger/memochange/issues/6 (or, at the very least some kind of testing). I also believe your imports have to be fixed, see https://github.com/KaiWenger/memochange/issues/4, and its companion problem of confusing imports in examples https://github.com/KaiWenger/memochange/issues/5.

Most of the documentation issues in https://github.com/KaiWenger/memochange/issues/7 and https://github.com/KaiWenger/memochange/issues/8 are pretty minor and should be easy to fix. I do have some suggestions here I believe would make your package better, but it's fine if you don't follow through on, such as making S3 objects for the return values of e.g CUSUMfixed. These could be modeled after the objects in base R, such as stats::t.test. Such objects come in handy for many purposes, such as for extending your API, for adding generics such as plotting, for helping the user familiar with R, et cetera. Still, I will absolutely respect your choice if you decide not to implement them.

The issues with the high-level documentation of the package are more serious than the issues with the nitty-gritty, at least in terms of attracting users! I think it takes a little too much effort to get a good understanding of the why of your package. It's easy to understand the what of your package; it can be used to test for changes in persistence and means. Still, I have a hard time understanding exactly which function I should pick if I ever want to do a test for a change in persistence, even after reviewing your package. I understand that this is too difficult to address directly in the package docs (I imagine it's an active field of research in its own right), but that is why I suggest you give a short overview of all your functions and compare them in a vignette. It would also be nice to have more \seealso{} in the documentation. This would make it easier to browse the docs for the user. https://github.com/KaiWenger/memochange/issues/9.

As I alluded to, the lack of tests is a major issue. As the package now stands, I don't know if it does what it is supposed to. Ideally, the package would have two kinds of tests:

Automated tests using e.g. testthat,
Tests replicated trusted results from scientific publications.

The first one is most important as it highly increases the trust in the software.

I'll be following both my and @steffilazerte issues. If you have any questions or think I'm just horribly wrong about something, don't hesitate to talk there. Feel free to modify the issues to make e.g. checklists.

steffilazerte commented 4 years ago

Agreed :)

I particularly like the suggestion:

Tests replicated trusted results from scientific publications.

If examples of this sort are included, it will help demonstrate to users that the package does what it is supposed to do. If tests of this sort are included, it will help ensure that future changes don't break any existing functionality.

JonasMoss commented 4 years ago

Hi again @KaiWenger, @Janis410!

Here is my paper review.

Paper Review

I think the paper is both clear and informative. It gives an understandable and short overview of the field of research and makes it easy to comprehend what the package does. It is written for a specialist audience though; it reads like a literature review at the beginning of a statistics article. (I would not able to review its quality as a literature review, as I am no specialist in time series analysis and know nothing about this specific area of time series analysis.)

Take a look at the review criteria, which states that the paper should include:

[A] summary describing the high-level functionality and purpose of the software for a diverse, non-specialist audience

Though I highly appreciate your paper (and I absolutely think you shouldn't delete it, you could include it with the package as a vignette for instance) I don't think the paper satisfies the point above.

Some suggestions for writing a paper more friendly to non-specialists is to:

Try to live without some of the references; this is paper should be about "high-level functionality". Your high-level functionality is to check for changes in persistence change in mean under 0 < d < 0.5.
Consider condensing your existing paragraphs considerably: Most of paragraph 2 and paragraph 3 are concerned with details about several testing procedures. Maybe you could say something along the lines of "the package implements the tests of A, B, C, and D." and then be done with it?
Use much more time to explain the problem in the beginning. I think it would be very hard to understand what this package is about for non-specialists.
Perhaps it would help to include a plot of a time series with a change in persistence, such as the one you already have in your readme?
Your "statement of need" reads much less like a literature review and is more in line with what I expect from the "summary" part of a paper in this journal. Maybe you could expand on this to make a new summary?

Other Packages

As @steffilazerte mentions, you should write something about the existence or non-existence of other packages dealing with similar problems and how this package is different from them. I imagine this would include references to packages testing from breakpoints in stationary time series, for instance.

Specific Issues and Comments

These are the comments I have with your summary. Please keep in mind that the paper must be short, so when I say that a point is unclear it might be better to remove it from the paper entirely than trying to explain it.

For an I(d), or long-memory, process with 0 < d < 1, shocks neither die out quickly nor persist infinitely

Put the "0 < d < 1" in front of I(d) or rewrite, i.e if 0 < d < 1 then an I(d) process is a long-memory process. I suggest you drop the terms long-memory, short-memory and difference-stationary as they not used again in the paper and are unlikely to help beginners.

Busetti & Taylor (2004) and Leybourne & Taylor (2004) suggest approaches for testing the null of constant I(0)

You have forgotten to use Latex on I(0).

However, both approaches show serious distortions if neither the null nor the alternative is true, i.e. the series is constant I(1).

This is a surprising "i.e.", as the series could have been I(d) for some d in (0,1): There is no earlier indication in the text that I(0)/I(1) are the only possible alternatives. Consider a rewrite or the use of "e.g.".

Under the null the test assumes constant I(d) behavior with 0 ≤ d < 3/2. The approach suggested by Martins & Rodrigues (2014) is even able to identify changes from −1/2 < d1 < 2 to −1/2 < d2 < 2 with d1 ̸= d2. H

Why would you use or care about other tests then? Does Martins & Rodrigues (2014)'s test have horrible power?

Structural changes cannot only occur in the persistence, but also in the mean of a persistent time series.

I suggest you rewrite this, as it can be read as "structural changes can never only occur in the persistence [...]".

However, these tests for a mean shift are invalidated for d > 0. The problem is that the limiting distributions of the standard tests are different under long memory. Therefore, they reject the null hypothesis of a constant mean with probability of one asymptotically.

Please change "probability of one " to "probability one".

I suggest you remove "invalidated" and write inconsistent instead, as the notion of consistent tests is more well-known and unique than "valid". You could also go with "the tests do not work when d>0" as the summary is intended for a non-specialist audience.

The part with "Therefore, they reject the null hypothesis of a constant mean with probability of one asymptotically." sounds fishy. I understand that limiting distributions can change from context to context, but that does not seem to imply the "therefore" above. For instance, a change from N(0,1) to N(0,1-\epsilon) would not have your claimed property. It's also conceivable the null hypothesis would be accepted with probability 1 in the limit. (I am talking in generality here; i.e all testing problems).

Change-in-mean

A large body of literature handles this topic for weakly dependent I(0) time series.

Could you describe the R-packages made for this problem?

First, whether they are adapted on the CUSUM, sup-Wald, or Wilcoxon test. Second, which type of long-run variance estimator they apply in their test statistics. Early approaches utilize consistent estimates of the long-run variance (Horváth & Kokoszka (1997), Wang (2008), Dehling, Rooch, & Taqqu (2013)), while more recent contributions suggest self-normalized test statistics (Shao (2011), Iacone, Leybourne, & Taylor (2014), Betken (2016), Wenger & Leschinski (2019)). This may be the reason since it is found in Wenger et al. (2019) that self-normalized tests are robust against size distortions.

I believe this paragraph is far too technical for this paper:

First, whether they are adapted on the CUSUM, sup-Wald, or Wilcoxon test.

What are these?

Early approaches utilize consistent estimates of the long-run variance (Horváth & Kokoszka (1997), Wang (2008), Dehling, Rooch, & Taqqu (2013)), while more recent contributions suggest self-normalized test statistics (Shao (2011), Iacone, Leybourne, & Taylor (2014), Betken (2016), Wenger & Leschinski (2019)).

When you write "Early approaches utilize consistent estimates of the long-run variance " does that mean modern approaches use inconsistent estimates? What are self-normalized test statistics?

This may be the reason since it is found in Wenger et al. (2019) that self-normalized tests are robust against size distortions.

What is size distortions?

References

Change "Robinson, P.M." and others to "Robinson, P.M." In "Robinson, P. M., & others. (1995). Gaussian semiparametric estimation of long range dependence."

KaiWenger commented 4 years ago

@whedon generate pdf

whedon commented 4 years ago

Attempting PDF compilation. Reticulating splines etc...

whedon commented 4 years ago

:point_right: Check article proof :page_facing_up: :point_left:

KaiWenger commented 4 years ago

@whedon generate pdf

whedon commented 4 years ago

Attempting PDF compilation. Reticulating splines etc...

whedon commented 4 years ago

:point_right: Check article proof :page_facing_up: :point_left:

Janis410 commented 4 years ago

@whedon generate pdf

whedon commented 4 years ago

Attempting PDF compilation. Reticulating splines etc...

whedon commented 4 years ago

:point_right: Check article proof :page_facing_up: :point_left:

Janis410 commented 4 years ago

@whedon generate pdf

whedon commented 4 years ago

Attempting PDF compilation. Reticulating splines etc...

whedon commented 4 years ago

:point_right: Check article proof :page_facing_up: :point_left:

KaiWenger commented 4 years ago

Hi @steffilazerte and @JonasMoss,

first of all: thank you very much for your helpful and constructive thoughts and suggestions! We carefully considered all of your suggestions and think that our package improved significantly. We commented on all of your issues in #1, #2, #3, #4, #5, #6, #7, #8, and #9. In summary, we incorporated your remarks as follows:

we included a brief Statement_of_Author_Contributions.md.
we incorporated the suggestions for "good-practices" / better documentation (#1, #4, #5, #7, #8). Among many things, this includes that (a) we call functions directly by using the package::function() notation, (b) we be more precise in explaining the arguments and return values of all functions, (c) we explain the function's examples in more detail, and (d) we changed the DESCRIPTION and .Rbuildignore files according to your suggestions.
as both of you suggested, we included two types of additional tests: First, we include automated testthat tests for all of our functions. We test for the most reasonable combinations of use cases and it is checked whether the implemeted tests produce reasonable results. Second, we reproduce trustable results from published scientific papers in the files /testing/Simulation_Change_in_Mean and /testing/Simulation_Break_in_Persistence.Rmd. Precisely, we reproduce Monte Carlo simulation studies as well as simulations of critical values of the papers that published the tests we implement. We think that both tests increase the trust in our software significantly (see also #3, #6).
we included a CONTRIBUTING.md file to make clear that software contributions are welcome (#3).
we made important changes in the README: we shortened it to be short and on the point, using little code and not much text. Furthermore, for the change-in-mean tests we include a simple real data example (#2, #3, #9).
we created two new vignettes. One is about change-in-persistence tests. The other one is about change-in-mean tests. We put a lot of the content from the former README and paper.md in there, but also explained the functions and tests in much more detail and in many cases in an easier way (#3, #9).
we totally changed the paper.md to explain the functionality and purpose of our package "for a diverse, non-specialist audience" as stated in the review criteria of JOSS. As already mentioned, we use many parts of the former paper.md in our new vignettes. The new software paper is written in a much better way for a non-specialist audience. It further includes a comment on other packages for change in mean and change in persistence. We also elaborate on the statement of need at the end of the paper and shortened the paper. Since the link to our github repository is in the header of the software paper, we did not include a link once again at the bottom of the paper.

We hope that both of you are satisfied with our modifications. We really put a lot of effort the last weeks into the improvement of our package according to your suggestions. We are looking forward hearing from you :)

Janis410 commented 4 years ago

@whedon generate pdf

whedon commented 4 years ago

Attempting PDF compilation. Reticulating splines etc...

whedon commented 4 years ago

:point_right: Check article proof :page_facing_up: :point_left:

JonasMoss commented 4 years ago

@KaiWenger

Your response to our programming comments is great! I left two comments on the Github page that are non-critical.

I really like your new summary paper too. The only issue I can see is that you should cite the strucchange package. Add that and I can check off the last box.

KaiWenger commented 4 years ago

@whedon generate pdf

whedon commented 4 years ago

Attempting PDF compilation. Reticulating splines etc...

whedon commented 4 years ago

:point_right: Check article proof :page_facing_up: :point_left:

KaiWenger commented 4 years ago

@whedon generate pdf

whedon commented 4 years ago

Attempting PDF compilation. Reticulating splines etc...

whedon commented 4 years ago

:point_right: Check article proof :page_facing_up: :point_left:

KaiWenger commented 4 years ago

@whedon generate pdf

whedon commented 4 years ago

Attempting PDF compilation. Reticulating splines etc...

whedon commented 4 years ago

:point_right: Check article proof :page_facing_up: :point_left:

JonasMoss commented 4 years ago

@KaiWenger Please use the Journal of Statistical Software entry from https://cran.r-project.org/web/packages/strucchange/citation.html

KaiWenger commented 4 years ago

@whedon generate pdf

whedon commented 4 years ago

Attempting PDF compilation. Reticulating splines etc...

whedon commented 4 years ago

:point_right: Check article proof :page_facing_up: :point_left:

KaiWenger commented 4 years ago

@JonasMoss Thanks for your comments! We included the citation for the strucchange package in our software paper and treated your last two comments on our Github page :)

JonasMoss commented 4 years ago

@KaiWenger Great! Just add the DOI to the paper. =) The DOI is 10.18637/jss.v007.i02

mbobra commented 4 years ago

👋 @KaiWenger could you tick off the boxes that you've completed in https://github.com/openjournals/joss-reviews/issues/1820#issuecomment-547599162? Thanks :)

Janis410 commented 4 years ago

@whedon generate pdf

whedon commented 4 years ago

Attempting PDF compilation. Reticulating splines etc...

whedon commented 4 years ago

:point_right: Check article proof :page_facing_up: :point_left:

steffilazerte commented 4 years ago

Hi @KaiWenger, @Janis410, Wow! So much work in such a short time. Adding the tests, vignettes and annotating the examples makes this a very solid package.

I also like the re-write of the article, it's much more accessible and is very clear about it's purpose.

I have a couple of editorial (and occasionally nit-picky) comments for the article and one comment for the repository:

[x] The text always refers to "mean" then "persistence" (in that order). To maintain parallelism, could the figure be reversed (i.e. mean on the left, persistence on the right) and the example text made to follow the same order? That way the reader always reads them in the same order and it makes it easier to follow
[x] P1 Paragraph 3 "The memochange package is an R (R Core Team, 2019) package that identifies such changes in mean and persistence."
[x] P2 paragraph 2 "A related package is the strucchange package (ref), which also identifies changes in mean. However, the procedures implemented in the memochange package allow for valid inference in a more general setting."

My final suggestion is up to you:

as @JonasMoss suggested in KaiWenger/memochange#9, it would be a good idea to include a description in your GitHub repository (i.e. Replace the text right at the top of the page where it says "No description, website, or topics provided.")

A really great solid package, thanks!

Janis410 commented 4 years ago

@whedon generate pdf

whedon commented 4 years ago

Attempting PDF compilation. Reticulating splines etc...

openjournals / joss-reviews