openjournals / joss-reviews

Reviews for the Journal of Open Source Software
Creative Commons Zero v1.0 Universal
725 stars 38 forks source link

[REVIEW]: CycloPhaser: A Python Package for Detecting Extratropical Cyclone Life Cycles #7363

Open editorialbot opened 1 month ago

editorialbot commented 1 month ago

Submitting author: !--author-handle-->@daniloceano<!--end-author-handle-- (Danilo Couto de Souza) Repository: https://github.com/daniloceano/CycloPhaser Branch with paper.md (empty if default branch): joss-submission Version: 1.8.1 Editor: !--editor-->@observingClouds<!--end-editor-- Reviewers: @freemansw1, @stella-bourdin Archive: Pending

Status

status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/fc6ccda843aaedbf2d813473b7a561e0"><img src="https://joss.theoj.org/papers/fc6ccda843aaedbf2d813473b7a561e0/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/fc6ccda843aaedbf2d813473b7a561e0/status.svg)](https://joss.theoj.org/papers/fc6ccda843aaedbf2d813473b7a561e0)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@freemansw1 & @stella-bourdin, your review will be checklist based. Each of you will have a separate checklist that you should update when carrying out your review. First of all you need to run this command in a separate comment to create the checklist:

@editorialbot generate my checklist

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @observingClouds know.

Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest

Checklists

📝 Checklist for @freemansw1

📝 Checklist for @stella-bourdin

editorialbot commented 1 month ago

Hello humans, I'm @editorialbot, a robot that can help you with some common editorial tasks.

For a list of things I can do to help you, just type:

@editorialbot commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@editorialbot generate pdf
editorialbot commented 1 month ago

Software report:

github.com/AlDanial/cloc v 1.90  T=0.02 s (1233.6 files/s, 124131.8 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Python                          10            322            268            991
Markdown                         3            121              0            195
TeX                              1             19              0            171
reStructuredText                 7             95             84            143
YAML                             3             32             13            142
CSV                              1              0              0             66
DOS Batch                        1              8              1             26
make                             1              4              7              9
-------------------------------------------------------------------------------
SUM:                            27            601            373           1743
-------------------------------------------------------------------------------

Commit count by author:

    67  daniloceano
    18  Danilo Couto de Souza
editorialbot commented 1 month ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

✅ OK DOIs

- 10.1175/1520-0493(1922)50<468:JBAHSO>2.0.CO;2 is OK
- 10.1007/978-1-944970-33-8_10 is OK
- 10.1175/1520-0493(1993)121<2153:tlcoae>2.0.co;2 is OK
- 10.1175/bams-d-16-0261.1 is OK
- 10.1175/mwr3420.1 is OK
- 10.1175/2008mwr2491.1 is OK
- 10.1175/jas-d-13-0267.1 is OK
- 10.1029/2018gl078977 is OK
- 10.1175/jcli-d-16-0697.1 is OK
- 10.1002/joc.8539 is OK
- 10.1038/s41592-019-0686-2 is OK
- 10.1007/s00382-019-04778-1 is OK
- 10.21203/rs.3.rs-995499/v1 is OK
- 10.1029/2022ea002482 is OK
- 10.3354/cr01651 is OK
- 10.1007/s11069-024-06621-1 is OK
- 10.1007/s00382-005-0065-9 is OK

🟡 SKIP DOIs

- None

❌ MISSING DOIs

- None

❌ INVALID DOIs

- None
editorialbot commented 1 month ago

Paper file info:

📄 Wordcount for paper.md is 921

✅ The paper includes a Statement of need section

editorialbot commented 1 month ago

License info:

🟡 License found: GNU General Public License v3.0 (Check here for OSI approval)

editorialbot commented 1 month ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

observingClouds commented 1 month ago

👋🏼 @daniloceano @freemansw1 @stella-bourdin this is the review thread for the paper. All of our communications will happen here from now on.

As a reviewer, the first step is to create a checklist for your review by entering

@editorialbot generate my checklist

as the top of a new comment in this thread.

These checklists contain the JOSS requirements. As you go over the submission, please check any items that you feel have been satisfied. The first comment in this thread also contains links to the JOSS reviewer guidelines.

The JOSS review is different from most other journals. Our goal is to work with the authors to help them meet our criteria instead of merely passing judgment on the submission. As such, the reviewers are encouraged to submit issues and pull requests on the software repository. When doing so, please mention openjournals/joss-reviews#7363 so that a link is created to this thread (and I can keep an eye on what is happening). Please also feel free to comment and ask questions on this thread. In my experience, it is better to post comments/questions/suggestions as you come across them instead of waiting until you've reviewed the entire package.

We aim for reviews to be completed within about 2-4 weeks. Please let me know if any of you require some more time. We can also use EditorialBot (our bot) to set automatic reminders if you know you'll be away for a known period of time.

Please feel free to ping me (@observingClouds ) if you have any questions/concerns.

freemansw1 commented 1 month ago

Review checklist for @freemansw1

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

stella-bourdin commented 1 month ago

Review checklist for @stella-bourdin

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

stella-bourdin commented 1 month ago

Hi, Thank you for your submission. I have gone through the paper and ran the documented test. Here are my comments on some points of the review checklists, besides functionality issues raised as issues on the repo itself (links above).

General checks

Authorship

Only the first author appear in the commits. Please make sure that authorship follows journal guidelines. I am aware these can include author that did not directly committed into the code, and that you are responsible for the choice of final author list, so this is just a reminder.

Documentation

Statement of need

Procedure overview page is great. I would suggest making a similar one for the statement of need (which I supposed can be a copy of the statement of need in the paper).

Example provided

Besides functionality issues, here are some more comments regarding the example:

Paper

Summary, statement of need and state of the field

Very clear and well written

Features

I suggest adding a first paragraph about what are the input requirements to give context. Current paragraph 1 lacks clarity, but it would be helped by adding a first paragraph as suggested above. Current paragraph 2: I suggest citing the paper from the top of the paragraph. e.g. "Thresholds for phase detection were rigorously calibrated in Couto de Souza et al. (2024) using a representative set of cyclone tracks, ensuring accurate phase identification while filtering out noise." Current paragraph 3: It is not clear whether the user can actually plug in an SLP or wind time series and it would work just as well, or whether it would require additional adaptations.

I will next go through a more thorough testing of the code functionalities, and test other data. Stella

daniloceano commented 1 month ago

@stella-bourdin, thank you for your thorough review. Below, I will address your comments and the steps I have taken to address them:


Authorship


Documentation

Statement of Need

Example Provided


I am currently working on the remaining issues you raised in the repository and will continue to address them one by one. Once all the adjustments are complete, I will notify you and provide the updated code and paper.

Thank you again for your invaluable feedback. I look forward to continuing to improve the paper and the package based on your suggestions.

Best regards,
Danilo

daniloceano commented 1 month ago

@stella-bourdin following with your appointments:

API:

_"Must the series argument in determine_period be in a specific unit? In any case, I recommend specifying whether or not in the documentation of the function."_

I have clarified in the documentation that the series argument does not need to be in a specific unit. While the function was designed for vorticity data, it should theoretically work with other meteorological fields like sea level pressure (SLP) or wind. However, this hasn't been fully tested yet. This update has been made both in the API documentation and the Usage Guide to make it clear to users.

daniloceano commented 1 month ago

@stella-bourdin

Features:

Thank you again for your careful review and valuable suggestions. I look forward to your further feedback as I address these points.

Best regards,
Danilo

stella-bourdin commented 1 month ago

Hi,

Thank you for going through this quickly! I am now able to install the program and run the example smoothly. Some more comments as I continue to test the package (and look for every way to break it - sorry).

Paper

I am not sure where I can find the updated of the paper (the link above seem to correspond to the previous version). Should it not appear on this thread? (@observingClouds ?)

Documentation

Although minor, I feel like adding one example plot for a canonical case on the GitHub and readthedocs first page would greatly help the users directly visualizing what the package does.

Tests

It seems to me that the test only checks that the program runs without error. It could be helpful to also check that the output is what was expected, in case a change does not break the program, but affects the output.

determine_periods functionality and documentation

These are thoughts on improving the package that include more or less essential changes. Feel free to discard those you do not wish to implement.

Maybe a test could be added in case the pre-processing parameters choice leads to the apparition of "spurious oscillation" at the beginning and end of the series, making the user aware he might need to adapt the parameters. I suppose this could be done by checking the delta between the first and second point compared to the order of magnitude of the input values.

Testing with other data

Here are some observations as I apply the package to a set of my tropical cyclones tracks I have lying around. I find it slightly confusing why sometimes the first part is incipient, and other times when it is not obvious to me they display different behaviours.

image

In this case, there is a residual in the middle of the time series. Is it normal?

image

In this case, there is a gap with nothing identified in the middle. Is it normal?

image

Sometimes the identified mature stage seems offset compared to the actual series. In the example below, the red part is well below the obvious vorticity minimum. How can I fix this?

image

Using wind data instead of vorticity, I seem able to get satisfactory results. When using sea-level pressure data, it gets weirder. I think it is because there is some normalization going on at some point, making it more difficult to read the plots.

image
observingClouds commented 1 month ago

@editorialbot generate pdf

editorialbot commented 1 month ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

observingClouds commented 1 month ago

@stella-bourdin I just generated the latest version of the manuscript (see above). We do not do this automatically, but one can generate the latest version by kindly asking our bot 😄

daniloceano commented 4 weeks ago

Hi @stella-bourdin

Thank you so much for your thorough review and feedback! I'm glad to hear that the installation and example run smoothly now, and I appreciate the detailed testing and suggestions. This level of detail will certainly help make the package better, and I’m very grateful for your efforts.


Documentation

We agree that a visual example could greatly enhance usability, and we've now added an example plot to both the GitHub README and the ReadTheDocs front page.


Tests

Your suggestion to verify output accuracy is very helpful. We’ve updated the tests to compare the function’s output against expected results in specific cases, ensuring both functionality and data integrity.


determine_periods Functionality and Documentation

We appreciate your detailed feedback on determine_periods. Here’s how we’ve addressed each of these points:


Testing with other data

Incipient Stage Confusion: The package relies on duration thresholds for identifying each phase. Differences in detection can result from variations in input data meeting criteria thresholds, particularly for detecting an initial decay or incipient stage. Using the plot_steps argument can help visualize each detection stage step-by-step, providing insights into why certain phases are or aren’t detected in specific instances.

Unexpected Residual Stage or Gaps in the Series: For both of these cases, could you provide details about the options used? These patterns aren’t typical or expected behavior. However, they could relate to challenges in input data quality, especially if there are tracking inconsistencies. For example, in the first case, the tracking algorithm may have terminated the cyclone track prematurely, impacting the ability to fully detect the mature stage. Similarly, the second case might reflect a vorticity series with irregularities, possibly due to the influence of spatial filtering or weak systems.

If you’re using TRACK data, it’s worth noting that TRACK sometimes has limitations in capturing consistent cyclone characteristics, leading to possible anomalies. Without supplementary data—such as spatial fields like SLP—detailed analysis is limited, but data quality issues can occasionally propagate into CycloPhaser’s outputs, amplifying detection inconsistencies.

Offset in the Identified Mature Stage: The offset issue in the mature stage may relate to the smoothing and polynomial settings used. By reducing the level of smoothing or increasing the savgol_polynomial value, you can achieve a better alignment of detected vorticity minima with the original series. The filtering process smooths the raw data (yellow line), and adjusting these parameters can mitigate shifts and better capture phase transitions.

Using Wind or Sea-Level Pressure Data: If you’re achieving satisfactory results with wind data, that’s a good sign. Maybe you could now apply the hemisphere argument?

Can you share a time series plot of the raw SLP data for reference? The only normalization applied occurs in the plotting stage to facilitate visualization, which might influence readability but does not alter the analysis itself.


Thank you again for the constructive feedback, which has truly improved the package. Let us know if you encounter any further issues or have additional suggestions!

Best regards,
Danilo

stella-bourdin commented 3 weeks ago

Thank you very much for going through these! Here are additional remarks and replies

Package dependencies

  1. There was something else I forgot to mention, which is that the package installation requires very specific package version for the dependencies. This is likely to create conflict with other packages installed in one's environment. I advise requiring specific versions only when absolutely necessary for cyclophaser to work. You can also request package version >= or <= to a given version is there is a functionality that appeared or disappeared at some point. This flexibility will make the package more easy to use. Otherwise, it needs to be installed in its standalone environment where it cannot be used in conjunction with other packages.

Replies re. functionality and doc

Thank you for carefully going through these and implementing what I suggested.

  1. Regarding x : I am checking the function docstring, the API documentation and the usage guide but cannot find information about the behaviour when integers are provided.
  2. Regarding pre-processing recommendations: Can you point me to where you provided these recommendations?
  3. use_smoothing : The docstring is better, but does not say that one may deactivate it altogether by setting False.
  4. Regarding the spurious oscillations detector: That is great ! However, it seems triggered by situations where I do not understand why, e.g. for the following test: image

Testing with other data

  1. Unexpected Residual Stage or Gaps in the Series: I used the default options (determine_periods(list(-1 * tracks[i].relative_vorticity.values), x = list(tracks[i].time.values), plot="test_default",)), but I agree that the one with residual in the middle is a particularly abnormal time series. The one with a gap however, I would say that by eye I can identify two mature periods. But I suppose this is a matter of tuning. Maybe it could be useful to just add a warning in such cases (A gap in the identified periods, suspicious succession of period...), so that the user is aware of potential data quality problems? As one may not be checking every individual time series when applying to a full dataset, it would warn them that caution is required.
  2. Using wind data: I did try to apply the hemisphere argument and it works well indeed.
  3. Using SLP data: The second SLP time series was indeed very noisy, hence the result. Input data quality was the problem (garbage in, garbage out...)

Paper

Seems very good to me now. You might want to update the very last sentence about the hemispheres now that you implemented the option.

Code

I had a look at the code itself. It is clean and well documented throughout. You might want to transform some of your testing in the end of each module file into proper tests (some of which might already be). Automated tests do not need to be only for the main functionality but can also be for intermediary steps. But it is up to you. There is also a chunk of commented imports in determine_period.py that you might want to clean out. But I'm being really picky here...

daniloceano commented 3 weeks ago

Thank you for the feedback!

I’ve updated the requirements.txt file to increase flexibility in dependency versions, specifying only the minimum versions necessary for cyclophaser to function effectively. Testing compatibility with every specific version of each dependency, however, would be impractical due to the extensive range of versions involved. To ensure stability, core dependencies are tested across recent major versions in our CI setup.


Regarding x: Thank you for pointing that out! I've updated the documentation to clarify the behavior when x is provided as a list of integers. In these cases, the function detects phases between the series' time steps rather than actual dates or times, which can be especially useful for data without precise timestamps or relative time steps. Apologies for missing this detail in my previous responses, and thanks again for helping ensure the documentation is as clear as possible!


Regarding the pre-processing recommendations, these are detailed in the documentation for each parameter. Here it is:


Regarding the spurious oscillations detector: This might be happening because the vorticity delta between the first time steps is much higher than for the rest of the series. This can trigger the detector, but in these cases, the warning can generally be ignored as it’s a conservative check.


In response to the issues with residual stages or gaps in the series, we’ve added warnings that detect and alert users if residual periods appear in the middle of a time series or if there are unclassified time steps (gaps). These warnings indicate potential data quality concerns and suggest adjusting pre-processing options, which may help resolve such issues. This update should aid in identifying and addressing unusual behaviors automatically when processing larger datasets without needing to review each individual time series manually.


Thanks for confirming that the hemisphere adjustment works well for wind data, and I appreciate the additional feedback on the SLP data. As you noted, input quality does indeed play a big role,

Regarding the code review, thank you for the kind words! I agree with your observation about testing; however, I find that focusing on testing the main functionality, especially for determine_periods, provides clearer debugging and maintenance benefits in this case (at least for me). I've also cleaned up the commented imports in determine_periods.py as suggested.


We’ve updated the documentation in the paper to reflect the new hemisphere option and clarified its intended use cases, particularly for detecting cyclone phases in both vorticity and alternative series like SLP and wind speed.


Thank you once again for the detailed review—it’s made a big difference in refining the package.

stella-bourdin commented 2 weeks ago

I just caught this: There is a typo in the title of the documentation: It is written CyloPhaser with a missing "c".

stella-bourdin commented 2 weeks ago

Regarding the pre-processing recommendations, I do not see them neither in the API doc here, nor in the code docstring here. Am I missing something?

daniloceano commented 2 weeks ago

Hi @stella-bourdin,

Thanks for catching that typo! I’ve fixed the "CycloPhaser" typo in the documentation.

Regarding the pre-processing recommendations, they’re actually located in the Usage Guide rather than the API doc. I wanted to keep the API documentation as straightforward as possible, while the Usage Guide offers a more thorough explanation of how to customize filtering.

Let me know if you have any more questions!

freemansw1 commented 2 weeks ago

Apologies for my delay in getting started, I've just come off of some travel. Initial thoughts are below before I try to break the package using a variety of other data.

General Checks

Functionality

Installation

I wasn't able to install requirements.txt using conda-forge, but I was able to install through pip. Not worth changing, just wanted to note. Some of the libraries aren't available through conda-forge/conda/mamba.

Functionality

The example works as described, and the test passes.

Documentation

Statement of Need

After the previous reviewer, the authors include a statement of need in the documentation. I think this largely meets the mark, but the homepage of the documentation is not as helpful from this end as I think it can be. I think a reference to this paper or the one in JOC would be useful here. It also needed to be clarified to me how or where one would get a vorticity timeseries, although this becomes clearer the further into the documentation that you go.

I think, ultimately, the best way to resolve my concern here would be to elaborate much more on the homepage about what this package provides an average user. Make it clear that this package is the first that allows translation from tracked ET cyclone vorticity (or potentially other) information to lifecycle stage.

Installation Instructions

I would have liked instructions on how to install the latest and/or in development package. This is an easy fix. Note also my comment about conda/mamba above. The scientific software world seems split between the two at the moment.

Example usage

The only example uses pre-packaged data from the package itself. I think showing what this data looks like (or how to generate it by oneself) would be useful. The note on data frequency is useful, but not clear enough. I would have assumed, before reading into the documentation deeply, that if I provided a pd.Series object indexed by a DatetimeIndex, the package would automatically calculate the appropriate values, etc.

I think making it clear that one needs to have the vorticity already from a tracking package (either the one already provided by the author or from a different package) in the examples or on the homepage is a necessary change. That was not immediately clear to me.

Functionality documentation

I would have liked to have seen the other publicly accessible functions documented with docstrings.

Software paper

I think the paper is in overall good shape.

daniloceano commented 2 weeks ago

Thank you very much for your comprehensive review and insights. We've made several updates based on your feedback:

Installation and Installation Instructions

We've expanded the installation instructions, adding detailed steps for setting up the latest package. The instructions now address setup in various environments, including options for those using pip or a virtual conda environment.

Statement of Need

To address the need for clarity on the package's purpose and the source of input data, we have revised both the README and the documentation homepage. These updates emphasize that CycloPhaser does not generate vorticity series and outline how users can obtain these series, either through available tracking packages or from published databases. As it was no longer needed, we removed the Statement of Need documentation section.

We've also added citations for both the paper currently under review and the published study in the JOC to provide users with contextual references.

Example Usage

To further clarify the input data requirements, we’ve added an example track file format in the documentation. This example demonstrates the structure expected for time, latitude, and longitude columns in the input data.

Regarding your suggestion on data frequency, you’re correct that the package automatically handles pd.Series objects indexed by a DatetimeIndex (as stated in the usage guide and API documentation). The x argument is only required when using a list or array, as the DatetimeIndex serves as the time reference for pd.Series inputs.

Functionality Documentation

Additional docstrings have been added for other accessible functions to improve usability and help developers integrate or modify parts of CycloPhaser more easily.

Thank you again!

stella-bourdin commented 1 week ago

Hi, @observingClouds, that is all good for me now! @daniloceano, thank you for your patience while I attempted to break your code and for you thorough and timely answers.

observingClouds commented 1 week ago

@stella-bourdin, @freemansw1 thank you very much for your very thorough and helpful review. @freemansw1 please let us know if there are still open discussion points from your point of you. If this is no longer the case please tick-off all items in your review checklist. Thank you!

daniloceano commented 1 week ago

Thank you very much @stella-bourdin !