[REVIEW]: CycloPhaser: A Python Package for Detecting Extratropical Cyclone Life Cycles

openjournals / joss-reviews

Reviews for the Journal of Open Source Software

[REVIEW]: CycloPhaser: A Python Package for Detecting Extratropical Cyclone Life Cycles #7363

Open editorialbot opened 1 month ago

editorialbot commented 1 month ago

Submitting author: @daniloceano
Editor: @observingClouds
Reviewers: @freemansw1, @stella-bourdin
Archive: Pending

Status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/fc6ccda843aaedbf2d813473b7a561e0"><img src="https://joss.theoj.org/papers/fc6ccda843aaedbf2d813473b7a561e0/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/fc6ccda843aaedbf2d813473b7a561e0/status.svg)](https://joss.theoj.org/papers/fc6ccda843aaedbf2d813473b7a561e0)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@freemansw1 & @stella-bourdin, your review will be checklist based. Each of you will have a separate checklist that you should update when carrying out your review. First of all you need to run this command in a separate comment to create the checklist:

@editorialbot generate my checklist

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @observingClouds know.

✨ Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest ✨

Checklists

📝 Checklist for @freemansw1

📝 Checklist for @stella-bourdin

editorialbot commented 1 month ago

Hello humans, I'm @editorialbot, a robot that can help you with some common editorial tasks.

For a list of things I can do to help you, just type:

@editorialbot commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@editorialbot generate pdf

editorialbot commented 1 month ago

Software report:

github.com/AlDanial/cloc v 1.90  T=0.02 s (1233.6 files/s, 124131.8 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Python                          10            322            268            991
Markdown                         3            121              0            195
TeX                              1             19              0            171
reStructuredText                 7             95             84            143
YAML                             3             32             13            142
CSV                              1              0              0             66
DOS Batch                        1              8              1             26
make                             1              4              7              9
-------------------------------------------------------------------------------
SUM:                            27            601            373           1743
-------------------------------------------------------------------------------

Commit count by author:

    67  daniloceano
    18  Danilo Couto de Souza

editorialbot commented 1 month ago

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

✅ OK DOIs

- 10.1175/1520-0493(1922)50<468:JBAHSO>2.0.CO;2 is OK
- 10.1007/978-1-944970-33-8_10 is OK
- 10.1175/1520-0493(1993)121<2153:tlcoae>2.0.co;2 is OK
- 10.1175/bams-d-16-0261.1 is OK
- 10.1175/mwr3420.1 is OK
- 10.1175/2008mwr2491.1 is OK
- 10.1175/jas-d-13-0267.1 is OK
- 10.1029/2018gl078977 is OK
- 10.1175/jcli-d-16-0697.1 is OK
- 10.1002/joc.8539 is OK
- 10.1038/s41592-019-0686-2 is OK
- 10.1007/s00382-019-04778-1 is OK
- 10.21203/rs.3.rs-995499/v1 is OK
- 10.1029/2022ea002482 is OK
- 10.3354/cr01651 is OK
- 10.1007/s11069-024-06621-1 is OK
- 10.1007/s00382-005-0065-9 is OK

🟡 SKIP DOIs

- None

❌ MISSING DOIs

- None

❌ INVALID DOIs

- None

editorialbot commented 1 month ago

Paper file info:

📄 Wordcount for paper.md is 921

✅ The paper includes a Statement of need section

editorialbot commented 1 month ago

License info:

🟡 License found: GNU General Public License v3.0 (Check here for OSI approval)

editorialbot commented 1 month ago

👉📄 Download article proof 📄 View article proof on GitHub 📄 👈

observingClouds commented 1 month ago

👋🏼 @daniloceano @freemansw1 @stella-bourdin this is the review thread for the paper. All of our communications will happen here from now on.

As a reviewer, the first step is to create a checklist for your review by entering

@editorialbot generate my checklist

as the top of a new comment in this thread.

These checklists contain the JOSS requirements. As you go over the submission, please check any items that you feel have been satisfied. The first comment in this thread also contains links to the JOSS reviewer guidelines.

The JOSS review is different from most other journals. Our goal is to work with the authors to help them meet our criteria instead of merely passing judgment on the submission. As such, the reviewers are encouraged to submit issues and pull requests on the software repository. When doing so, please mention openjournals/joss-reviews#7363 so that a link is created to this thread (and I can keep an eye on what is happening). Please also feel free to comment and ask questions on this thread. In my experience, it is better to post comments/questions/suggestions as you come across them instead of waiting until you've reviewed the entire package.

We aim for reviews to be completed within about 2-4 weeks. Please let me know if any of you require some more time. We can also use EditorialBot (our bot) to set automatic reminders if you know you'll be away for a known period of time.

Please feel free to ping me (@observingClouds ) if you have any questions/concerns.

freemansw1 commented 1 month ago

Review checklist for @freemansw1

Conflict of interest

[x] I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.

Code of Conduct

[x] I confirm that I read and will adhere to the JOSS code of conduct.

General checks

[x] Repository: Is the source code for this software available at the https://github.com/daniloceano/CycloPhaser?
[x] License: Does the repository contain a plain-text LICENSE or COPYING file with the contents of an OSI approved software license?
[x] Contribution and authorship: Has the submitting author (@daniloceano) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?
[ ] Substantial scholarly effort: Does this submission meet the scope eligibility described in the JOSS guidelines
[x] Data sharing: If the paper contains original data, data are accessible to the reviewers. If the paper contains no original data, please check this item.
[x] Reproducibility: If the paper contains original results, results are entirely reproducible by reviewers. If the paper contains no original results, please check this item.
[x] Human and animal research: If the paper contains original data research on humans subjects or animals, does it comply with JOSS's human participants research policy and/or animal research policy? If the paper contains no such data, please check this item.

Functionality

[x] Installation: Does installation proceed as outlined in the documentation?
[x] Functionality: Have the functional claims of the software been confirmed?
[x] Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

[x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
[x] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
[x] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
[x] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
[x] Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
[x] Community guidelines: Are there clear guidelines for third parties wishing to 1. Contribute to the software 2. Report issues or problems with the software 3. Seek support

Software paper

[x] Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
[x] A statement of need: Does the paper have a section titled 'Statement of need' that clearly states what problems the software is designed to solve, who the target audience is, and its relation to other work?
[x] State of the field: Do the authors describe how this software compares to other commonly-used packages?
[x] Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
[x] References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

stella-bourdin commented 1 month ago

Review checklist for @stella-bourdin

Conflict of interest

[x] I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.

Code of Conduct

[x] I confirm that I read and will adhere to the JOSS code of conduct.

General checks

[x] Repository: Is the source code for this software available at the https://github.com/daniloceano/CycloPhaser?
[x] License: Does the repository contain a plain-text LICENSE or COPYING file with the contents of an OSI approved software license?
[x] Contribution and authorship: Has the submitting author (@daniloceano) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?
[x] Substantial scholarly effort: Does this submission meet the scope eligibility described in the JOSS guidelines
[x] Data sharing: If the paper contains original data, data are accessible to the reviewers. If the paper contains no original data, please check this item.
[x] Reproducibility: If the paper contains original results, results are entirely reproducible by reviewers. If the paper contains no original results, please check this item.
[x] Human and animal research: If the paper contains original data research on humans subjects or animals, does it comply with JOSS's human participants research policy and/or animal research policy? If the paper contains no such data, please check this item.

Functionality

[x] Installation: Does installation proceed as outlined in the documentation?
[x] Functionality: Have the functional claims of the software been confirmed?
[x] Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

[x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
[x] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
[x] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
[x] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
[x] Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
[x] Community guidelines: Are there clear guidelines for third parties wishing to 1. Contribute to the software 2. Report issues or problems with the software 3. Seek support

Software paper

[x] Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
[x] A statement of need: Does the paper have a section titled 'Statement of need' that clearly states what problems the software is designed to solve, who the target audience is, and its relation to other work?
[x] State of the field: Do the authors describe how this software compares to other commonly-used packages?
[x] Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
[x] References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

stella-bourdin commented 1 month ago

Hi, thank you for your submission. I have gone through the paper and run the documented test. Here are my comments on some points of the review checklist, in addition to the functionality issues raised as issues on the repo itself (links above).

General checks

Authorship

Only the first author appears in the commits. Please make sure that authorship follows the journal guidelines. I am aware that these can include authors who did not directly commit to the code, and that you are responsible for the choice of the final author list, so this is just a reminder.

Documentation

Statement of need

The procedure overview page is great. I would suggest making a similar one for the statement of need (which I suppose can be a copy of the statement of need in the paper).

Example provided

Besides functionality issues, here are some more comments regarding the example:

- A brief introduction to what is done in this example would be appreciated. In particular, what data is loaded? Help us understand what we are doing here.
- For the output figure, please specify what the different zeta variables correspond to.
- In the legend, the residual is green, like decay, instead of grey.

API

Must the series argument in determine_periods be in a specific unit? In any case, I recommend stating in the function's documentation whether a specific unit is required.

Paper

Summary, statement of need and state of the field

Very clear and well written

Features

I suggest adding a first paragraph about the input requirements to give context.
Current paragraph 1 lacks clarity, but adding a first paragraph as suggested above would help.
Current paragraph 2: I suggest citing the paper at the top of the paragraph, e.g. "Thresholds for phase detection were rigorously calibrated in Couto de Souza et al. (2024) using a representative set of cyclone tracks, ensuring accurate phase identification while filtering out noise."
Current paragraph 3: It is not clear whether the user can plug in an SLP or wind time series and have it work just as well, or whether this would require additional adaptations.

I will next go through a more thorough testing of the code functionalities, and test other data. Stella

daniloceano commented 1 month ago

@stella-bourdin, thank you for your thorough review. Below, I will address your comments and the steps I have taken to address them:

Authorship

Comment: Only the first author appears in the commits. Please make sure that authorship follows journal guidelines. I am aware that these can include authors that did not directly commit to the code, so this is just a reminder.
Response: I appreciate the reminder. I was the only one working directly on the code, but the other researchers involved contributed significantly to the conceptualization of the algorithm, provided datasets, and collaborated with me in checking the outputs. Their contributions were essential for ensuring that the package produces realistic and useful results.

Documentation

Statement of Need

Comment: The procedure overview page is great. I would suggest making a similar one for the statement of need (which can be a copy from the paper).
Response: I have now added a Statement of Need section in the documentation, following the suggestion.

Example Provided

Comment: A brief introduction to what is done in the example would be appreciated. In particular, explain what data is loaded and clarify what we are doing here.
Response: A brief introduction explaining the dataset used and the purpose of the example has been added to the Usage section of the documentation.
Comment: For the output figure, please clarify what the different zeta variables correspond to.
Response: I simplified the output by removing the extra zeta variables in the figure for better clarity.
Comment: In the legend, the residual phase is green like decay instead of grey.
Response: The color for the residual phase has been updated to grey as originally intended.

I am currently working on the remaining issues you raised in the repository and will continue to address them one by one. Once all the adjustments are complete, I will notify you and provide the updated code and paper.

Thank you again for your invaluable feedback. I look forward to continuing to improve the paper and the package based on your suggestions.

Best regards,
Danilo

daniloceano commented 1 month ago

@stella-bourdin following up on your comments:

API:

_"Must the series argument in determine_periods be in a specific unit? In any case, I recommend specifying whether or not in the documentation of the function."_

I have clarified in the documentation that the series argument does not need to be in a specific unit. While the function was designed for vorticity data, it should theoretically work with other meteorological fields like sea level pressure (SLP) or wind. However, this hasn't been fully tested yet. This update has been made both in the API documentation and the Usage Guide to make it clear to users.

daniloceano commented 1 month ago

@stella-bourdin

Features:

I have added a first paragraph to the features section describing the input requirements to provide more context for the user.
For paragraph 2, I followed your suggestion and cited Couto de Souza et al. (2024) at the beginning of the paragraph to clarify that thresholds for phase detection were calibrated based on a representative set of cyclone tracks.
For paragraph 3, I clarified that, although the package was initially developed for vorticity, it could potentially be applied to other variables, such as SLP or wind data. However, I have noted that it has not been explicitly tested for these variables, and the input data currently needs to be negative.

Thank you again for your careful review and valuable suggestions. I look forward to your further feedback as I address these points.

Best regards,
Danilo

stella-bourdin commented 1 month ago

Hi,

Thank you for going through this quickly! I am now able to install the program and run the example smoothly. Some more comments as I continue to test the package (and look for every way to break it - sorry).

Paper

I am not sure where I can find the updated version of the paper (the link above seems to correspond to the previous version). Should it not appear on this thread? (@observingClouds ?)

Documentation

Although minor, I feel that adding one example plot for a canonical case to the GitHub and readthedocs front pages would greatly help users visualize directly what the package does.

Tests

It seems to me that the test only checks that the program runs without error. It could be helpful to also check that the output is what was expected, in case a change does not break the program, but affects the output.
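As an illustration of the kind of output check suggested here, the following is a minimal sketch only. `toy_phases` is a hypothetical stand-in written for this example; it is not CycloPhaser's API, and the phase labels are simplified.

```python
# Toy stand-in for a phase-detection function; NOT CycloPhaser's API.
def toy_phases(series):
    """Label each step by the sign of the change in a (negative) vorticity series."""
    labels = []
    for prev, curr in zip(series, series[1:]):
        # More negative vorticity means a stronger cyclone in this convention.
        labels.append("intensification" if curr < prev else "decay")
    return labels


def test_toy_phases_matches_expected():
    # Vorticity deepens for two steps, then weakens for two steps.
    series = [-1.0, -2.0, -3.0, -2.0, -1.0]
    expected = ["intensification", "intensification", "decay", "decay"]
    # Comparing against a stored expectation catches silent changes in the
    # output, not only crashes.
    assert toy_phases(series) == expected
```

A test of this shape fails when a code change alters the labels even though the function still runs without error.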

`determine_periods` functionality and documentation

These are thoughts on improving the package that include more or less essential changes. Feel free to discard those you do not wish to implement.

series: The package would be easier to use if it accepted more flexible inputs. I'm thinking of numpy arrays in particular, but also pandas Series or xarray DataArrays, as the user is most likely to have loaded their data with pandas or xarray. However, the error message raised when the input type is wrong is quite clear, so it is up to you.
x: It is not exactly clear what type this can be and what the expected behaviour is for each type. It seems that I can provide numpy or pandas datetimes with similar behaviour. I am not sure what it does when I provide a list of ints.
Pre-processing options: It is not clear what the units of these arguments are. Number of time steps? Hours? It would help to provide some recommendations on how best to tune these parameters to our data (should we tune them to the data frequency, for example?).
- use_smoothing:
  - An error is raised if it is below savgol_polynomial (I think); it would be good to specify that it must be greater.
  - The doc says it can be boolean, but False and True raise errors (because they are read as 0 or 1, which are below savgol_polynomial = 3). Is it actually possible to deactivate it?
- use_filter: I seem to obtain the same result whether I activate it or not. Is that normal? Why does the yellow line not fit the grey one after I remove the filter? Is there still some kind of normalization?
Hemisphere support: I would argue in favor of making this an argument. It is as simple as changing the sign of the input, but it makes the package friendlier for the user. That would help when using NH data, but also when using wind speed data, where the apex is necessarily the maximum in both hemispheres.

Maybe a check could be added for cases where the choice of pre-processing parameters leads to the appearance of "spurious oscillations" at the beginning and end of the series, making users aware that they might need to adapt the parameters. I suppose this could be done by comparing the delta between the first and second points to the order of magnitude of the input values.
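The boundary check described above could look roughly like the sketch below. The function name `check_edge_jump` and the `ratio` threshold are assumptions made for illustration; they are not taken from the package, and the threshold would need tuning on real data.

```python
import warnings


def check_edge_jump(values, ratio=0.5):
    """Return True (and warn) if the first or last step is large relative to the series scale.

    `ratio` is an arbitrary illustrative threshold, not a value from the package.
    """
    if len(values) < 2:
        return False
    # Use the largest absolute value as a rough order of magnitude of the series.
    scale = max(abs(v) for v in values)
    if scale == 0:
        return False
    first_jump = abs(values[1] - values[0])
    last_jump = abs(values[-1] - values[-2])
    suspicious = max(first_jump, last_jump) > ratio * scale
    if suspicious:
        warnings.warn(
            "Large jump at a series boundary; consider adjusting the "
            "pre-processing parameters.",
            stacklevel=2,
        )
    return suspicious
```

The check is meant to run on the processed (filtered or smoothed) series, since that is where edge artifacts introduced by pre-processing would show up.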

Testing with other data

Here are some observations as I apply the package to a set of tropical cyclone tracks I have lying around. I find it slightly confusing that the first part is sometimes identified as incipient and sometimes not, when it is not obvious to me that the series behave differently.

In this case, there is a residual in the middle of the time series. Is it normal?

In this case, there is a gap with nothing identified in the middle. Is it normal?

Sometimes the identified mature stage seems offset compared to the actual series. In the example below, the red part is well below the obvious vorticity minimum. How can I fix this?

Using wind data instead of vorticity, I seem able to get satisfactory results. When using sea-level pressure data, it gets weirder. I think it is because there is some normalization going on at some point, making it more difficult to read the plots.

observingClouds commented 1 month ago

@editorialbot generate pdf

editorialbot commented 1 month ago

👉📄 Download article proof 📄 View article proof on GitHub 📄 👈

observingClouds commented 1 month ago

@stella-bourdin I just generated the latest version of the manuscript (see above). We do not do this automatically, but one can generate the latest version by kindly asking our bot 😄

daniloceano commented 4 weeks ago

Hi @stella-bourdin

Thank you so much for your thorough review and feedback! I'm glad to hear that the installation and example run smoothly now, and I appreciate the detailed testing and suggestions. This level of detail will certainly help make the package better, and I’m very grateful for your efforts.

Documentation

We agree that a visual example could greatly enhance usability, and we've now added an example plot to both the GitHub README and the ReadTheDocs front page.

Tests

Your suggestion to verify output accuracy is very helpful. We’ve updated the tests to compare the function’s output against expected results in specific cases, ensuring both functionality and data integrity.

`determine_periods` Functionality and Documentation

We appreciate your detailed feedback on determine_periods. Here’s how we’ve addressed each of these points:

Flexible Input Types for series: series now accepts a wider range of input types, including list, numpy.ndarray, pandas.Series, and xarray.DataArray. The function converts these types to a compatible format internally, making the tool more versatile without sacrificing clarity.
Clarifying x Input: The x parameter has been documented to specify that it accepts a list or pandas.DatetimeIndex with timestamps. We have also clarified the behavior when using integer values in x. This input is crucial for setting the periods based on time intervals, so we’ve elaborated on this in the documentation.
Pre-Processing Parameter Units and Recommendations: Each pre-processing option now specifies time step units. We’ve also added guidance for tuning these parameters based on data frequency, with recommendations to help users adjust values like cutoff_low and cutoff_high depending on whether their data is daily, hourly, or otherwise.
use_smoothing Error Handling and Documentation: You were correct; the smoothing window length must be greater than or equal to savgol_polynomial. We’ve added validation to check this, plus an option to deactivate smoothing by setting use_smoothing=False. This detail is now clear in both the documentation and error messages.
use_filter Clarifications: To activate use_filter meaningfully, you should pass an integer value for the filter window length rather than using True. Setting use_filter=True (I am assuming you did that) without a specified window length defaults the window length to 1, which effectively disables filtering. The yellow line in the plots is normalized to provide a clearer visualization of the filtered and unfiltered series in relation to each other. This normalization, applied both to the filtered and the smoothed series, is specifically for visualization purposes and helps to highlight phase transitions more distinctly.
Hemisphere Support: To support data from both hemispheres and various data types (e.g., wind speed, SLP), we’ve added a hemisphere argument. Setting this to "northern" automatically inverts the input, while "southern" (the default) maintains the original convention.
Detecting Spurious Oscillations: We implemented a check to detect potential spurious oscillations at the series boundaries, which might indicate that adjustments are needed for certain pre-processing parameters. If detected, the function issues a warning with specific recommendations for tuning the inputs.
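To make the input conversion and the hemisphere convention described in the list above concrete, here is a minimal sketch. `prepare_series` is a hypothetical helper written for this thread, not the package's internal code; only the "northern inverts, southern keeps the sign" behaviour comes from the description above.

```python
def prepare_series(series, hemisphere="southern"):
    """Convert a 1-D sequence to a list of floats and apply the hemisphere sign convention.

    Hypothetical helper for illustration; not CycloPhaser's implementation.
    """
    # Iteration works for lists and tuples, and also for 1-D array-likes
    # such as numpy arrays or pandas Series.
    values = [float(v) for v in series]
    if hemisphere == "northern":
        # Invert the input so that the apex is a minimum, as in the
        # southern-hemisphere vorticity convention.
        values = [-v for v in values]
    elif hemisphere != "southern":
        raise ValueError("hemisphere must be 'southern' or 'northern'")
    return values
```

For example, `prepare_series([1.0, -2.0], hemisphere="northern")` gives `[-1.0, 2.0]`, while the default leaves the values unchanged.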

Testing with other data

Incipient Stage Confusion: The package relies on duration thresholds for identifying each phase. Differences in detection can result from variations in input data meeting criteria thresholds, particularly for detecting an initial decay or incipient stage. Using the plot_steps argument can help visualize each detection stage step-by-step, providing insights into why certain phases are or aren’t detected in specific instances.

Unexpected Residual Stage or Gaps in the Series: For both of these cases, could you provide details about the options used? These patterns aren’t typical or expected behavior. However, they could relate to challenges in input data quality, especially if there are tracking inconsistencies. For example, in the first case, the tracking algorithm may have terminated the cyclone track prematurely, impacting the ability to fully detect the mature stage. Similarly, the second case might reflect a vorticity series with irregularities, possibly due to the influence of spatial filtering or weak systems.

If you’re using TRACK data, it’s worth noting that TRACK sometimes has limitations in capturing consistent cyclone characteristics, leading to possible anomalies. Without supplementary data—such as spatial fields like SLP—detailed analysis is limited, but data quality issues can occasionally propagate into CycloPhaser’s outputs, amplifying detection inconsistencies.

Offset in the Identified Mature Stage: The offset issue in the mature stage may relate to the smoothing and polynomial settings used. By reducing the level of smoothing or increasing the savgol_polynomial value, you can achieve a better alignment of detected vorticity minima with the original series. The filtering process smooths the raw data (yellow line), and adjusting these parameters can mitigate shifts and better capture phase transitions.

Using Wind or Sea-Level Pressure Data: If you’re achieving satisfactory results with wind data, that’s a good sign. Maybe you could now apply the hemisphere argument?

Can you share a time series plot of the raw SLP data for reference? The only normalization applied occurs in the plotting stage to facilitate visualization, which might influence readability but does not alter the analysis itself.

Thank you again for the constructive feedback, which has truly improved the package. Let us know if you encounter any further issues or have additional suggestions!

Best regards,
Danilo

stella-bourdin commented 3 weeks ago

Thank you very much for going through these! Here are additional remarks and replies

Package dependencies

There was something else I forgot to mention: the package installation requires very specific versions of the dependencies. This is likely to create conflicts with other packages installed in one's environment. I advise requiring specific versions only when absolutely necessary for cyclophaser to work. You can also require a version >= or <= a given version if there is a functionality that appeared or disappeared at some point. This flexibility will make the package easier to use. Otherwise, it needs to be installed in its own standalone environment, where it cannot be used in conjunction with other packages.
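For illustration, version ranges of this kind could be declared as in the sketch below, written in the style of a setup.py `install_requires` list. The package names and version numbers are placeholders chosen for the example, not CycloPhaser's actual requirements.

```python
# Hypothetical dependency list; names and version bounds are placeholders,
# not CycloPhaser's real requirements.
install_requires = [
    "numpy>=1.20",      # lower bound only: any newer release is accepted
    "pandas>=1.3,<3",   # add an upper bound only if a later release is known to break
    "scipy>=1.7",
]
```

Compared with exact pins such as `numpy==1.24.3`, ranges like these let the installer find a version that also satisfies the other packages in the environment.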

Replies re. functionality and doc

Thank you for carefully going through these and implementing what I suggested.

Regarding x : I am checking the function docstring, the API documentation and the usage guide but cannot find information about the behaviour when integers are provided.
Regarding pre-processing recommendations: Can you point me to where you provided these recommendations?
use_smoothing : The docstring is better, but does not say that one may deactivate it altogether by setting False.
Regarding the spurious oscillations detector: That is great! However, it seems to be triggered in situations where I do not understand why, e.g. for the following test:

Testing with other data

Unexpected Residual Stage or Gaps in the Series: I used the default options (determine_periods(list(-1 * tracks[i].relative_vorticity.values), x = list(tracks[i].time.values), plot="test_default",)), but I agree that the one with a residual in the middle is a particularly abnormal time series. For the one with a gap, however, I would say that by eye I can identify two mature periods. But I suppose this is a matter of tuning. Maybe it could be useful to add a warning in such cases (a gap in the identified periods, a suspicious succession of periods, ...), so that the user is aware of potential data quality problems? As one may not check every individual time series when applying the package to a full dataset, it would warn them that caution is required.
Using wind data: I did try to apply the hemisphere argument and it works well indeed.
Using SLP data: The second SLP time series was indeed very noisy, hence the result. Input data quality was the problem (garbage in, garbage out...)

Paper

Seems very good to me now. You might want to update the very last sentence about the hemispheres now that you implemented the option.

Code

I had a look at the code itself. It is clean and well documented throughout. You might want to turn some of the testing at the end of each module file into proper tests (some of it might already be covered). Automated tests do not need to cover only the main functionality; they can also cover intermediate steps. But it is up to you. There is also a chunk of commented imports in determine_period.py that you might want to clean out. But I'm being really picky here...

daniloceano commented 3 weeks ago

Thank you for the feedback!

I’ve updated the requirements.txt file to increase flexibility in dependency versions, specifying only the minimum versions necessary for cyclophaser to function effectively. Testing compatibility with every specific version of each dependency, however, would be impractical due to the extensive range of versions involved. To ensure stability, core dependencies are tested across recent major versions in our CI setup.

Regarding x: Thank you for pointing that out! I've updated the documentation to clarify the behavior when x is provided as a list of integers. In these cases, the function detects phases between the series' time steps rather than actual dates or times, which can be especially useful for data without precise timestamps or relative time steps. Apologies for missing this detail in my previous responses, and thanks again for helping ensure the documentation is as clear as possible!
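To make the difference concrete, here is a minimal sketch of how the same detected periods read on an integer axis versus a datetime axis. It does not call CycloPhaser; the helper `describe_periods`, the phase names, and the period boundaries are all assumptions made up for illustration.

```python
from datetime import datetime, timedelta


def describe_periods(periods, x):
    """Express (phase, start_index, end_index) triples in the units of the axis x."""
    return [(phase, x[start], x[end]) for phase, start, end in periods]


# Hypothetical detected periods, given as positions in the series.
periods = [("intensification", 0, 2), ("mature", 2, 3), ("decay", 3, 5)]

# Integer axis: periods are reported as time-step counts.
steps = list(range(6))
by_step = describe_periods(periods, steps)

# Datetime axis (6-hourly here): the same periods are reported as dates.
t0 = datetime(2020, 1, 1)
times = [t0 + timedelta(hours=6 * i) for i in steps]
by_time = describe_periods(periods, times)

for row in by_step + by_time:
    print(row)
```

With integers, the start and end of each period are positions in the series, so converting them to elapsed time is left to the user, who needs to know the sampling interval.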

Regarding the pre-processing recommendations, these are detailed in the documentation for each parameter. Here it is:

use_filter: (str or int, optional) Apply a Lanczos filter to the series. Specify a window length as an integer to customize or use 'auto' for adaptive length based on dataset size (half of series length). Units: Time steps. Default: 'auto'.
Recommendation: If using relative vorticity series, turn off use_filter if the tracking procedure already applies spatial filtering to avoid over-filtering and signal loss. For hourly ERA5 data, 'auto' is typically effective, though this may need adjustment for different temporal resolutions and spatial resolutions. Use smoothing also if noise levels are too high.
replace_endpoints_with_lowpass: (int, optional) Applies a lowpass filter to smooth the series endpoints, which helps stabilize edge effects during filtering. Specify the window length. Units: Time steps. Default: 24.
Recommendation: For hourly relative vorticity data, a 24-hour (24 time steps) setting is effective. Adjust this based on the temporal and spatial resolution of the original data, especially if using data with higher spatial resolution.
use_smoothing: (str, int, optional) Apply Savgol smoothing to filtered vorticity data. Set to 'auto' to automatically choose an appropriate window length, or provide an integer window length directly. Note: The specified window length must be an odd number and greater than or equal to savgol_polynomial. Set use_smoothing=False to deactivate. Units: Time steps. Default: 'auto'.
Recommendation: This setting is sensitive to the length of the time series. The 'auto' setting uses a window length approximately 1/4 of the series length for series >8 days; otherwise, it uses about 1/2. For lower-noise data, this value can be decreased, and for higher-noise data, increase it accordingly.
use_smoothing_twice: (str, int, optional) Apply a second pass of Savgol smoothing for further noise reduction. This uses similar parameters to use_smoothing. Default: 'auto'.
Recommendation: This should be a gentler smoothing than the initial use_smoothing. The 'auto' setting applies half the window length used in the first pass.
savgol_polynomial: (int, optional) Polynomial order for Savgol smoothing, which must be less than or equal to the window length specified in use_smoothing and use_smoothing_twice. Default: 3.
Recommendation: Higher values retain sharper peaks and more detailed features but can increase noise; lower values provide a smoother output that may oversmooth finer details. For noisier data, a lower polynomial value is preferable, and for cleaner data, a higher value helps preserve more details.
cutoff_low: (float, optional) Low-frequency cutoff for the Lanczos filter, designed for data with hourly resolution. Units: Time steps. Default: 168.
Recommendation: Set this to the equivalent of 7 days in time steps to filter out planetary wave influences on vorticity.
cutoff_high: (float, optional) High-frequency cutoff for the Lanczos filter, suitable for reducing high-frequency noise in hourly data. Units: Time steps. Default: 48.
Recommendation: Set this to the equivalent of 2 days in time steps to effectively filter out mesoscale influences on vorticity.
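The 'auto' rules of thumb listed above can be sketched as follows. This is only an illustration of the stated fractions (1/2 of the series for the filter, about 1/4 or 1/2 for the first smoothing pass, half of that for the second pass), not the package's actual implementation. The function names, the `steps_per_day` argument, and the rounding choices are assumptions. The window is kept strictly larger than the polynomial order, which is what `scipy.signal.savgol_filter` requires if that is the smoother in use.

```python
def make_odd(n):
    """Round an integer up to the nearest odd value (Savitzky-Golay windows must be odd)."""
    return n if n % 2 == 1 else n + 1


def auto_windows(n_steps, steps_per_day=24, polyorder=3):
    """Sketch of 'auto' window lengths (in time steps) for a series of n_steps points."""
    # Lanczos filter: half of the series length.
    filter_window = n_steps // 2

    # First smoothing pass: about 1/4 of the series if it is longer than 8 days,
    # otherwise about 1/2.
    divisor = 4 if n_steps > 8 * steps_per_day else 2
    first_pass = make_odd(max(n_steps // divisor, polyorder + 1))

    # Second, gentler pass: half of the first-pass window.
    second_pass = make_odd(max(first_pass // 2, polyorder + 1))

    return {
        "use_filter": filter_window,
        "use_smoothing": first_pass,
        "use_smoothing_twice": second_pass,
    }


# A 10-day hourly series and a 4-day hourly series.
long_series = auto_windows(240)
short_series = auto_windows(96)
print(long_series)
print(short_series)
```

For very short series these rules can yield a window close to the series length, so checking the resulting values before passing them explicitly is advisable.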

Regarding the spurious oscillations detector: This might be happening because the vorticity delta between the first time steps is much higher than for the rest of the series. This can trigger the detector, but in these cases, the warning can generally be ignored as it’s a conservative check.

In response to the issues with residual stages or gaps in the series, we’ve added warnings that detect and alert users if residual periods appear in the middle of a time series or if there are unclassified time steps (gaps). These warnings indicate potential data quality concerns and suggest adjusting pre-processing options, which may help resolve such issues. This update should aid in identifying and addressing unusual behaviors automatically when processing larger datasets without needing to review each individual time series manually.
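A minimal sketch of what such checks could look like is given below. It is not the code added to CycloPhaser: the function names, the per-time-step label representation (with `None` for unclassified steps), and the phase names are assumptions for illustration.

```python
import warnings


def has_gap(labels):
    """True if an unclassified step (None) lies between two classified steps."""
    classified = [i for i, lab in enumerate(labels) if lab is not None]
    if not classified:
        return False
    return any(lab is None for lab in labels[classified[0]:classified[-1] + 1])


def has_residual_in_middle(labels):
    """True if a 'residual' step lies between two non-residual classified steps."""
    other = [i for i, lab in enumerate(labels) if lab not in (None, "residual")]
    if not other:
        return False
    return any(lab == "residual" for lab in labels[other[0]:other[-1] + 1])


def check_phases(labels):
    """Warn about suspicious phase sequences and return the messages issued."""
    messages = []
    if has_gap(labels):
        messages.append("Unclassified time steps found between detected periods.")
    if has_residual_in_middle(labels):
        messages.append("Residual period found in the middle of the series.")
    for message in messages:
        # Point the user to pre-processing options, as the warnings described above do.
        warnings.warn(message + " Consider adjusting the pre-processing options.")
    return messages


# Hypothetical label sequences for two tracks.
with_gap = ["intensification", "mature", None, "mature", "decay", "residual"]
with_residual = ["intensification", "residual", "mature", "decay"]
gap_messages = check_phases(with_gap)
residual_messages = check_phases(with_residual)
```

Because the messages go through the standard `warnings` module, a user processing a full dataset can log them, filter them, or turn them into errors without changing the calling code.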

Thanks for confirming that the hemisphere adjustment works well for wind data, and I appreciate the additional feedback on the SLP data. As you noted, input quality does indeed play a big role.

Regarding the code review, thank you for the kind words! I agree with your observation about testing; however, I find that focusing on testing the main functionality, especially for determine_periods, provides clearer debugging and maintenance benefits in this case (at least for me). I've also cleaned up the commented imports in determine_periods.py as suggested.

We’ve updated the documentation in the paper to reflect the new hemisphere option and clarified its intended use cases, particularly for detecting cyclone phases in both vorticity and alternative series like SLP and wind speed.

Thank you once again for the detailed review—it’s made a big difference in refining the package.

stella-bourdin commented 2 weeks ago

I just caught this: There is a typo in the title of the documentation: It is written CyloPhaser with a missing "c".

stella-bourdin commented 2 weeks ago

Regarding the pre-processing recommendations, I do not see them either in the API doc here or in the code docstring here. Am I missing something?

daniloceano commented 2 weeks ago

Hi @stella-bourdin,

Thanks for catching that typo! I’ve fixed the "CycloPhaser" typo in the documentation.

Regarding the pre-processing recommendations, they’re actually located in the Usage Guide rather than the API doc. I wanted to keep the API documentation as straightforward as possible, while the Usage Guide offers a more thorough explanation of how to customize filtering.

Let me know if you have any more questions!

freemansw1 commented 2 weeks ago

Apologies for my delay in getting started, I've just come off of some travel. Initial thoughts are below before I try to break the package using a variety of other data.

General Checks

Functionality

Installation

I wasn't able to install requirements.txt using conda-forge, but I was able to install through pip. Not worth changing, just wanted to note. Some of the libraries aren't available through conda-forge/conda/mamba.

Functionality

The example works as described, and the test passes.

Documentation

Statement of Need

Following the previous reviewer's comments, the authors included a statement of need in the documentation. I think this largely meets the mark, but the homepage of the documentation is not as helpful on this front as I think it could be. I think a reference to this paper or the one in JOC would be useful here. It was also not clear to me how or where one would get a vorticity time series, although this becomes clearer the further into the documentation you go.

I think, ultimately, the best way to resolve my concern here would be to elaborate much more on the homepage about what this package provides an average user. Make it clear that this package is the first that allows translation from tracked ET cyclone vorticity (or potentially other) information to lifecycle stage.

Installation Instructions

I would have liked instructions on how to install the latest and/or in-development version of the package. This is an easy fix. Note also my comment about conda/mamba above. The scientific software world seems split between the two at the moment.

Example usage

The only example uses pre-packaged data from the package itself. I think showing what this data looks like (or how to generate it by oneself) would be useful. The note on data frequency is useful, but not clear enough. I would have assumed, before reading into the documentation deeply, that if I provided a pd.Series object indexed by a DatetimeIndex, the package would automatically calculate the appropriate values, etc.

I think making it clear that one needs to have the vorticity already from a tracking package (either the one already provided by the author or from a different package) in the examples or on the homepage is a necessary change. That was not immediately clear to me.

Functionality documentation

I would have liked to have seen the other publicly accessible functions documented with docstrings.

Software paper

I think the paper is in overall good shape.

daniloceano commented 2 weeks ago

Thank you very much for your comprehensive review and insights. We've made several updates based on your feedback:

Installation and Installation Instructions

We've expanded the installation instructions, adding detailed steps for setting up the latest package. The instructions now address setup in various environments, including options for those using pip or a virtual conda environment.

Statement of Need

To address the need for clarity on the package's purpose and the source of input data, we have revised both the README and the documentation homepage. These updates emphasize that CycloPhaser does not generate vorticity series and outline how users can obtain these series, either through available tracking packages or from published databases. As it was no longer needed, we removed the Statement of Need documentation section.

We've also added citations for both the paper currently under review and the published study in the JOC to provide users with contextual references.

Example Usage

To further clarify the input data requirements, we’ve added an example track file format in the documentation. This example demonstrates the structure expected for time, latitude, and longitude columns in the input data.

Regarding your suggestion on data frequency, you’re correct that the package automatically handles pd.Series objects indexed by a DatetimeIndex (as stated in the usage guide and API documentation). The x argument is only required when using a list or array, as the DatetimeIndex serves as the time reference for pd.Series inputs.

Functionality Documentation

Additional docstrings have been added for other accessible functions to improve usability and help developers integrate or modify parts of CycloPhaser more easily.

Thank you again!

stella-bourdin commented 1 week ago

Hi, @observingClouds, that is all good for me now! @daniloceano, thank you for your patience while I attempted to break your code and for your thorough and timely answers.

observingClouds commented 1 week ago

@stella-bourdin, @freemansw1 thank you very much for your very thorough and helpful reviews. @freemansw1 please let us know if there are still open discussion points from your point of view. If there are none, please tick off all items in your review checklist. Thank you!

daniloceano commented 1 week ago

Thank you very much @stella-bourdin !
