Open editorialbot opened 1 month ago
Hello humans, I'm @editorialbot, a robot that can help you with some common editorial tasks.
For a list of things I can do to help you, just type:
@editorialbot commands
For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:
@editorialbot generate pdf
Software report:
github.com/AlDanial/cloc v 1.90 T=0.02 s (1233.6 files/s, 124131.8 lines/s)
-------------------------------------------------------------------------------
Language files blank comment code
-------------------------------------------------------------------------------
Python 10 322 268 991
Markdown 3 121 0 195
TeX 1 19 0 171
reStructuredText 7 95 84 143
YAML 3 32 13 142
CSV 1 0 0 66
DOS Batch 1 8 1 26
make 1 4 7 9
-------------------------------------------------------------------------------
SUM: 27 601 373 1743
-------------------------------------------------------------------------------
Commit count by author:
67 daniloceano
18 Danilo Couto de Souza
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):
✅ OK DOIs
- 10.1175/1520-0493(1922)50<468:JBAHSO>2.0.CO;2 is OK
- 10.1007/978-1-944970-33-8_10 is OK
- 10.1175/1520-0493(1993)121<2153:tlcoae>2.0.co;2 is OK
- 10.1175/bams-d-16-0261.1 is OK
- 10.1175/mwr3420.1 is OK
- 10.1175/2008mwr2491.1 is OK
- 10.1175/jas-d-13-0267.1 is OK
- 10.1029/2018gl078977 is OK
- 10.1175/jcli-d-16-0697.1 is OK
- 10.1002/joc.8539 is OK
- 10.1038/s41592-019-0686-2 is OK
- 10.1007/s00382-019-04778-1 is OK
- 10.21203/rs.3.rs-995499/v1 is OK
- 10.1029/2022ea002482 is OK
- 10.3354/cr01651 is OK
- 10.1007/s11069-024-06621-1 is OK
- 10.1007/s00382-005-0065-9 is OK
🟡 SKIP DOIs
- None
❌ MISSING DOIs
- None
❌ INVALID DOIs
- None
Paper file info:
📄 Wordcount for paper.md
is 921
✅ The paper includes a Statement of need
section
License info:
🟡 License found: GNU General Public License v3.0
(Check here for OSI approval)
:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:
👋🏼 @daniloceano @freemansw1 @stella-bourdin this is the review thread for the paper. All of our communications will happen here from now on.
As a reviewer, the first step is to create a checklist for your review by entering
@editorialbot generate my checklist
as the top of a new comment in this thread.
These checklists contain the JOSS requirements. As you go over the submission, please check any items that you feel have been satisfied. The first comment in this thread also contains links to the JOSS reviewer guidelines.
The JOSS review is different from most other journals. Our goal is to work with the authors to help them meet our criteria instead of merely passing judgment on the submission. As such, the reviewers are encouraged to submit issues and pull requests on the software repository. When doing so, please mention openjournals/joss-reviews#7363
so that a link is created to this thread (and I can keep an eye on what is happening). Please also feel free to comment and ask questions on this thread. In my experience, it is better to post comments/questions/suggestions as you come across them instead of waiting until you've reviewed the entire package.
We aim for reviews to be completed within about 2-4 weeks. Please let me know if any of you require some more time. We can also use EditorialBot (our bot) to set automatic reminders if you know you'll be away for a known period of time.
Please feel free to ping me (@observingClouds ) if you have any questions/concerns.
Hi, Thank you for your submission. I have gone through the paper and ran the documented test. Here are my comments on some points of the review checklists, besides functionality issues raised as issues on the repo itself (links above).
Only the first author appear in the commits. Please make sure that authorship follows journal guidelines. I am aware these can include author that did not directly committed into the code, and that you are responsible for the choice of final author list, so this is just a reminder.
Procedure overview page is great. I would suggest making a similar one for the statement of need (which I supposed can be a copy of the statement of need in the paper).
Besides functionality issues, here are some more comments regarding the example:
series
argument in determine_period
be in a specific unit? In any case, I recommend specifying whether or not in the documentation of the function. Very clear and well written
I suggest adding a first paragraph about what are the input requirements to give context. Current paragraph 1 lacks clarity, but it would be helped by adding a first paragraph as suggested above. Current paragraph 2: I suggest citing the paper from the top of the paragraph. e.g. "Thresholds for phase detection were rigorously calibrated in Couto de Souza et al. (2024) using a representative set of cyclone tracks, ensuring accurate phase identification while filtering out noise." Current paragraph 3: It is not clear whether the user can actually plug in an SLP or wind time series and it would work just as well, or whether it would require additional adaptations.
I will next go through a more thorough testing of the code functionalities, and test other data. Stella
@stella-bourdin, thank you for your thorough review. Below, I will address your comments and the steps I have taken to address them:
Comment: Only the first author appears in the commits. Please make sure that authorship follows journal guidelines. I am aware that these can include authors that did not directly commit to the code, so this is just a reminder.
Response: I appreciate the reminder. I was the only one working directly on the code, but the other researchers involved contributed significantly to the conceptualization of the algorithm, provided datasets, and collaborated with me in checking the outputs. Their contributions were essential for ensuring that the package produces realistic and useful results.
Comment: The procedure overview page is great. I would suggest making a similar one for the statement of need (which can be a copy from the paper).
Response: I have now added a Statement of Need section in the documentation, following the suggestion.
Comment: A brief introduction to what is done in the example would be appreciated. In particular, explain what data is loaded and clarify what we are doing here.
Response: A brief introduction explaining the dataset used and the purpose of the example has been added to the Usage section of the documentation.
Comment: For the output figure, please clarify what the different zeta
variables correspond to.
Response: I simplified the output by removing the extra zeta
variables in the figure for better clarity.
Comment: In the legend, the residual phase is green like decay instead of grey.
Response: The color for the residual phase has been updated to grey as originally intended.
I am currently working on the remaining issues you raised in the repository and will continue to address them one by one. Once all the adjustments are complete, I will notify you and provide the updated code and paper.
Thank you again for your invaluable feedback. I look forward to continuing to improve the paper and the package based on your suggestions.
Best regards,
Danilo
@stella-bourdin following with your appointments:
_"Must the series argument in
determine_period
be in a specific unit? In any case, I recommend specifying whether or not in the documentation of the function."_
I have clarified in the documentation that the series
argument does not need to be in a specific unit. While the function was designed for vorticity data, it should theoretically work with other meteorological fields like sea level pressure (SLP) or wind. However, this hasn't been fully tested yet. This update has been made both in the API documentation and the Usage Guide to make it clear to users.
@stella-bourdin
Thank you again for your careful review and valuable suggestions. I look forward to your further feedback as I address these points.
Best regards,
Danilo
Hi,
Thank you for going through this quickly! I am now able to install the program and run the example smoothly. Some more comments as I continue to test the package (and look for every way to break it - sorry).
I am not sure where I can find the updated of the paper (the link above seem to correspond to the previous version). Should it not appear on this thread? (@observingClouds ?)
Although minor, I feel like adding one example plot for a canonical case on the GitHub and readthedocs first page would greatly help the users directly visualizing what the package does.
It seems to me that the test only checks that the program runs without error. It could be helpful to also check that the output is what was expected, in case a change does not break the program, but affects the output.
determine_periods
functionality and documentationThese are thoughts on improving the package that include more or less essential changes. Feel free to discard those you do not wish to implement.
series
: It would make the package more easy to use if it could accept more flexible inputs. I'm thinking numpy arrays in particular, but also pandas series or xarray datarrays as the user is I think most likely to have loaded its data with pandas or xarray. However, the error message if the type of input is not right is quite clear, so it is up to you. x
: It is not exactly clear what type this can be and what is the expected behaviour based on this. It seems like I can provide it with numpy or pandas datetime with similar behaviour. I am not sure what it is doing when I provide a list of ints. use_smoothing
:
savgol_polynomial
(I think), it would be good to precise that it must be greater. savgol_polynomial = 3
. Is it actually possible to deactivate?use_filter
: I seem to obtain the same result whether I activate it or not, is it normal? Why does the yellow line not fit the grey one after I removed the filter? Is there still some kind of normalization? Maybe a test could be added in case the pre-processing parameters choice leads to the apparition of "spurious oscillation" at the beginning and end of the series, making the user aware he might need to adapt the parameters. I suppose this could be done by checking the delta between the first and second point compared to the order of magnitude of the input values.
Here are some observations as I apply the package to a set of my tropical cyclones tracks I have lying around. I find it slightly confusing why sometimes the first part is incipient, and other times when it is not obvious to me they display different behaviours.
In this case, there is a residual in the middle of the time series. Is it normal?
In this case, there is a gap with nothing identified in the middle. Is it normal?
Sometimes the identified mature stage seems offset compared to the actual series. In the example below, the red part is well below the obvious vorticity minimum. How can I fix this?
Using wind data instead of vorticity, I seem able to get satisfactory results. When using sea-level pressure data, it gets weirder. I think it is because there is some normalization going on at some point, making it more difficult to read the plots.
@editorialbot generate pdf
:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:
@stella-bourdin I just generated the latest version of the manuscript (see above). We do not do this automatically, but one can generate the latest version by kindly asking our bot 😄
Hi @stella-bourdin
Thank you so much for your thorough review and feedback! I'm glad to hear that the installation and example run smoothly now, and I appreciate the detailed testing and suggestions. This level of detail will certainly help make the package better, and I’m very grateful for your efforts.
We agree that a visual example could greatly enhance usability, and we've now added an example plot to both the GitHub README and the ReadTheDocs front page.
Your suggestion to verify output accuracy is very helpful. We’ve updated the tests to compare the function’s output against expected results in specific cases, ensuring both functionality and data integrity.
determine_periods
Functionality and DocumentationWe appreciate your detailed feedback on determine_periods
. Here’s how we’ve addressed each of these points:
Flexible Input Types for series
: series
now accepts a wider range of input types, including list
, numpy.ndarray
, pandas.Series
, and xarray.DataArray
. The function converts these types to a compatible format internally, making the tool more versatile without sacrificing clarity.
Clarifying x
Input: The x
parameter has been documented to specify that it accepts a list
or pandas.DatetimeIndex
with timestamps. We have also clarified the behavior when using integer values in x
. This input is crucial for setting the periods based on time intervals, so we’ve elaborated on this in the documentation.
Pre-Processing Parameter Units and Recommendations: Each pre-processing option now specifies time step units. We’ve also added guidance for tuning these parameters based on data frequency, with recommendations to help users adjust values like cutoff_low
and cutoff_high
depending on whether their data is daily, hourly, or otherwise.
use_smoothing
Error Handling and Documentation: You were correct; the smoothing window length must be greater than or equal to savgol_polynomial
. We’ve added validation to check this, plus an option to deactivate smoothing by setting use_smoothing=False
. This detail is now clear in both the documentation and error messages.
use_filter
Clarifications: To activate use_filter
meaningfully, you should pass an integer value for the filter window length rather than using True
. Setting use_filter=True
(I am assuming you did that) without a specified window length defaults the window length to 1, which effectively disables filtering. The yellow line in the plots is normalized to provide a clearer visualization of the filtered and unfiltered series in relation to each other. This normalization, applied both to the filtered and the smoothed series, is specifically for visualization purposes and helps to highlight phase transitions more distinctly.
Hemisphere Support: To support data from both hemispheres and various data types (e.g., wind speed, SLP), we’ve added a hemisphere
argument. Setting this to "northern" automatically inverts the input, while "southern" (the default) maintains the original convention.
Detecting Spurious Oscillations: We implemented a check to detect potential spurious oscillations at the series boundaries, which might indicate that adjustments are needed for certain pre-processing parameters. If detected, the function issues a warning with specific recommendations for tuning the inputs.
Incipient Stage Confusion:
The package relies on duration thresholds for identifying each phase. Differences in detection can result from variations in input data meeting criteria thresholds, particularly for detecting an initial decay or incipient stage. Using the plot_steps
argument can help visualize each detection stage step-by-step, providing insights into why certain phases are or aren’t detected in specific instances.
Unexpected Residual Stage or Gaps in the Series: For both of these cases, could you provide details about the options used? These patterns aren’t typical or expected behavior. However, they could relate to challenges in input data quality, especially if there are tracking inconsistencies. For example, in the first case, the tracking algorithm may have terminated the cyclone track prematurely, impacting the ability to fully detect the mature stage. Similarly, the second case might reflect a vorticity series with irregularities, possibly due to the influence of spatial filtering or weak systems.
If you’re using TRACK data, it’s worth noting that TRACK sometimes has limitations in capturing consistent cyclone characteristics, leading to possible anomalies. Without supplementary data—such as spatial fields like SLP—detailed analysis is limited, but data quality issues can occasionally propagate into CycloPhaser’s outputs, amplifying detection inconsistencies.
Offset in the Identified Mature Stage:
The offset issue in the mature stage may relate to the smoothing and polynomial settings used. By reducing the level of smoothing or increasing the savgol_polynomial
value, you can achieve a better alignment of detected vorticity minima with the original series. The filtering process smooths the raw data (yellow line), and adjusting these parameters can mitigate shifts and better capture phase transitions.
Using Wind or Sea-Level Pressure Data:
If you’re achieving satisfactory results with wind data, that’s a good sign. Maybe you could now apply the hemisphere
argument?
Can you share a time series plot of the raw SLP data for reference? The only normalization applied occurs in the plotting stage to facilitate visualization, which might influence readability but does not alter the analysis itself.
Thank you again for the constructive feedback, which has truly improved the package. Let us know if you encounter any further issues or have additional suggestions!
Best regards,
Danilo
Thank you very much for going through these! Here are additional remarks and replies
Thank you for carefully going through these and implementing what I suggested.
x
: I am checking the function docstring, the API documentation and the usage guide but cannot find information about the behaviour when integers are provided. use_smoothing
: The docstring is better, but does not say that one may deactivate it altogether by setting False.determine_periods(list(-1 * tracks[i].relative_vorticity.values), x = list(tracks[i].time.values), plot="test_default",)
), but I agree that the one with residual in the middle is a particularly abnormal time series. The one with a gap however, I would say that by eye I can identify two mature periods. But I suppose this is a matter of tuning. Maybe it could be useful to just add a warning in such cases (A gap in the identified periods, suspicious succession of period...), so that the user is aware of potential data quality problems? As one may not be checking every individual time series when applying to a full dataset, it would warn them that caution is required. Seems very good to me now. You might want to update the very last sentence about the hemispheres now that you implemented the option.
I had a look at the code itself. It is clean and well documented throughout.
You might want to transform some of your testing in the end of each module file into proper tests (some of which might already be). Automated tests do not need to be only for the main functionality but can also be for intermediary steps. But it is up to you.
There is also a chunk of commented imports in determine_period.py
that you might want to clean out. But I'm being really picky here...
Thank you for the feedback!
I’ve updated the requirements.txt file to increase flexibility in dependency versions, specifying only the minimum versions necessary for cyclophaser to function effectively. Testing compatibility with every specific version of each dependency, however, would be impractical due to the extensive range of versions involved. To ensure stability, core dependencies are tested across recent major versions in our CI setup.
Regarding x
: Thank you for pointing that out! I've updated the documentation to clarify the behavior when x
is provided as a list of integers. In these cases, the function detects phases between the series' time steps rather than actual dates or times, which can be especially useful for data without precise timestamps or relative time steps. Apologies for missing this detail in my previous responses, and thanks again for helping ensure the documentation is as clear as possible!
Regarding the pre-processing recommendations, these are detailed in the documentation for each parameter. Here it is:
use_filter: (str or int, optional) Apply a Lanczos filter to the series
. Specify a window length as an integer to customize or use 'auto'
for adaptive length based on dataset size (half of series length). Units: Time steps. Default: 'auto'.
Recommendation: If using relative vorticity series, turn off use_filter
if the tracking procedure already applies spatial filtering to avoid over-filtering and signal loss. For hourly ERA5 data, 'auto'
is typically effective, though this may need adjustment for different temporal resolutions and spatial resolutions. Use smoothing also if noise levels are too high.
replace_endpoints_with_lowpass: (int, optional) Applies a lowpass filter to smooth the series endpoints, which helps stabilize edge effects during filtering. Specify the window length. Units: Time steps. Default: 24.
Recommendation: For hourly relative vorticity data, a 24-hour (24 time steps) setting is effective. Adjust this based on the temporal and spatial resolution of the original data, especially if using data with higher spatial resolution.
use_smoothing: (str, int, optional) Apply Savgol smoothing to filtered vorticity data. Set to 'auto'
to automatically choose an appropriate window length, or provide an integer window length directly. Note: The specified window length must be an odd number and greater than or equal to savgol_polynomial
. Set use_smoothing=False
to deactivate. Units: Time steps. Default: 'auto'.
Recommendation: This setting is sensitive to the length of the time series. The 'auto'
setting uses a window length approximately 1/4 of the series length for series >8 days; otherwise, it uses about 1/2. For lower-noise data, this value can be decreased, and for higher-noise data, increase it accordingly.
use_smoothing_twice: (str, int, optional) Apply a second pass of Savgol smoothing for further noise reduction. This uses similar parameters to use_smoothing
. Default: 'auto'.
Recommendation: This should be a gentler smoothing than the initial use_smoothing
. The 'auto'
setting applies half the window length used in the first pass.
savgol_polynomial: (int, optional) Polynomial order for Savgol smoothing, which must be less than or equal to the window length specified in use_smoothing
and use_smoothing_twice
. Default: 3.
Recommendation: Higher values retain sharper peaks and more detailed features but can increase noise; lower values provide a smoother output that may oversmooth finer details. For noisier data, a lower polynomial value is preferable, and for cleaner data, a higher value helps preserve more details.
cutoff_low: (float, optional) Low-frequency cutoff for the Lanczos filter, designed for data with hourly resolution. Units: Time steps. Default: 168.
Recommendation: Set this to the equivalent of 7 days in time steps to filter out planetary wave influences on vorticity.
cutoff_high: (float, optional) High-frequency cutoff for the Lanczos filter, suitable for reducing high-frequency noise in hourly data. Units: Time steps. Default: 48.
Recommendation: Set this to the equivalent of 2 days in time steps to effectively filter out mesoscale influences on vorticity.
Regarding the spurious oscillations detector: This might be happening because the vorticity delta between the first time steps is much higher than for the rest of the series. This can trigger the detector, but in these cases, the warning can generally be ignored as it’s a conservative check.
In response to the issues with residual stages or gaps in the series, we’ve added warnings that detect and alert users if residual periods appear in the middle of a time series or if there are unclassified time steps (gaps). These warnings indicate potential data quality concerns and suggest adjusting pre-processing options, which may help resolve such issues. This update should aid in identifying and addressing unusual behaviors automatically when processing larger datasets without needing to review each individual time series manually.
Thanks for confirming that the hemisphere adjustment works well for wind data, and I appreciate the additional feedback on the SLP data. As you noted, input quality does indeed play a big role,
Regarding the code review, thank you for the kind words! I agree with your observation about testing; however, I find that focusing on testing the main functionality, especially for determine_periods, provides clearer debugging and maintenance benefits in this case (at least for me). I've also cleaned up the commented imports in determine_periods.py as suggested.
We’ve updated the documentation in the paper to reflect the new hemisphere
option and clarified its intended use cases, particularly for detecting cyclone phases in both vorticity and alternative series like SLP and wind speed.
Thank you once again for the detailed review—it’s made a big difference in refining the package.
I just caught this: There is a typo in the title of the documentation: It is written CyloPhaser with a missing "c".
Hi @stella-bourdin,
Thanks for catching that typo! I’ve fixed the "CycloPhaser" typo in the documentation.
Regarding the pre-processing recommendations, they’re actually located in the Usage Guide rather than the API doc. I wanted to keep the API documentation as straightforward as possible, while the Usage Guide offers a more thorough explanation of how to customize filtering.
Let me know if you have any more questions!
Apologies for my delay in getting started, I've just come off of some travel. Initial thoughts are below before I try to break the package using a variety of other data.
I wasn't able to install requirements.txt
using conda-forge, but I was able to install through pip
. Not worth changing, just wanted to note. Some of the libraries aren't available through conda-forge/conda/mamba.
The example works as described, and the test passes.
After the previous reviewer, the authors include a statement of need in the documentation. I think this largely meets the mark, but the homepage of the documentation is not as helpful from this end as I think it can be. I think a reference to this paper or the one in JOC would be useful here. It also needed to be clarified to me how or where one would get a vorticity timeseries, although this becomes clearer the further into the documentation that you go.
I think, ultimately, the best way to resolve my concern here would be to elaborate much more on the homepage about what this package provides an average user. Make it clear that this package is the first that allows translation from tracked ET cyclone vorticity (or potentially other) information to lifecycle stage.
I would have liked instructions on how to install the latest and/or in development package. This is an easy fix. Note also my comment about conda/mamba above. The scientific software world seems split between the two at the moment.
The only example uses pre-packaged data from the package itself. I think showing what this data looks like (or how to generate it by oneself) would be useful. The note on data frequency is useful, but not clear enough. I would have assumed, before reading into the documentation deeply, that if I provided a pd.Series object indexed by a DatetimeIndex, the package would automatically calculate the appropriate values, etc.
I think making it clear that one needs to have the vorticity already from a tracking package (either the one already provided by the author or from a different package) in the examples or on the homepage is a necessary change. That was not immediately clear to me.
I would have liked to have seen the other publicly accessible functions documented with docstrings.
I think the paper is in overall good shape.
Thank you very much for your comprehensive review and insights. We've made several updates based on your feedback:
We've expanded the installation instructions, adding detailed steps for setting up the latest package. The instructions now address setup in various environments, including options for those using pip
or a virtual conda
environment.
To address the need for clarity on the package's purpose and the source of input data, we have revised both the README and the documentation homepage. These updates emphasize that CycloPhaser does not generate vorticity series and outline how users can obtain these series, either through available tracking packages or from published databases. As it was no longer needed, we removed the Statement of Need
documentation section.
We've also added citations for both the paper currently under review and the published study in the JOC to provide users with contextual references.
To further clarify the input data requirements, we’ve added an example track file format in the documentation. This example demonstrates the structure expected for time, latitude, and longitude columns in the input data.
Regarding your suggestion on data frequency, you’re correct that the package automatically handles pd.Series
objects indexed by a DatetimeIndex
(as stated in the usage guide and API documentation). The x
argument is only required when using a list or array, as the DatetimeIndex
serves as the time reference for pd.Series
inputs.
Additional docstrings have been added for other accessible functions to improve usability and help developers integrate or modify parts of CycloPhaser more easily.
Thank you again!
Hi, @observingClouds, that is all good for me now! @daniloceano, thank you for your patience while I attempted to break your code and for you thorough and timely answers.
@stella-bourdin, @freemansw1 thank you very much for your very thorough and helpful review. @freemansw1 please let us know if there are still open discussion points from your point of you. If this is no longer the case please tick-off all items in your review checklist. Thank you!
Thank you very much @stella-bourdin !
Submitting author: !--author-handle-->@daniloceano<!--end-author-handle-- (Danilo Couto de Souza) Repository: https://github.com/daniloceano/CycloPhaser Branch with paper.md (empty if default branch): joss-submission Version: 1.8.1 Editor: !--editor-->@observingClouds<!--end-editor-- Reviewers: @freemansw1, @stella-bourdin Archive: Pending
Status
Status badge code:
Reviewers and authors:
Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)
Reviewer instructions & questions
@freemansw1 & @stella-bourdin, your review will be checklist based. Each of you will have a separate checklist that you should update when carrying out your review. First of all you need to run this command in a separate comment to create the checklist:
The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @observingClouds know.
✨ Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest ✨
Checklists
📝 Checklist for @freemansw1
📝 Checklist for @stella-bourdin