editorialbot commented 2 years ago

Submitting author: !--author-handle-->@mlampros@hugoledoux<!--end-editor-- Reviewers: @evetion, @Jean-Romain Archive: Pending

Status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/a01026676e546c6544423969337a5207"><img src="https://joss.theoj.org/papers/a01026676e546c6544423969337a5207/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/a01026676e546c6544423969337a5207/status.svg)](https://joss.theoj.org/papers/a01026676e546c6544423969337a5207)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@evetion & @Jean-Romain, your review will be checklist based. Each of you will have a separate checklist that you should update when carrying out your review. First of all you need to run this command in a separate comment to create the checklist:

@editorialbot generate my checklist

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @hugoledoux know.

✨ Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest ✨

Checklists

📝 Checklist for @Jean-Romain

📝 Checklist for @evetion

editorialbot commented 2 years ago

Hello humans, I'm @editorialbot, a robot that can help you with some common editorial tasks.

For a list of things I can do to help you, just type:

@editorialbot commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@editorialbot generate pdf

editorialbot commented 2 years ago

Software report:

github.com/AlDanial/cloc v 1.88  T=0.04 s (897.3 files/s, 189398.0 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
R                               15            407           1295           1331
Markdown                         6            354              0            995
Rmd                              4            756           1006            778
XML                              1              0              2            441
YAML                             9             55             68            259
TeX                              1              3              0             31
Dockerfile                       1              7              6             16
-------------------------------------------------------------------------------
SUM:                            37           1582           2377           3851
-------------------------------------------------------------------------------

gitinspector failed to run statistical information for the repository

editorialbot commented 2 years ago

Wordcount for paper.md is 1312

editorialbot commented 2 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

Jean-Romain commented 2 years ago

Review checklist for @Jean-Romain

Conflict of interest

[x] I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.

Code of Conduct

[x] I confirm that I read and will adhere to the JOSS code of conduct.

General checks

[x] Repository: Is the source code for this software available at the https://github.com/mlampros/IceSat2R?
[x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
[x] Contribution and authorship: Has the submitting author (@mlampros) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?
[ ] Substantial scholarly effort: Does this submission meet the scope eligibility described in the JOSS guidelines

Functionality

[x] Installation: Does installation proceed as outlined in the documentation?
[ ] Functionality: Have the functional claims of the software been confirmed?
[x] Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

[x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
[x] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
[x] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
[ ] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
[ ] Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
[x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

[ ] Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
[x] A statement of need: Does the paper have a section titled 'Statement of Need' that clearly states what problems the software is designed to solve and who the target audience is?
[ ] State of the field: Do the authors describe how this software compares to other commonly-used packages?
[ ] Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
[ ] References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

evetion commented 2 years ago

Review checklist for @evetion

Conflict of interest

[x] I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.

Code of Conduct

[x] I confirm that I read and will adhere to the JOSS code of conduct.

General checks

[x] Repository: Is the source code for this software available at the https://github.com/mlampros/IceSat2R?
[x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
[x] Contribution and authorship: Has the submitting author (@mlampros) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?
[x] Substantial scholarly effort: Does this submission meet the scope eligibility described in the JOSS guidelines

Functionality

[x] Installation: Does installation proceed as outlined in the documentation?
[ ] Functionality: Have the functional claims of the software been confirmed?
[x] Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

[x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
[x] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
[x] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
[x] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
[ ] Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
[x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

[x] Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
[x] A statement of need: Does the paper have a section titled 'Statement of Need' that clearly states what problems the software is designed to solve and who the target audience is?
[x] State of the field: Do the authors describe how this software compares to other commonly-used packages?
[x] Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
[x] References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

Jean-Romain commented 2 years ago

In the first part my review I'm following the authors/reviewers guidelines by citing the guidelines and providing my associated comments to discuss if the paper and software fit the guidelines. In the second part my review analyses the code and the features of the package more in depth.

JOSS guidelines

JOSS publishes articles about research software. This definition includes software that: solves complex modeling problems in a scientific context (physics, mathematics, biology, medicine, social science, neuroscience, engineering); supports the functioning of research instruments or the execution of research experiments; extracts knowledge from large data sets; offers a mathematical library, or similar. While useful for many areas of research, pre-trained machine learning models and notebooks are not in-scope for JOSS.

The IceSat2R package is a package that essentially builds string queries to download data from a remote server using the public API provided by the service. In this sense it does not solve complex problem, it does not extract knowledge from large data sets, it does not offer a mathematical library. While useful, it does not meet the criterion mentioned above.

As a rule of thumb, JOSS’ minimum allowable contribution should represent not less than three months of work for an individual. [...]

The package essentially build strings to send queries and download data from a remote server. The core of the IceSat2R is made of approximately 1000 lines of R code with a large portion of the code doing nothing but handling user's input errors (which is normal and expected in R code) and the programming style significantly increases the number of lines. Even considering the time spent at writing the documentation this definitively cannot be considered as 3 months of work for a single person.

Some factors that may be considered by editors and reviewers when judging effort include:

Age of software (is this a well-established software project) / length of commit history.

Number of commits.

Number of authors.

Total lines of code (LOC). Submissions under 1000 LOC will usually be flagged, those under 300 LOC will be desk rejected.

Whether the software has already been cited in academic papers.

Whether the software is sufficiently useful that it is likely to be cited by your peer group.

The software is very young (first release 2 months ago) and definitively not well-established. As mentioned above the number of line of code is approximately 1000. I don't know if the software as already been cited but considering it is two months old I think it is temporarily impossible that IceSat2R has citations. Considering I'm not working with ICESat data I can't tell if the software is useful but in my own daily work I regularly appreciate that people spent time at proving easy to use tools to get access to data. The question is to know if the software is easy and this will be discussed later.

In addition, JOSS requires that software should be feature-complete (i.e., no half-baked solutions), packaged appropriately according to common community standards for the programming language being used (e.g., Python, R), and designed for maintainable extension (not one-off modifications of existing tools). “Minor utility” packages, including “thin” API clients, and single-function packages are not acceptable.

I'd say that the software is not feature complete and does not meet R standards and I will explain why later when analyzing the code and the features. I think this package is a thin API client. This is not bad in the absolute but a little light for a publication.

The paper should be between 250-1000 words. Authors submitting papers significantly longer than 1000 words may be asked to reduce the length of their paper.

I wanted to comment that the paper was particularly short and empty then I read this guideline. Excluding source code the paper contains approximately 50 lines of text which means something like 700 words. It fits with the guideline but I think that all the bullets mentioned in the guidelines and not fulfilled

[x] A list of the authors of the software and their affiliations, using the correct format (see the example below).
[x] A summary describing the high-level functionality and purpose of the software for a diverse, non-specialist audience.
[ ] A Statement of Need section that clearly illustrates the research purpose of the software.
[ ] A list of key references, including to other software addressing related needs. Note that the references should include full names of venues, e.g., journals and conferences, not abbreviations only understood in the context of a specific discipline.
[ ] Mention (if applicable) a representative set of past or ongoing research projects using the software and recent scholarly publications enabled by it.
[x] Acknowledgement of any financial support.

Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?

No. For example the man page of available_RGTs() only says Reference Ground Tracks (RGTs). That's it! This does not help me to understand what the function does. The argument technical_specs_url is virtually not documented because the only way to understand it is to already know what it does. The man page of time_specific_orbits() only says Time Specific Orbits. The man page of getTrack() only says Get the ICESAT-2 Tracks. I don't know what to do with that. This is far to be enough to reach R standards. One of the strength of R is the documentation. I consider the package not documented.

Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?

No. The CRAN checks all documentation examples. In IceSat2R all the examples are wrapped into \dontrun (or worse if (FALSE) {} which means that the CRAN cannot test a single example of the documentation. The unit tests are all designed to only test error handling, not functionalities. In addition the vignettes that are expected to be generated and tested by the CRAN are actually hard coded and not a single line of code is actually ran. So, in practice, for what I've seen not a single line of code coding for real features is actually tested. And the software does not work as we will see later.

Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?

I won't put any judgement on English, it is not my tongue but the R code content must be revised. See my comments later.

Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

No. The paper contains 3 references including a reference to the package itself (so 2 references). I'm not in ICESat stuff but this definitively looks light. There is no other API in other languages/software for ICEsat data?

In depth review of the package

First of all I did not run the code shown in the paper because it is written in such a way that it is not copy/pastable. I can comment on it anyway. I think the code is unnecessarily complex. According to what I understood I think it should not be more than 10 or 20 lines. Why it is so long and complex can be explained by the fact the package is not feature-complete and does not meet R community standards. It is going to be hard to explain without being able to copy paste code chunks and without being able to refer to a specific line (code lines are not numbered) but:

The minimal reproducible examples (MRE) provided in the paper requires to loads 12 third party packages. For such MRE I'm not expecting to need more that 1 to 3 packages i.e IceSat2r of course, sf of course and that's it.
rds data should belong in data/ and be loaded with data(goms_himal).
Why so much sf data pre-processing? I'm expecting IceSat2r to accept sf objects without the need to make tedious extraction of wkt string in loop before to call IceSat2r functions. This point really goes into community standards. The strength of R is that everybody can use extremely easily very complex and advanced stuff. Here author put an extra layer of complexity that can easily be handled internally in the IceSat2r functions.
Why are there so much for loop? First, all for-loops should be replaced by apply family functions. But yet it is unnecessarily complex. The IceSat2r functions should be natively vectorized and users should never need to loop themselves.
Why is it so complex to call a function with a bounding box as input? We need to call 4 times as.numeric(bbox["side"]). IceSat2r functions should accept bbox objects from sf natively and optionally (but recommended) terra and raster extent objects.

Overall the code presented in the article is tedious to read and understand but could be very simple and smooth by vectorizing the functions and by doing all the pre-processing job internally. In R we are expecting all of this to be done by the package not by the user. This is why I said earlier that the package is not feature-complete.

Then I tested vignette 1 but the code bugged after line 9. I was not able to go further

avail_cycles = available_RGTs(only_cycle_names = TRUE)
#> Error in seq.int(0, to0 - from, by) : 'to' must be a finite integer

Then I tested vignette 2. Same error. As a conclusion I was not able to test the package at all. I see in the README there is a shiny app but can't find easily how to open it.

Conclusions

Document the package extensively.
Vectorize the functions internally
Accept high level objects as input (sf::data.frame,sf::bbox, raster::Extent, terra::ext)
Add comprehensive unit tests includind feature-based tests. I understand that the package downloads data and if the server is down the package tests may fail on CRAN. This is not a fatality and there are methods to cope with that properly.
Simplify your tutorial code (this should be straightforward after dealing with points 2 and 3)
Debug your code. I was not able to run a single example using CRAN version. If point 4 was respected this would have never happened.

mlampros commented 2 years ago

Dear @Jean-Romain thank you for reviewing the code of the IceSat2R package and for your time spent to go through the code. I'd like to ask if I'm allowed to comment on your review?

Jean-Romain commented 2 years ago

I'm ok with that and for what I've understood of the review policy I think it is encouraged

hugoledoux commented 2 years ago

It is encouraged indeed.

Thanks @Jean-Romain for this quick (a record I think!) and thorough review.

mlampros commented 2 years ago

The IceSat2R package is a package that essentially builds string queries to download data from a remote server using the public API provided by the service. In this sense it does not solve complex problem, it does not extract knowledge from large data sets, it does not offer a mathematical library. While useful, it does not meet the criterion mentioned above.

solves complex modeling problems in a scientific context (physics, mathematics, biology, medicine, social science, neuroscience, engineering);

That's true the current state of the IceSat2R package does not solve complex modelling problems

offers a mathematical library

The IceSat2R package is not a mathematical library

supports the functioning of research instruments or the execution of research experiments;

Let's see from the perspective of an R programming user, who might be a potential scientist and involved in remote sensing but does not have the time and knowledge to access within a time interval (days or week) IceSat-2 data from the OpenAltimetry API . How can the IceSat2R package help this person perform potential research experiments?

there are three vignettes (one, two, three) explaining the functionality in detail and moreover how an R user can combine IceSat-2 data with other external data sources (such as elevation data). Additionally, the third vignette shows how the user can decrease the computation time by using virtual files rather than downloading all Reference Ground Tracks.
each (exported) function of the R package includes an example section and documentation (even functions that could be used as internal, such as "available_RGTs()")
besides the interface to the OpenAltimetry API there is a connection to the NSIDC services to download the latest Reference Ground Tracks (and cycles) that are required to verify and validate the output tracks of the OpenAltimetry API (these are updated regularly using a Github Action). Figure 2 of my submitted paper shows how the OpenAltimetry and NSIDC data can be used for this purpose.

extracts knowledge from large data sets

The official web page mentions "OpenAltimetry is a cyberinfrastructure platform for discovery, access, and visualization of data from NASA’s ICESat and ICESat-2 missions" The R user can take advantage of the OpenAltimetry API from within R to download IceSat-2 data in a manageable way. It's not feasible to download vast amounts of data without using queries and this is the purpose of the API's. Once the user has access to data of an Area of Interest, he/she can use the IceSat2R package to combine IceSat-2 data with other external data sources and that way to extract valuable information (I explained it using only one external data source in my second vignette)

mlampros commented 2 years ago

Even considering the time spent at writing the documentation this definitively cannot be considered as 3 months of work for a single person.

I've started working on the IceSat2R package in middle of January 2022 (after signing the contract with the R consortium) and I finished it (including uploading the code and creating the various Github Actions) in the first weeks of March 2022. I have to mention, though, that I was interested in IceSat-2 data before starting writing the code and towards the end of 2021 I spent 2 to 3 weeks to read papers, articles, tutorials and existing code.

mlampros commented 2 years ago

No. For example the man page of available_RGTs() only says Reference Ground Tracks (RGTs). That's it! This does not help me to understand what the function does. The argument technical_specs_url is virtually not documented because the only way to understand it is to already know what it does. The man page of time_specific_orbits() only says Time Specific Orbits. The man page of getTrack() only says Get the ICESAT-2 Tracks. I don't know what to do with that. This is far to be enough to reach R standards. One of the strength of R is the documentation. I consider the package not documented.

The functions,

can be considered as secondary functions (internal) that were exported in case that R users would like to take advantage. I can further improve the documentation.

For instance in the time_specific_orbits() documention I mention currently in the details section: "These files contain the reference ground track time and locations for specific date ranges. Updated KML files have been posted to the 'tech-specs' website (see the 'references' section for more details) containing individual files for each RGT with a date and time stamp posted every 420 kilometers along-track (roughly 1 minute of flight time in between each point). The first RGT is 234; this is where the time series begins. The date of each RGT is in the file name, so the user can easily ascertain where and when ICESat-2 will be on a particular day."

mlampros commented 2 years ago

The CRAN checks all documentation examples. In IceSat2R all the examples are wrapped into \dontrun (or worse if (FALSE) {} which means that the CRAN cannot test a single example of the documentation. The unit tests are all designed to only test error handling, not functionalities

The reason that I didn't include all tests (but only tests related to error handling) was because the majority of the functions make calls to the OpenAltimetry API. I've already submitted many R packages to CRAN and based on my experience this might cause errors not only during the first submission of an R package but also during regular checks of CRAN

mlampros commented 2 years ago

In addition the vignettes that are expected to be generated and tested by the CRAN are actually hard coded and not a single line of code is actually ran. So, in practice, for what I've seen not a single line of code coding for real features is actually tested.

This has an explanation too. There is always a trade off when submitting a package to CRAN. Initially I intended to submit the vignettes in .html format where the code that didn't make any calls to the OpenAltimetry API would be auto-generated by CRAN (this allows a package maintainer to show potential visualizations in the true format - a leaflet map where an R user can zoom in and out) Then I realized that this approach would create a compressed .tar.gz file of approximately 13.5 MB which is way bigger than the allowed uploaded file size by CRAN (5 MB). Thus, I had to either exclude vignettes (from the 3 I would have to submit only 1). I decided to include only the necessary plots and visualizations in reduced size and to not auto generate the code (include as many details as possible about the functionality).

Currently I submit 2 versions of vignettes:

in .pdf format to CRAN where I use command system('R CMD build --compact-vignettes="gs+qpdf" IceSat2R') to reduce the .tar.gz file size
in .html format when I upload to Github (where visualizations can be displayed in the original true format)

And the current file size in in the limit of 5 MB

mlampros commented 2 years ago

And the software does not work as we will see later. avail_cycles = available_RGTs(only_cycle_names = TRUE)

> Error in seq.int(0, to0 - from, by) : 'to' must be a finite integer

I don't receive any errors when I run this line of code on my PC (Linux Mint), but I always encourage R users to open an issue to the Github repository (I mention that also in the blog post that I always upload for a new R package)

mlampros commented 2 years ago

Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

No. The paper contains 3 references including a reference to the package itself (so 2 references). I'm not in ICESat stuff but this definitively looks light. There is no other API in other languages/software for ICEsat data?

In my submitted paper I only used 3 sources (including the R package). The R package itself cites all authors for all IceSat-2 Products

mlampros commented 2 years ago

Overall the code presented in the article is tedious to read and understand but could be very simple and smooth by vectorizing the functions and by doing all the pre-processing job internally. In R we are expecting all of this to be done by the package not by the user. This is why I said earlier that the package is not feature-complete.

High level functions can be included. The main reason that the current code uses for-loops is for debugging purposes, especially when downloading multiple IceSat-2 products or for instance when comparing the tracks between the OpenAltimetry API and the NSIDC service specifications.

mlampros commented 2 years ago

I see in the README there is a shiny app but can't find easily how to open it.

The shiny application allows the user to create a global grid (or a grid based on an Area of Interest). The select_aoi_global_grid R6 class includes details in the example section how an IceSat2R user can take advantage of this shiny application to create either a 1- or 5-degree grid to perform the OpenAltimetry API queries (the OpenAltimetry API restricts the queries to a maximum of 1 or 5 degrees)

Jean-Romain commented 2 years ago

I don't know if I'm supposed to answer every single argument. I won't but I think I can clarify some points. Actually I know why you did what you did (CRAN policy, package size, check failures and so on) but I've been ask to say if yes/no the package fulfills a list of requirements. I looked into the packages and I answered factually if the answer is yes or no.

I've been asked if the software contains automatic test. The answer is no. You can argue why (and I already knew why actually), the fact is that there is no automatic testing. You can keep the vignette hard coded and keep the example wrapped in \dontrun (I do agree for the vignette I'm sometime doing the same for the same reason. I disagree with the documentation but I won't argue) but if you want me to answer yes to this question a significant part of the code must be tested either in the documentation or in unit tests (ultimately both). And as I said the fact it makes request to OpenAltimetry is not a problem. There are documented methods to cope with potential problem on CRAN. I even think there is a section about that in Writing R extensions.
I've been asked if the software is documented to a satisfactory level. I admit I did not read the entire documentation. I looked at the documentation of the few first functions encountered in your tutorial and I think it's fair. They were not documented. You can argue that I did not look to the correct functions because that ones are not important functions and so on. This is not receivable. I'm not supposed to know if the function is secondary or not. If the functions are really secondary functions this can be documented for example by explicitly mentioning that the functions are internal functions exported for debugging purpose. This is legit and informative. But if it is really what they are, don't use them at the very beginning of your very first tutorial. Don't use them at all in your tutorial actually. If the functions are supposed to be internal, use them internally. Or document them.
My argument about tedious examples and for loops is more opinion-based I admit. I've have been asked if the software meets R standards. In R we almost never write for loops. Internally sometime we need loop but it is relatively rare. But certainly not on the user side and certainly not in an beginner tutorial. Consequently I think it is fair to answer that R standards are not met. Your argument about debugging is not receivable. If the software is still in a debugging state or if the beginner tutorial aims for debugging it means either that your package is not ready for production or that the tutorial for beginners is not a tutorial for beginners but a unit test.
I opened an issue. I'll test the package more in depth when it will be fixed. Yet I persist to say that it would not have happened with unit tests ran on CRAN.

mlampros commented 2 years ago

High level functions can be included. The main reason that the current code uses for-loops is for debugging purposes, especially when downloading multiple IceSat-2 products or for instance when comparing the tracks between the OpenAltimetry API and the NSIDC service specifications.

Your argument about debugging is not receivable. If the software is still in a debugging state or if the beginner tutorial aims for debugging it means either that your package is not ready for production or that the tutorial for beginners is not a tutorial for beginners but a unit test.

I don't try to find excuses regarding the for-loops that I use. This is the way I write R code when I believe the code snippets that I use might give errors. The fact that I use for-loops in specific places in the vignettes and in the example sections is because I observed (in the past months) that in some cases, the calls to the OpenAltimetry API might not return the results instantly. This might had to do with the internet connection or with too many requests at the same time (I wasn't sure)

Jean-Romain commented 2 years ago

@hugoledoux we identified a small bug in https://github.com/mlampros/IceSat2R/issues/3 that entirely broke the package on my French language based system. Once the bug will be solved I'll conduct a review of the features. Probably not this week thought.

However it does not invalidate my previous comments.

And I was wrong unit tests on CRAN would probably not have detected that issue because I guess all CRAN machines are English based.

Edit: actually I can test the package right now by changing my system locale

mlampros commented 2 years ago

I updated the code of the IceSat2R package. The majority of the examples take more than 5 seconds (which is prohibitive for CRAN - I tested it once again) therefore I include the \dontrun. I included more tests (as I mention also in the opened issue), however it's not possible to run these tests on CRAN due to time constraints and additional due to potential failures. I use in the majority of the newly added tests the skip_on_cran() function. However, I updated the README.md file with a section (where I include instructions) on how a user can run locally all or specific files of the tests so that potential errors can be fixed.

mlampros commented 2 years ago

I've updated the code of the IceSat2R package based on the new issue 5 in the Github repository.

One of these changes was related to my submitted paper. It would be wise to re-generate the paper .pdf file

@editorialbot generate pdf

hugoledoux commented 2 years ago

Hello everyone, I am on holidays and now read all your exchanges. I appreciate the directness and thoroughness of @Jean-Romain , and I would agree with most of his comments.

The fact that the package has a limited scope has been discussed with the editors, and we thought that it would be useful to the scientific community so this shouldn't be an issue.

I am unfortunately not an R expert, but indeed unit tests should be present and documented. This is important (I see that some more has been done by @mlampros related to this actually).

Same for the docs, this is where JOSS often helps the authors to improve most: forcing them to have good docs, which takes time and effort.

I suggest we wait for the comments of @evetion and then we see where we're at.

evetion commented 2 years ago

I've reviewed the paper and documentation and ran some of the examples. I'm not a R user (installed it just for this), but am familiar with ICESat-2 and its data distribution. Disclosure: I'm also the author of a ICESat-2 data handling package, SpaceLiDAR.jl in Julia, interfacing with the NSIDC, not OpenAltimetry.

Overall, I recognize the need for packages like these, and the RGT download and handling is something I haven't seen automated before. I appreciate the effort put into the documentation, binder and Docker instructions. 👍🏻

Checklist

Here are the items that I couldn't check yet.

Functionality: Have the functional claims of the software been confirmed?

I ran into some bug (https://github.com/mlampros/IceSat2R/issues/9) on one of my first tries. I'll test some more after the weekend.

A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?

There's a statement of need in the paper, but it's missing from the documentation itself. You could just copy the relevant text.

State of the field: Do the authors describe how this software compares to other commonly-used packages?

I can't find the state of the field in the paper. There are certainly other ICESat-2 related packages in other ecosystems?

Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?

The paper is to the point, but quite short and could use a bit of editing. For example, already the first sentence contains and as of October 2021, which doesn't seem to fit. The code section takes up a lot of space, much of it purely scripting around the package functions.

References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

The reference list is a bit light, in the sense that there's only 2 external citations (not counting the self-citation here). You cite the ICESat-2 ATL06 product, but could cite the ICESat-2 mission paper? Or maybe other packages, the website from which you download the RGTs?

Editing

I made several other reports based on issues found in the documentation. These are:

mlampros commented 2 years ago

In the latest commit I updated the code of the IceSat2R package to account for the checklist of @evetion,

I ran into some bug (https://github.com/mlampros/IceSat2R/issues/9) on one of my first tries. I'll test some more after the weekend.

I've added code to debug the potential bug

There's a statement of need in the paper, but it's missing from the documentation itself. You could just copy the relevant text.

I've added this missing part in the documentation of the IceSat2R package

I can't find the state of the field in the paper. There are certainly other ICESat-2 related packages in other ecosystems?

I've added the state of the field to the Joss paper

The paper is to the point, but quite short and could use a bit of editing. For example, already the first sentence contains and as of October 2021, which doesn't seem to fit. The code section takes up a lot of space, much of it purely scripting around the package functions.

I've updated the Joss paper to exclude irrelevant text and keep to the limit of a submission. I also reduced the code section, which is mainly used in the example use case

The reference list is a bit light, in the sense that there's only 2 external citations (not counting the self-citation here). You cite the ICESat-2 ATL06 product, but could cite the ICESat-2 mission paper? Or maybe other packages, the website from which you download the RGTs?

I've included further citations (state of field, ICESat-2 NSIDC Authors)

Regarding the remaining issues opened by @evetion,

I adjusted the name of ICESat-2 in the files that was feasible
I updated the vignette regarding the Copernicus DEM comparison. I applied a transformation using gdalwarp and I also used the GeographicLib to verify the output
I updated the README.md file to account for the issue of the docker image permissions

mlampros commented 2 years ago

@editorialbot generate pdf

editorialbot commented 2 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

evetion commented 2 years ago

Functionality: Have the functional claims of the software been confirmed?

I've updated my bug report, but the issue is blocking for most functionality as described in the vignettes.

Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?

There are no automated tests on Github. I think it would be good to run at least the vignettes there, so any errors are catched beforehand.

hugoledoux commented 2 years ago

@mlampros This bug needs to be fixed (or explained, does @Jean-Romain have it too?). Also, I support the suggest of @evetion to have the automated test on github-actions or others.

Jean-Romain commented 2 years ago

I do not have the issue reported by @evetion. Or I least I don't think I have it because it seems I went further than him. Yet I stopped the process before the end. The first few lines of the tutorials take me something like 30-45 minutes to run. It is not reasonable for a tutorial. I suspect the non fixed issue (https://github.com/mlampros/IceSat2R/issues/4) I reported to increase even more the computation time but I might be wrong.

More than one month later my review is the same on most points. Few unit tests were added but they are all skipped on CRAN which invalidate the usefulness of unit tests. I have never been able to go further than half of the first tutorial because it is extremely tedious, much of it purely scripting around the package functions as said by @evetion. This is because the package does not include any high level function. I opened and issue (https://github.com/mlampros/IceSat2R/issues/5) to make some suggestions/ideas on how to improve that but the latest version does not contains significant changes on this point. Nothing I spotted at least. Consequently I won't go through the tutorials because each requires 45 minutes of data downloading/preprocessing and then the tutorial are mainly lines and lines of scripting not directly related to features of the package.

The documentation however includes significant changes and all the functions I checked are now documented.

mlampros commented 2 years ago

first of all I'd like to thank everyone spent time to review my submitted paper to Joss and the code of my submitted R package. I also respect the opinion and review of @Jean-Romain and @evetion

Regarding the open issues in my Github repository:

The approach that I follow normally is to try to debug the code based on the user's input. This is exactly what I've done in case of issue 9. I'm waiting the response of @evetion so that I can make the adjustments to the code and fix the issue. In my opinion this issue is related to a single sublist that does not have the same format as the remaining ones and seems to be the same as issue 4 (I encountered this same issue during my configuration of the binder image)
The issue 7 was actually not a bug but an omission of my side, I'd like to thank @evetion for making me aware of the vertical difference between Copernicus DEM and the ICESat-2 data. I think I fixed this issue and I'm waiting the response of @evetion
I fixed issue 3 by modifying the code and I'm waiting the response of @Jean-Romain

I'd like to add my opinion regarding issue 5 which is also related to the previous comment of @Jean-Romain The IceSat2R package is a funded project and in my proposal I explained to the Consortium that I'll be in place to fix the issues and add functionality to the package upon request. Now, by adding functionality is meant to create new functions (could be high level functions as well) that will be of advantage to the R users of the package. The request of @Jean-Romain is to create a single function which will reduce many lines of code to a few ones. The functions that will be reduced to a single one are those that make a connection to the web,

downloading of RGT's from the NSIDC service
verifying the NSIDC RGT's with the OpenAltimetry API
downloading of ICESat-2 data from the OpenAltimetry API

In my opinion this is not a wise decision, because there are many things that can go wrong when connecting either to a web or to an API, and especially if there is not a single one but multiple connections (I encountered an issue in the past where my query to the OpenAltimetry API just froze, probably due to connection issues, without giving an indication - even by including verbosity). There are things I'm willing to do and things I'm hesitant to do and one of the things that I'm hesitant to do (regarding the IceSat2R package) is to add this high level function (for the reason that I explained). However, I'm open to pull requests from R users who believe that a similar function could be of advantage for their work or scientific study.

Jean-Romain commented 2 years ago

I do agree with @mlampros that, as the developer, you can do the way you want and if you prefer to make plenty of low level functions, it is your package after all. And I did not requested anything, I only suggested improvements to make your vignettes more like other R packages.

But as a reviewer I just don't have the time or the motivation to go through your vignettes again I'm sorry. Consequently I can't test the package functionalities as requested. We reported several bugs by testing only the very few first lines, I hope there are no other issues in non tested code. At this stage it is all in @hugoledoux hands. On my side I answered all the questions I could and I think I completed my task.

hugoledoux commented 2 years ago

@mlampros I have read all the reviews and discussions (and issues the reviewers opened).

Let me start by saying that JOSS has a formal peer-review process that is designed to improve the quality of the software submitted, and our reviewers become (knowledgeable) users, they are asked to consider the software both from the point-of-views of the code, and from the users.

I have to agree with @Jean-Romain: you should have high-level functions that perform key tasks in an easy way. The point that you were paid for developing those functions is not relevant here, JOSS is trying to improve your package so that it is user-friendly, that its docs is up to a certain standard, etc. By saying that "Potential R users of this R package are free to open a pull request in case such a high level function can be beneficial for their work or scientific study.", you omit that @Jean-Romain is one such user! Also, I do not understand your explanations about the high-level functions being problematic because one part of the process could go wrong, doesn't R have "try/catch" or exceptions handling? You could handle the potential errors and return an error code if something goes wrong during the process for instance.

I have also read the issues of @evetion and evidently your package wasn't tested on different OSes. Now we're in a weird situation where the reviewers are helping you to debug your code and they have to test and help you fix all the errors. This can be done to some extent for 1-2 issues, but it's not the idea of a review. I expect your code to be debugged and tested on different OSes, ideally with CI of some sort. This would allow to catch such errors.

I am especially worried about the comment that it takes 30-45min to run a tutorial, I think better should be done here. Isn't it possible to come up with a simple tutorial that uses a subset of a dataset? As long as the reviewers cannot run all the tests, your submission cannot be accepted by JOSS.

The good news is that JOSS is here to help you improve your submission, and we can wait for a while (we can put the review on pause). If you are willing to add: (1) higher levels that simply the life of users and are inline with what is considered normal by the R community; (2) unit tests that are debugged and tested by CI on different OSes; (3) better tutorials.

Let me know.

mlampros commented 2 years ago

@hugoledoux I would like to add the following which might not be explained in detail from my side in one of my previous comments or probably in the paper as well.

I am especially worried about the comment that it takes 30-45min to run a tutorial

The structure of the package and the way it works is the following (and also shown in the diagram of the Joss paper):
- the user selects an area of interest
- the RGT's have to be downloaded using the NSIDC service
- the RGT's of NSIDC have to be verified with the OpenAltimetry API
- the ICESat-2 data has to be downloaded based on the verified RGT's

Now, this process (even for small areas of interest) require time. It might take approximately 20 to 30 minutes depending on the user's internet connection and the number of queries to the web services that I mentioned.

you omit that @Jean-Romain is one such user!

I wouldn't consider @Jean-Romain a user of my R package, because potential users would have the time and patience to go through all examples and vignettes. I also would like to ask you @hugoledoux about the availability of the reviewers regarding time spent on reviewing a potential paper to Joss.

The point that you were paid for developing those functions is not relevant here

I submitted the paper here (Joss) mainly because it was one of my proposal points in my funded project. If the package is functional and my submission here gets accepted I have to report back to the ones I was funded

Also, I do not understand your explanations about the high-level functions being problematic because one part of the process could go wrong, doesn't R have "try/catch" or exceptions handling? You could handle the potential errors and return an error code if something goes wrong during the process for instance

The high level function won't change the execution time and won't make things easier for the users of the package (especially in case of an error). Having a high level function (consisting of 2 or 3 lines) does not mean that it will allow users to easily debug such errors. It will highly probable make the whole process more frustrating. This was also the reason that I'm hesitant to add such a function (especially when multiple calls are made to web services)

I have also read the issues of @evetion and evidently your package wasn't tested on different OSes. Now we're in a weird situation where the reviewers are helping you to debug your code and they have to test and help you fix all the errors. This can be done to some extent for 1-2 issues, but it's not the idea of a review. I expect your code to be debugged and tested on different OSes, ideally with CI of some sort. This would allow to catch such errors.

In my Github repository for the IceSat2R package I've setup Github actions which run for all 3 operating systems (Linux, Macintosh, Windows) to test the package functionality. @evetion helped me debug one issue (vertical difference between Copernicus DEM and ICESat-2 data) which was not related to the code of the package but with 1of the 3 tutorials. I normally add a single tutorial (vignette) to my R packages but this time I included 3 so that I can add more details. From all the other issues I would personally consider only 1 as an essential one (which appears to be the same for @evetion and @Jean-Romain and is issue 3 and issue 9. This same issue appeared during my binder configuration image and I'll have a second look today on how to add a test and catch the issue on the M1 Mac which @evetion mentions

Now we're in a weird situation where the reviewers are helping you to debug your code and they have to test and help you fix all the errors

I wasn't a reviewer before and I don't know if a reviewer should be also in place to spot potential bugs. However, based on my experience in the last years as a self-taught R programmer I stumbled upon many popular repositories (with many users, forks and stars) with hundreds or even thousands of issues. Definitely, this depends on the code base and number of active users.

The good news is that JOSS is here to help you improve your submission, and we can wait for a while (we can put the review on pause). If you are willing to add: (1) higher levels that simply the life of users and are inline with what is considered normal by the R community; (2) unit tests that are debugged and tested by CI on different OSes; (3) better tutorials.

Thank you but you don't have to put the review in pause. As I mentioned earlier, I already invested more time than I would consider for any of my other submitted R packages to create the Vignettes and I wouldn't modify anything else other than fixing potential bugs. Thus, from your mentioned (1) to (3) points, I will only be in place to accomplish your mentioned point (2). For high level functions, the R users of the package should submit a pull request.

hugoledoux commented 2 years ago

I am disappointed to read this @mlampros.

The 3 requirements I describe above are what the reviewers and myself think is necessary to be accepted by JOSS.

mlampros commented 2 years ago

@hugoledoux you shouldn't be disappointed. Probably it wasn't a good decision to submit an API package to JOSS and also as an R package developer and maintainer to make the false assumption that this API package tutorial will be accepted. As I said previously vignettes (or tutorials otherwise said) is not a requirement for an R package when submitting to CRAN. I just included these so that R users can have an idea of the various ways this API R package can be utilized. My responsibility as an R package developer and maintainer is to fix potential issues (in the code and existing tutorials) and to have a look to submitted pull requests. Feel free to close this review as from side I don't have to add something else. I'd like to thank again everyone spending time to review this API R package.

Jean-Romain commented 2 years ago

Well, all of this is not required for CRAN acceptation this is correct. On my side I do not actually see the missing features as a problem for publication. I only answered to questions asked by JOSS. JOSS has publication requirements and I'm not questioning these requirements. JOSS wants unit tests? My answer is that there is not really any true unit tests. I did not conclude for rejection myself actually.

However, while I do agree it should not be a reason for rejection we have additional related problem here.

There is no unit tests. I already made positive reviews in other journals for which this point was not an explicit requirement. But here we found 2 bugs preventing to use the package at the very beginning of the tutorials. No unit tests is always bad but when the packages works smoothly it is easier to relax some expectations. Especially CRAN run every example of the documentation so it already acts like some kind of unit tests. But in your package examples are wrapped in \dontrun. The documentation is not tested, no unit test and bugs that prevent to use the entire package. This is too much to argue that unit tests are not required.
The package is designed in such a way that it is very hard to use it without a tutorial that explicitly tells users what to do and in which order to use the functions. With high level functions we could guess what to do from the doc and from the function names. But with low level functions that require a lot of scripting around, how the users are expected to figure out what to do? So, in my opinion the problem is not the absence of vignettes strictly speaking, The problem is that the vignettes are very complex and I was not able to go to the end and there is no way, in my opinion, to use the package without these vignettes. Yet you can argue that the package is designed for ICESAT users who already know ICESAT API and maybe for those users your functions make more sense. After all, the R curl API assumes that you already know curl for example. On my side I'm not convinced by this argument, I think it goes against what we are all trying to tend toward in R, i.e. providing super easy way to use complex stuff. But the argument is maybe receivable by JOSS.

mlampros commented 2 years ago

Especially CRAN run every example of the documentation so it already acts like some kind of unit tests. But in your package examples are wrapped in \dontrun. The documentation is not tested, no unit test and bugs that prevent to use the entire package. This is too much to argue that unit tests are not required.

I include tests (either for CRAN or only for Continuous Integration) that are feasible regarding execution time (https://github.com/mlampros/IceSat2R/tree/master/tests/testthat). The vignettes can not be tested because they take too long. I don't know if it would make sense to have only a single small tutorial which would consist of a few lines and would be feasible to run on CRAN or can be tested in Continuous Integration (Github Actions)

The package is designed in such a way that it is very hard to use it without a tutorial that explicitly tells users what to do and in which order to use the functions

This is not correct. Please, have a look to the Example section of each function (https://mlampros.github.io/IceSat2R/reference/index.html) where I explain how the user should receive the results in detail (using all the functions required) In this API package I included 3 vignettes with different level of complexity and length where the users can follow line by line to receive the results.

I think it was not a good idea to submit a version of one of my Vignettes to JOSS for review and I now realize that the comment in my pre-submission inquiry (related to API's) was correct. I would prefer that the editor closes this review.

hugoledoux commented 2 years ago

I am aware that the package "works", but as pointed out by the reviewers our aim here is to make it "good".

Since the package is mostly meant as an interface to download *easily** ICESat-2 data, I believe that the functions and tutorials and examples should fulfil the three demands above.

I understand that you do not have time nor want to implement them @mlampros, so I will retract the paper.

Thanks @evetion and @Jean-Romain for your thorough reviews, testing, and issues. I personally learned something about R during the process!

hugoledoux commented 2 years ago

@editorialbot withdraw

editorialbot commented 2 years ago

I'm sorry @hugoledoux, I'm afraid I can't do that. That's something only eics are allowed to do.

hugoledoux commented 2 years ago

okay, then I will flag it for scope, and one EiC should withdraw the paper, for the reasons explained above.

hugoledoux commented 2 years ago

@editorialbot query scope

editorialbot commented 2 years ago

Submission flagged for editorial review.

arfon commented 2 years ago

I think it was not a good idea to submit a version of one of my Vignettes to JOSS for review and I now realize that the https://github.com/openjournals/joss/issues/1057#issuecomment-1081060582 in my pre-submission inquiry (related to API's) was correct. I would prefer that the editor closes this review.

Thanks for the productive conversation here folks. I'm happy to go ahead and withdraw the paper on behalf of the author.

arfon commented 2 years ago

@editorialbot withdraw

editorialbot commented 2 years ago

Paper withdrawn.

openjournals / joss-reviews

[REVIEW]: ICESat-2 Altimeter Data using R #4342

Status

Reviewer instructions & questions

Checklists

Review checklist for @Jean-Romain

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

Review checklist for @evetion

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

JOSS guidelines

In depth review of the package

Conclusions

> Error in seq.int(0, to0 - from, by) : 'to' must be a finite integer

Checklist

Editing