openjournals / joss-reviews

Reviews for the Journal of Open Source Software

[REVIEW]: PDR: The Planetary Data Reader #7256

Open · editorialbot opened this issue 6 days ago

editorialbot commented 6 days ago

Submitting author: @Sierra-MC (Sierra V. Brown)
Repository: https://github.com/MillionConcepts/pdr
Branch with paper.md (empty if default branch): joss_paper
Version: v1.1.2
Editor: @dfm
Reviewers: @AndrewAnnex, @gaelccc
Archive: Pending

Status

status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/097f846381b65e8d4b8be1d0aeb9c3ba"><img src="https://joss.theoj.org/papers/097f846381b65e8d4b8be1d0aeb9c3ba/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/097f846381b65e8d4b8be1d0aeb9c3ba/status.svg)](https://joss.theoj.org/papers/097f846381b65e8d4b8be1d0aeb9c3ba)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@AndrewAnnex & @gaelccc, your review will be checklist based. Each of you will have a separate checklist that you should update when carrying out your review. First of all you need to run this command in a separate comment to create the checklist:

@editorialbot generate my checklist

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. If you have any questions or concerns, please let @dfm know.

Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest.

Checklists

📝 Checklist for @gaelccc

📝 Checklist for @AndrewAnnex

editorialbot commented 6 days ago

Hello humans, I'm @editorialbot, a robot that can help you with some common editorial tasks.

For a list of things I can do to help you, just type:

@editorialbot commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@editorialbot generate pdf
editorialbot commented 6 days ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

✅ OK DOIs

- 10.17189/2ass-x557 is OK
- 10.17189/07q9-ph18 is OK
- 10.17189/f8xf-6a29 is OK

🟡 SKIP DOIs

- No DOI given, and none found for title: HOW WE’LL KNOW WE CAN READ ALL THE DATA IN THE PDS...
- No DOI given, and none found for title: pdr-tests: the testing suite/toolchain for ‘pdr‘
- No DOI given, and none found for title: Planetary Data System Standards Reference Version ...
- No DOI given, and none found for title: Viking Orbiter Imaging Bundle
- No DOI given, and none found for title: planetarypy
- No DOI given, and none found for title: Automated Spectral Image Processing Techniques in ...
- No DOI given, and none found for title: pilo: Planetary Input/Output
- No DOI given, and none found for title: PDS4 Tools
- No DOI given, and none found for title: VICAR: Video Image Communication and Retrieval
- No DOI given, and none found for title: The Astropy Project
- No DOI given, and none found for title: Rasterio

❌ MISSING DOIs

- None

❌ INVALID DOIs

- https://zenodo.org/records/12545688 is INVALID because of 'https://doi.org/' prefix
editorialbot commented 6 days ago

Software report:

github.com/AlDanial/cloc v 1.90  T=0.09 s (1083.0 files/s, 155790.2 lines/s)
--------------------------------------------------------------------------------
Language                      files          blank        comment           code
--------------------------------------------------------------------------------
Python                           81           1361           2737           6793
Markdown                          9            390              0           2297
YAML                              8              6             11            133
TeX                               1             14              0            121
Jupyter Notebook                  1              0            604             59
Bourne Again Shell                1              1              0              2
--------------------------------------------------------------------------------
SUM:                            101           1772           3352           9405
--------------------------------------------------------------------------------

Commit count by author:

   258  michael
   216  Sierra-MC
   166  mstclair@millionconcepts.com
   129  Sierra Brown
   126  curtiss9
    80  m-stclair
    48  cmillion
    32  Sierra V. Kaufman
    19  Sierra V. Brown
    11  Zack Weinberg
    10  Chase Million
     5  M. St. Clair
     1  Michael Aye
editorialbot commented 6 days ago

Paper file info:

📄 Wordcount for paper.md is 905

✅ The paper includes a Statement of need section

editorialbot commented 6 days ago

License info:

🟡 License found: Other (Check here for OSI approval)

editorialbot commented 6 days ago

👉 📄 Download article proof · 📄 View article proof on GitHub 👈

dfm commented 6 days ago

@AndrewAnnex, @gaelccc — This is the review thread for the paper. All of our correspondence will happen here from now on. Thanks again for agreeing to participate!

👉 Please read the "Reviewer instructions & questions" in the first comment above, and generate your checklists by commenting @editorialbot generate my checklist on this issue ASAP. As you go over the submission, please check any items that you feel have been satisfied. There are also links to the JOSS reviewer guidelines.

The JOSS review is different from most other journals. Our goal is to work with the authors to help them meet our criteria instead of merely passing judgment on the submission. As such, the reviewers are encouraged to submit issues and pull requests on the software repository. When doing so, please mention openjournals/joss-reviews#7256 so that a link is created to this thread (and I can keep an eye on what is happening). Please also feel free to comment and ask questions on this thread. In my experience, it is better to post comments/questions/suggestions as you come across them instead of waiting until you've reviewed the entire package.

We aim for the review process to be completed within about 4-6 weeks, but please try to make a start ahead of this: JOSS reviews are by their nature iterative, and any early feedback you can provide to the author will be very helpful in meeting this schedule. Please get your review started as soon as possible!

gaelccc commented 4 days ago

Review checklist for @gaelccc

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

gaelccc commented 4 days ago

@editorialbot generate pdf

editorialbot commented 4 days ago

👉 📄 Download article proof · 📄 View article proof on GitHub 👈

gaelccc commented 4 days ago

A few comments/suggestions while I go through the main example:

1) To run the example notebook, requests is required. Very easy to install and not a big deal. However, given the statement in the manuscript:

pdr‘s interface is designed to be maximally accessible to the introductory Python user

I would consider adding requests to the list of dependencies.

2) For spatial data types (images, raster fields), since the spatial extent/axes are specified in the file labels, it would be a nice addition to be able to dump the axes as well, or alternatively to have a .to_xarray() method which dumps an xarray.Dataset with the correct spatial axes (a rough sketch of what I mean appears after these points). An example of this would be a geoid error map (https://pds-geosciences.wustl.edu/mgn/mgn-v-rss-5-gravity-l2-v1/mg_5201/gravity/). What I envisage is: besides dumping the array, this addition would allow the user to dump an xarray with fields [geoid_error, longitude, latitude].

3) I tried another format not mentioned in the example, but officially supported (as per the documentation): notably, bsp files from the LRO repository. In this case the method .read() works, but doesn't seem to produce useful output. All I get is the label of the file and a warning stating that the bsp pointer is still not fully implemented.

4) The output I get from reading a wea file (e.g., the ones here) is wrong. Somehow decimal digits are not assigned to the integer part, but rather to the next columns. E.g., the number 0958.9 is read as 958, and the 9 is assigned to the next column. The following column will then read 9 N, where N is the number supposed to be in that column (e.g., if N=23.4 then 9 N = 9 23). The output type, indeed, is str.
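
To make point 2 concrete, here is a rough sketch of the kind of interface I have in mind. This is not pdr's actual API: the helper name, axis construction, and array values below are purely illustrative.

import numpy as np
import xarray as xr

# Illustrative only: wrap a 2D geoid-error array in an xarray.Dataset with
# explicit longitude/latitude coordinates derived from label metadata.
def to_xarray_sketch(values, lon_bounds=(0.0, 360.0), lat_bounds=(-90.0, 90.0)):
    n_lat, n_lon = values.shape
    lon = np.linspace(lon_bounds[0], lon_bounds[1], n_lon)
    lat = np.linspace(lat_bounds[0], lat_bounds[1], n_lat)
    return xr.Dataset(
        {"geoid_error": (("latitude", "longitude"), values)},
        coords={"latitude": lat, "longitude": lon},
    )

# Stand-in array in place of data obtained via pdr; a real implementation
# would take the extent and axis names from the product label.
demo = to_xarray_sketch(np.zeros((180, 360)))
print(demo)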

AndrewAnnex commented 4 days ago

Review checklist for @AndrewAnnex

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

AndrewAnnex commented 4 days ago

some initial comments on the main paper:

lines 44-50: This paragraph mentions two performance claims, and the review guidelines state that performance claims need to be confirmed, as per criterion 3 in functionality. Suggesting that a simple benchmark be performed to support them.

line 45: pdr as the first word in the sentence isn't capitalized. I know the project name is lowercase, but maybe a filler word could be added to make it clearer where that sentence starts.

line 54: same as above
line 55: same as above (I stopped here as I noticed some instances at earlier lines)

lines 61-64: VICAR and ISIS aren't Python packages, so I'm not sure they are really relevant to the subject of Python libraries for accessing PDS data. They certainly enable access to PDS data (to varying degrees), but they fundamentally have different goals/aims. Comparisons to pds4_tools, astropy, etc. are more aligned with this. Suggesting either not mentioning these other projects or qualifying them by stating they don't explicitly attempt to address the issue of accessing PDS data with Python.

line 80: Typo in capitalization of title

Sierra-MC commented 3 days ago

A few comments/suggestions while I go through the main example:

  1. To run the example notebook, requests is required. Very easy to install and not a big deal. However, given the statement in the manuscript:

pdr‘s interface is designed to be maximally accessible to the introductory Python user

I would consider adding requests to the list of dependencies.

@gaelccc

We try not to add unnecessary dependencies, so we'd prefer not to do that. requests used to back a function we had in pdr, but it was removed as a dependency when we cut those features. Based on your callout, I was confused about how the binder notebook was running without it as a dependency and did some digging: apparently requests is a dependency of binderhub, so when we removed the dependency we didn't notice an error. We're in the process of considering a move to Colab and will be updating the way we run the environments then; in the meantime, to address this, I've added a comment in the notebook explaining that you'll need to install requests. While we do design pdr to be accessible to the introductory Python user, I think this is a reasonable compromise to avoid adding dependencies we don't need in the core code.
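
The gist of the added note is along these lines — the URL and filename here are placeholders for illustration, not the actual product used in the notebook:

# NOTE: this notebook uses `requests` to download sample products.
# `requests` is not a dependency of pdr itself; install it first, e.g.:
#     pip install requests
import requests
import pdr

url = "https://example.com/sample_product.img"  # placeholder URL
with open("sample_product.img", "wb") as f:
    f.write(requests.get(url).content)

data = pdr.read("sample_product.img")  # pdr.read() as used elsewhere in this thread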

Sierra-MC commented 3 days ago

@AndrewAnnex

some initial comments on the main paper:

lines 44-50: This paragraph mentions two performance claims, and the review guidelines state that performance claims need to be confirmed, as per criterion 3 in functionality. Suggesting that a simple benchmark be performed to support them.

line 45: pdr as the first word in the sentence isn't capitalized. I know the project name is lowercase, but maybe a filler word could be added to make it clearer where that sentence starts.

line 54: same as above
line 55: same as above (I stopped here as I noticed some instances at earlier lines)

lines 61-64: VICAR and ISIS aren't Python packages, so I'm not sure they are really relevant to the subject of Python libraries for accessing PDS data. They certainly enable access to PDS data (to varying degrees), but they fundamentally have different goals/aims. Comparisons to pds4_tools, astropy, etc. are more aligned with this. Suggesting either not mentioning these other projects or qualifying them by stating they don't explicitly attempt to address the issue of accessing PDS data with Python.

line 80: Typo in capitalization of title

How should we include the benchmarks? We have some that we've run that would be appropriate to support those claims, but I'm unsure what the usual format/method for presenting them here is. Is that something that is just presented during this review, included as a reference that we put on Zenodo, stated in the text, or perhaps some other method? @dfm do you have clarification on this?

As for starting sentences with a lowercase proper noun, that appears to be a style-guide matter. It's something Million Concepts does quite often, letting the period mark where a new sentence starts. If JOSS has a particular style-guide convention banning this I'll reword, but if not, I'd prefer to leave it as is so as not to introduce awkwardly worded sentences or disrupt the flow solely for capitalization purposes. @dfm I don't see a style guide on the website; does JOSS have one?

The VICAR/ISIS mention (and reference) has been removed.

The typo in title capitalization has been fixed. Good catch.

m-stclair commented 3 days ago

@gaelccc The type of xarray interface you describe is a good suggestion, but it is outside the scope of pdr for several reasons.

On ARRAY objects and tables:

The spatial metadata of the geoid products you link is very clear on manual inspection, and there are a handful of other products in the PDS with ARRAY objects that follow that specific format (Magellan gravity products and Clementine gravity / topography products). However, most PDS3 ARRAY objects are not used to describe nice 2D tables, or in general N-d arrays of the kind xarray is intended to deal with. They are used to describe repeated 'structs' of various kinds, some but not all of which 'nest' additional arrays, which are sometimes spatial and sometimes not, and their units and intended interpretive layout are often described only in comments, documentation, or adjacent struct fields that have entirely freeform naming conventions.

Similarly, many TABLE and SPREADSHEET objects contain one or more sections that are intended to be interpreted as arrays (which may have multiple columns per array element that are intended to be interpreted as, e.g., x/y/z axes), sometimes supplemented by one or more sections that give units for those arrays, generally alongside columns that give metadata for those arrays. Column names are again entirely freeform, so the relationship between, say, a column that contains reflectance measured at a particular spectrometer bin and a column that contains the band center for that bin cannot be straightforwardly determined from automated inspection of the label. Determining in general when these nested objects contain 'axes', what those axes are, what units may or may not be associated with them, and how to break them out and present them to the user would require the application of complex inference heuristics. This is an extremely interesting-sounding project, but it is outside the scope of pdr, and might well be as large a work effort as the entire scope of pdr.

On rasters:

Some rasters have explicit per-element spatial (or spectral, photometric, etc.) coordinates given in 'backplanes' of some type. This is directly analogous to the ARRAY/TABLE situation above: the names and locations of these backplanes (often in separate files) are entirely freeform, so automatically determining when 'backplanes' describe multiple physical axes of an array would require application of the same sort of complex inference rules.

Some rasters instead have an associated map projection specification. Some PDS3 map-projected products give one in the label, but others give it only in separate documents / products. Also, many that do contain some kind of projection description in the label do not actually fully specify it, or specify it in a nonstandard way. Many also have variances from the standard that are described only in prose in documentation or catalog files. A few give it only as WCS objects in an associated FITS HDU. This is why even specialty GIS codes like the GDAL PDS3 driver cannot parse a correct projection from many PDS3 products without additional information.

More broadly, computing map projections falls into a category of domain-level processing that is outside the scope of pdr. Other major examples include debayering / demosaicking and the application of weights or other calibration factors. The purpose of pdr is to reliably put data in a form that can be processed at domain level.

AndrewAnnex commented 3 days ago

@AndrewAnnex

some initial comments on the main paper:

lines 44-50: This paragraph mentions two performance claims, and the review guidelines state that performance claims need to be confirmed, as per criterion 3 in functionality. Suggesting that a simple benchmark be performed to support them.

line 45: pdr as the first word in the sentence isn't capitalized. I know the project name is lowercase, but maybe a filler word could be added to make it clearer where that sentence starts.

line 54: same as above
line 55: same as above (I stopped here as I noticed some instances at earlier lines)

lines 61-64: VICAR and ISIS aren't Python packages, so I'm not sure they are really relevant to the subject of Python libraries for accessing PDS data. They certainly enable access to PDS data (to varying degrees), but they fundamentally have different goals/aims. Comparisons to pds4_tools, astropy, etc. are more aligned with this. Suggesting either not mentioning these other projects or qualifying them by stating they don't explicitly attempt to address the issue of accessing PDS data with Python.

line 80: Typo in capitalization of title

How should we include the benchmarks? We have some that we've run that would be appropriate to support those claims, but I'm unsure what the usual format/method for presenting them here is. Is that something that is just presented during this review, included as a reference that we put on Zenodo, stated in the text, or perhaps some other method? @dfm do you have clarification on this?

As for starting sentences with a lowercase proper noun, that appears to be a style-guide matter. It's something Million Concepts does quite often, letting the period mark where a new sentence starts. If JOSS has a particular style-guide convention banning this I'll reword, but if not, I'd prefer to leave it as is so as not to introduce awkwardly worded sentences or disrupt the flow solely for capitalization purposes. @dfm I don't see a style guide on the website; does JOSS have one?

The VICAR/ISIS mention (and reference) has been removed.

The typo in title capitalization has been fixed. Good catch.

I don't think there is an accepted standard for the benchmarks you need to worry about; as a suggestion, something like the time required to parse a single label, for example (with the units), would be sufficient.

As I read it, the reviewer's checklist says the justification should be made within the manuscript text and should reference specific performance metrics. It is up to the authors to devise the text and the metrics to present to justify performance claims (and up to the reviewer to judge whether it's sufficient). Otherwise, removing mentions of performance/speed perhaps solves the problem in a simpler manner.
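
For concreteness, a minimal timing sketch along those lines could look something like this. The product path is a placeholder, and this assumes pdr.read() as the entry point, as used elsewhere in this thread:

import time
import pdr

PRODUCT = "path/to/local_product.lbl"  # placeholder: any locally cached PDS3 label

pdr.read(PRODUCT)  # warm-up read so one-time costs don't skew the numbers

N = 20
start = time.perf_counter()
for _ in range(N):
    pdr.read(PRODUCT)
elapsed = time.perf_counter() - start
print(f"mean time to open product: {elapsed / N * 1000:.1f} ms")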

gaelccc commented 3 days ago

@m-stclair Thank you for the thorough explanation. I understand now the complexity and agree that the correct interpretation of domain/axes can be left to the data user.

@Sierra-MC I understand the reason for minimizing dependencies. I find the solution of adding a comment in the notebook sufficient and appropriate.

Sierra-MC commented 3 days ago

@AndrewAnnex

some initial comments on the main paper:

lines 44-50: This paragraph mentions two performance claims, and the review guidelines state that performance claims need to be confirmed, as per criterion 3 in functionality. Suggesting that a simple benchmark be performed to support them.

line 45: pdr as the first word in the sentence isn't capitalized. I know the project name is lowercase, but maybe a filler word could be added to make it clearer where that sentence starts.

line 54: same as above
line 55: same as above (I stopped here as I noticed some instances at earlier lines)

lines 61-64: VICAR and ISIS aren't Python packages, so I'm not sure they are really relevant to the subject of Python libraries for accessing PDS data. They certainly enable access to PDS data (to varying degrees), but they fundamentally have different goals/aims. Comparisons to pds4_tools, astropy, etc. are more aligned with this. Suggesting either not mentioning these other projects or qualifying them by stating they don't explicitly attempt to address the issue of accessing PDS data with Python.

line 80: Typo in capitalization of title

How should we include the benchmarks? We have some that we've run that would be appropriate to support those claims, but I'm unsure what the usual format/method for presenting them here is. Is that something that is just presented during this review, included as a reference that we put on Zenodo, stated in the text, or perhaps some other method? @dfm do you have clarification on this?

As for starting sentences with a lowercase proper noun, that appears to be a style-guide matter. It's something Million Concepts does quite often, letting the period mark where a new sentence starts. If JOSS has a particular style-guide convention banning this I'll reword, but if not, I'd prefer to leave it as is so as not to introduce awkwardly worded sentences or disrupt the flow solely for capitalization purposes. @dfm I don't see a style guide on the website; does JOSS have one?

The VICAR/ISIS mention (and reference) has been removed.

The typo in title capitalization has been fixed. Good catch.

I don't think there is an accepted standard for the benchmarks you need to worry about; as a suggestion, something like the time required to parse a single label, for example (with the units), would be sufficient.

As I read it, the reviewer's checklist says the justification should be made within the manuscript text and should reference specific performance metrics. It is up to the authors to devise the text and the metrics to present to justify performance claims (and up to the reviewer to judge whether it's sufficient). Otherwise, removing mentions of performance/speed perhaps solves the problem in a simpler manner.

Ah okay, thank you for the clarification. I'll run some benchmarks and include the results in the text!

Sierra-MC commented 1 day ago

Text has been added to support the performance claims in the paper in this commit.

@AndrewAnnex

AndrewAnnex commented 1 day ago

@editorialbot generate pdf

editorialbot commented 1 day ago

👉 📄 Download article proof · 📄 View article proof on GitHub 👈