ropensci / software-review

rOpenSci Software Peer Review.
292 stars 104 forks source link

lightr #267

Closed Bisaloo closed 5 years ago

Bisaloo commented 5 years ago

Summary

lightr imports UV-VIS reflectance/transmission/absorbance proprietary file formats in R. It also allows the import of related metadata that are critical to ensure reproducibility but that are often discarded by other tools.

Package: lightr
Title: Read Spectrometric Data in R
Version: 0.0.0.9000
Authors@R: c(
    person("Hugo", "Gruson", role = c("cre", "aut"),
    email = REDACTED,
    comment = c(ORCID = "0000-0002-4094-1476")),
    person("Rafael", "Maia", role = "aut",
    email = "REDACTED",
    comment = c(ORCID = "0000-0002-7563-9795")),
    person("Thomas", "White", role = "aut",
    email = "REDACTED",
    comment = c(ORCID = "0000-0001-8493-9450"))
    )
Description: Parse various UV-VIS reflectance/transmittance/absorbance spectra
    file formats to extract spectral data and metadata.
Imports:
    pbmcapply,
    xml2
Suggests:
    covr,
    knitr,
    rmarkdown,
    spelling,
    testthat
URL: https://bisaloo.github.io/lightr, https://github.com/Bisaloo/lightr
BugReports: https://github.com/Bisaloo/lightr/issues
License: GPL (>=2)
Encoding: UTF-8
LazyData: true
RoxygenNote: 6.1.1
Roxygen: list(markdown = TRUE)
Language: en-GB
VignetteBuilder: knitr

https://github.com/Bisaloo/lightr

People working with UV-VIS reflectance/transmittance/absorbance spectra, colour science (#colsci). People developing package to analyse spectral data with vision models (lightr has few dependencies). Even non-R users who do not own the proprietary software to convert the proprietary formats to csv files.

There is partial overlap with some other packages, as described in the README but to my knowledge, none of them have the same aims as lightr.

Requirements

Confirm each of the following by checking the box. This package:

Publication options

Detail

Some function names do not meet rOpenSci criteria (e.g. getspec() and getmetadata()) but this is kept for backwards compatibility with the package pavo, from which lightr originated. Synonyms have been added (get_spec() and get_metadata()) but it is at the moment unlikely that the old names will be deprecated.


As mentioned in the template, some pieces still need polishing, mainly the vignettes but before going for one last push, I'd like to know if you would be interested.

sckott commented 5 years ago

thanks for your pre-submission inquiry @Bisaloo We are discussing now and will get back to you soon

karthik commented 5 years ago

👋 @Bisaloo. I will be your editor for this submission. Stay tuned for next steps.

karthik commented 5 years ago

Editor checks:


Editor comments

Package looks to be in excellent shape and your test coverage is fantastic! Goodpractice had very little to say and the only notable suggestion is to drop spelling from your suggests. I'm looking for appropriate reviewers now but if you have any suggestions for independent reviewers, please let me know. 🙏

  ✖ fix this R CMD check ERROR: Package suggested but not
available: ‘spelling’ The suggested packages are required for a complete check. Checking can be attempted without them by setting the environment variable _R_CHECK_FORCE_SUGGESTS_ to a false value. See section ‘The DESCRIPTION file’ in the ‘Writing R Extensions’ manual.

Reviewers: TBD Due date: TBD

Bisaloo commented 5 years ago

Thank you for your comment!

I don't really understand why spelling is problematic. It's used in many packages on CRAN without any issue (https://cran.r-project.org/package=spelling). Could you expand a bit more why I should remove it please?

Also, is it okay if I keep working on master until reviewers are found?

Bisaloo commented 5 years ago

@karthik, are you having trouble finding reviewers?

If so, I don't think it's necessary to find someone with a background in colour science or spectrometry. Anyone with experience with unusual file/data formats would probably be qualified. Some basic regex skills and knowledge of binary data (for parse_trm()) are a plus but not required IMO.

karthik commented 5 years ago

@Bisaloo I'm still having trouble recruiting appropriate reviewers (many declines). Do you have any independent suggestions that I could approach?

Bisaloo commented 5 years ago

@karthik, thanks for the update :+1: .

Here are some people with a colour science background that code:

Beyond that, as I said, I don't think it is required to have a background in colour science to review lightr. It is not (as opposed to colourvision or pavo) a package to perform colour vision models. It focuses on data import and manipulation so I'm confident anyone with a solid general knowledge of R could provide a good feedback.

Bisaloo commented 5 years ago

Here is the profile for a potential reviewer IMO. Maybe you will know someone who fits this description:

Someone who is familiar with file manipulation (list.files(), path(), tempdir(), file_ext()) and import (unzip(), file(), readLines()) in R, including binary files (readBin(), intToUtf8()) and XML files (via the xml2 package). Some metadata is imported through string manipulation (grep(), gsub(), strsplit()) as well but the regex patterns generally remain simple.

I think it helps to think about this package as similar to any other file import package (such as rio). The data type is just accessory here.

I'll try to add a vignette to describe the structure / internal organisation of the package at some point, which should greatly help for the review.

karthik commented 5 years ago

Thanks you @Bisaloo I have reached out to more folks and will update this thread very soon. Apologies again for this unusual delay.

karthik commented 5 years ago

After numerous delays (thanks for the patience), assigning reviewer 1.

@kjuraic Deadline: March 5, 2019

@kjuraic Please email or reach out if you have any trouble during the review process. 🙏

Bisaloo commented 5 years ago

@karthik, any news? Is there anything I can do on my end to help?

karthik commented 5 years ago

@kjuraic 👋Can you please update the thread with your review. Feel free to reach out to me if you've run into any issues. 🙏

karthik commented 5 years ago

@Bisaloo Thanks for checking in. Let me get an update from Krunoslav

karthik commented 5 years ago

Apologies for the unusual delays @Bisaloo. I am reassigning reviewers now.

karthik commented 5 years ago

Hello @Bisaloo I am so sorry for all the delays. Since I have been unable to get a response from the reviewers I assigned, I am switching reviewers now and we'll fast track the review from this point forward. @jeroen will complete a technical review soon, and I along with a domain reviewer will also share a review. Thanks for your patience. 🙏

Bisaloo commented 5 years ago

@karthik, thanks for the update! That sounds good.

jeroen commented 5 years ago

Hi Hugo,

Thanks for submitting your package to rOpenSci!

I'm not an expert in spectrometry hence my review focuses on general R packaging practices and some implementation advice.

Overall

Overall the package is in good shape. The pkgdown site looks nice and provides a good starting point for new users. All functions include an example using test data in various formats which ship with the package. The package passes CMD check with high testing coverage.

Improvements

My main concern is about validation of the parsing functions. The package implements custom readers for a range of binary and xml based spectrometry formats, but there is little documentation on where to find the specification of these formats (if one exists), and how you check that the output of the parsers is accurate.

Writing parsers is increadibly difficult and it is easy to overlook edge cases or odd properties of a format. I don't know how complex these formats are, but from the raw output data it is difficult to judge the correctness. The parsing unit tests verify that the output contains data and metadata, but do not verify the data itself.

You could improve the package by trying to show (where possible) that the data from your examples corresponds to what one would expect to see. For example you could extend the examples by plotting or modelling the output data, such that it can be visually inspected. As a dummy user, I would enjoy an intro vignette with a worked example of reading and modelling spectral data, and seeing a pattern appear. This would also help with illustrating how lightr is used in combination with other R packages to perform a complete analysis.

You mention that there is some overlap between lightr and other spectrometry R packages. For the formats that are also supported by another package, you could consdider adding a unit test to check if both packages return the same data. If only proprietary reference software is available, perhaps you could show summary or visual statistics that allow potential users to compare the output with the other software.

Summary

The package is in good shape for a first release and I recommend it is onboarded. It can be further improved with validation of the parsers and an vignettes with worked examples showing how lightr synergies with other packages.

I think more potential users are willing to give the package a try when you show how lightr fits into a complete analysis, and make it convincing that the results correspond to those of other (proprietary) software.

Bisaloo commented 5 years ago

Thank you for your review @jeroen! You bring up very valid points but I'm not yet quite sure if/how I can address all of them. I'll need a couple of days to think about it. Here is a first attempt at answering some of your concerns. I will also perform some changes on the packages following your recommendations whenever I get a moment (hopefully next week).

but there is little documentation on where to find the specification of these formats (if one exists)

To my knowledge, the only format that has been defined as a standard are jdx files. I do say it in the docs but I could try to think of a more prominent place where to put the reference:

https://github.com/Bisaloo/lightr/blob/b06c2b3c5512ff798c7ce82933c003bbdb4422bc/R/parse_jdx.R#L7-L9

The parsing unit tests verify that the output contains data and metadata, but do not verify the data itself.

You are correct that I don't compare the output to known results from an outside source but I however implemented regression tests to compare the output with that obtained from previous versions of lightr (https://github.com/Bisaloo/lightr/tree/master/tests/testthat/known_output)

You could improve the package by trying to show (where possible) that the data from your examples corresponds to what one would expect to see.

The main issue here is that other tools to extract this data, including proprietary tools discard a lot of information, as I mention in the README. The only way we know this data is correct is because we know how the measurements were made (i.e. we know the spectrometer model, the integration time, etc.) and how reflectance spectra should look for such an object. But I don't see a convenient way to demonstrate this to the user.

What I can do though is compare the output from:

for the same data (i.e. before and after export by official proprietary software). This is not straightforward because as mentioned previously, data is processed in an undocumented way while exported (metadata is partially discarded, spectral data is sometimes interpolated with an undocumented method, etc.).

For example you could extend the examples by plotting or modelling the output data, such that it can be visually inspected. As a dummy user, I would enjoy an intro vignette with a worked example of reading and modelling spectral data, and seeing a pattern appear. This would also help with illustrating how lightr is used in combination with other R packages to perform a complete analysis.

This is definitely a great idea and I will try to add this soon! An realistic example with measurements on one object and the results of the visual models (via pavo) next to a picture of this object could be a further way to convince the user that lightr at least produces data that is coherent with what we can observe with our naked eye. This would actually rely a bit more on the functionality from pavo than that from lightr but it doesn't matter.

For the formats that are also supported by another package, you could consdider adding a unit test to check if both packages return the same data.

Good point. I'll look into this.

If only proprietary reference software is available, perhaps you could show summary or visual statistics that allow potential users to compare the output with the other software.

This would be addressed in the intro vignette you propose.

I think more potential users are willing to give the package a try when you show [...] make it convincing that the results correspond to those of other (proprietary) software.

I'm wondering what would be the best way to convey this to the (potential) users. I doubt many users will dig into the tests folder. But I don't really see this as a vignette either. Maybe something derived from covrpage?


TODO:

Bisaloo commented 5 years ago

Hi @karthik, just wondering what's the status here.

I can try to address the remaining concerns from @jeroen in the next couple of weeks, by providing a new vignette which:

Would be that enough or should I wait for another review?

karthik commented 5 years ago

Hi @Bisaloo, It would be good for you to address these comments now. I'll be your second reviewer and will provide my review within a week. Once you address any additional issues, we can proceed with the next steps.

PS: I'll be away from email till August 6.

karthik commented 5 years ago

Adding to Jeroen's feedback, here is my review. Most comments are minor, and once I see a response, I can move quickly to get this accepted.

Review of lightr package

The lightr package provides the ability to parse spectrometry data from different file formats and from multiple different scientific instruments. The alternatives are all proprietary software.

General comments

Warning message:
No files found. Try a different value for argument "ext". 

Which makes sense. At least in the examples, can you point it to some example files that you can include with the package itself?

[1] "convert_tocsv"  "get_metadata"   "get_spec"       "parse_abs"     
 [5] "parse_generic"  "parse_jaz"      "parse_jazirrad" "parse_jdx"     
 [9] "parse_procspec" "parse_roh"      "parse_trm"      "parse_trt"     
[13] "parse_ttt"  

By adding some sort of standard prefix, e.g. lr_ it would allow for easy autocompletion in Rstudio and other text editors/IDEs. Functions like get_metadata are so generic that it will likely cause namespace conflicts with other packages.

Minor comments

Bisaloo commented 5 years ago

Thank you for your comments @karthik :+1:! please see my changes below:

General comments

  • In the package down documentation, the two main functions are impossible to test without any example files to read from. When running this in my console, I get:
Warning message:
No files found. Try a different value for argument "ext". 

Which makes sense. At least in the examples, can you point it to some example files that you can include with the package itself?

https://github.com/Bisaloo/lightr/commit/2beb31ea60f42ad2af8dcd1750767d4d18cf6d03

  • Test coverage for the package is great! :raised_hands:

  • In the function documentation, when describing specific instruments (e.g. OceanOptics, Avantes), link to those where possible.

https://github.com/Bisaloo/lightr/commit/22ee9164afde74348df2625101d6cc0d884e3ee1

Unless the package API is set in stone, I would recommend remaining some of the functions. Currently here is what is exported:

[1] "convert_tocsv"  "get_metadata"   "get_spec"       "parse_abs"     
[5] "parse_generic"  "parse_jaz"      "parse_jazirrad" "parse_jdx"     
[9] "parse_procspec" "parse_roh"      "parse_trm"      "parse_trt"     
[13] "parse_ttt"  

By adding some sort of standard prefix, e.g. lr_ it would allow for easy autocompletion in Rstudio and other text editors/IDEs. Functions like get_metadata are so generic that it will likely cause namespace conflicts with other packages.

Would you do this for all exported functions or just the three main ones which are the most likely to have name conflicts (get_spec(), get_metadata() and convert_tocsv())?

Minor comments

  • Version number still shows as Version: 0.0.0.9000. You might want to bump the minor version at least with each major change so it's evident that the package is not super early in development. Leaving the version number at the devtools default does act as a signal.

https://github.com/Bisaloo/lightr/commit/b9159922db33c7b19c31de4c7155f2ae9606d82a

  • The url field has two URLs: URL: https://bisaloo.github.io/lightr, https://github.com/Bisaloo/lightr Perhaps link to just one? Maybe the package down docs (which then link to the repo itself) so that the field is easy to parse for other purposes. @jeroen do you have other thoughts here on how it might impact the docs parser?

Actually, the recommended practice for pkgdown is to have both URL. The github.com URL is required to auto-link to the github repo and fill out the bug report field (in the sidebar and the github logo in the top navbar). The github.io is required for cross-linking from other packages (https://pkgdown.r-lib.org/articles/linking.html#across-packages).

  • "If you can't find the best parse fror your specific file" README.md:119. fror should be for. You can check spelling on the whole package with spelling::spell_check_package()

https://github.com/Bisaloo/lightr/commit/515d193373b35d1faddc117c6606a97ca7a32c74

karthik commented 5 years ago

Would you do this for all exported functions or just the three main ones which are the most likely to have name conflicts (get_spec(), get_metadata() and convert_tocsv())?

Generally for all exported functions unless you have a strong reason not to.

Bisaloo commented 5 years ago

Ok, thanks! Done in https://github.com/Bisaloo/lightr/commit/a0ed29867f3bc8d36b92f5b22f58c32de936ccab.

karthik commented 5 years ago

Thanks @Bisaloo! I'll do a run through tomorrow before I sign off.

karthik commented 5 years ago

To run the examples in your README, you'll need a slight change to locate the inst/testdata/ files

lr_get_metadata(system.file("testdata/procspec_files" , package = "lightr"),
                ext = "ProcSpec")

Same for the example below.

Bisaloo commented 5 years ago

Right, it makes more sense since users have to install the package to run this anyways :+1:. It's done!

karthik commented 5 years ago

Congrats @, your submission has been approved! 🎉 Thank you for submitting and to @jeroen for a thorough and timely review. To-dos:

[![ropensci_footer](https://ropensci.org/public_images/ropensci_footer.png)](https://ropensci.org)

Welcome aboard!

karthik commented 5 years ago

@Bisaloo Once you transfer, the documentation will also be auto generated at docs.ropensci.org/lightr

You'll need to update that URL in your DESCRIPTION (and the top of your repo) as well after the transfer.

Bisaloo commented 5 years ago

It's done. Thanks @karthik and @jeroen for your helpful comments!

karthik commented 5 years ago

Thanks @Bisaloo! :pray:

karthik commented 5 years ago

Thanks @Bisaloo! :pray:

Bisaloo commented 5 years ago

Should I go ahead and submit this to JOSS once I add a DOI?

karthik commented 5 years ago

You can add a DOI after the JOSS submission is accepted. So go ahead and submit there. In the meantime, you can compile your own paper here: http://whedon.theoj.org/ and fix any mistakes or issues such as broken references.

Also reading over your current paper, you provide no examples of a scientific application. That will be necessary for this software to get accepted to JOSS.

Bisaloo commented 5 years ago

Also reading over your current paper, you provide no examples of a scientific application. That will be necessary for this software to get accepted to JOSS.

Okay, thanks for the advice! I'll do this tomorrow then.

karthik commented 5 years ago

If it helps you can read over some accepted JOSS papers about R packages: https://joss.theoj.org/papers/in/R to get a sense of what level of detail is needed there. I'm also an editor at JOSS and can provide feedback if necessary.

jeroen commented 5 years ago

Docs are now live on https://docs.ropensci.org/lightr/

Bisaloo commented 5 years ago

@karthik, I had a look at the most recent R papers and they don't seem too provide many more examples of applications. Do you have something specific in mind that would make a valuable addition?

karthik commented 5 years ago

I think it would be useful to use a test dataset + lightr to show what can be done as a small example. Here is an example. If you submit as is, that's the first question an editor will as you about.