ropensci / software-review

rOpenSci Software Peer Review.
291 stars 104 forks source link

Submission: infx #218

Closed nbenn closed 6 years ago

nbenn commented 6 years ago

Summary

Package: infx
Type: Package
Title: OpenBIS API Access to the InfectX Data Repository
Version: 0.1.0
Authors@R: person("Nicolas", "Bennett",
                  email = "nicolas.bennett@stat.math.ethz.ch",
                  role = c("aut", "cre"))
Description: The Open Source Biology Information System (openBIS) is a
   general purpose framework for management, annotation and publication
    of large data sets that arise from biological experiments. By making
    the JSON-RPC based openBIS API available to R, image-based high
    throughput screening data as generated by the InfectX/TargetInfectX
    projects can be browsed, searched and downloaded directly from R.
    Currently, several kinome-wide RNA interference screens performed on
    HeLa cells in presence of a selection of bacterial and viral pathogens
    and using oligo libraries form multiple vendors are available. Further
    genome-wide screens are forthcoming. The full data obtained from these
    experiments is accessible, including raw microscopy images, object
    segmentation masks, single cell feature data generated by CellProfiler
    and infection scoring data, alongside rich meta data and quality
    control data.
URL:
    https://nbenn.github.io/infx,
    http://www.infectx.ch,
    https://www.targetinfectx.ch,
    https://openbis.elnlims.ch
BugReports: https://github.com/nbenn/infx/issues
Depends:
    R (>= 3.1.0)
Imports:
    R.matlab,
    jsonlite,
    assertthat,
    curl,
    progress,
    crayon,
    magick,
    base64enc
License: GPL-3 | file LICENSE
Encoding: UTF-8
LazyData: true
Suggests: 
    testthat (>= 2.0.0),
    knitr,
    rmarkdown,
    rvest,
    tibble,
    ggplot2,
    RColorBrewer
Roxygen: list(markdown = TRUE)
RoxygenNote: 6.0.1.9000
VignetteBuilder: knit

Requirements

Confirm each of the following by checking the box. This package:

Publication options

Detail

sckott commented 6 years ago

thanks for your submission @nbenn

doing editor checks now, and haven't seen any documentation in your package on how to create an account? It seems one can create an API token with login_openbis, but how do you get a username and password? Would be good to add this info to the package as well.

nbenn commented 6 years ago

@sckott, sorry for the delayed response on my side.

There is currently no way for a user to create an account him/herself. Accounts can only be created by the service admins. All publicly available data can be accessed using publicly released login credentials. To make this more straightforward, I added these public login credentials as default arguments to login_openbis() such that

tok <- login_openbis()

now yields a valid login token associated with this public user account.

sckott commented 6 years ago

thanks @nbenn - will have a look with the open login

sckott commented 6 years ago

Editor checks:


Editor comments

Thanks for your submission @nbenn !!

* checking tests ...
  Running ‘test-1-utils.R’ [7s/30s]
  Running ‘test-2-objects.R’ [37s/95s]
  Running ‘test-3-datasets.R’ [22s/130s]
  Running ‘test-4-features.R’ [12s/36s]
  Running ‘test-5-images.R’ [11s/169s]
  Running ‘test-6-files.R’ [16s/94s]

I have no idea what the time consuming parts are of the test suite, but if they are the HTTP requests you could explore mocking or caching - so funny 😆 i was just about to suggest trying webmockr with curl and remembered you opened this issue https://github.com/ropensci/webmockr/issues/40 - anyway, I'll try to get that webmockr/curl integration done soon.

[![](https://badges.ropensci.org/218_status.svg)](https://github.com/ropensci/onboarding/issues/218)

── GP infx ────────────

It is good practice to

  ✖ write unit tests for all functions, and all package code in general. 98% of code lines are covered by test cases.

    R/dataset.R:515:NA
    R/feature.R:153:NA
    R/file.R:288:NA
    R/image.R:174:NA
    R/json-base.R:110:NA
    ... and 13 more lines

Seeking reviewers now 🕐


Reviewers:

ottlngr commented 6 years ago

Sorry for the inconvenience, but I have to postpone my review for 2 more days. I will have finished it at 2018-07-12.

sckott commented 6 years ago

no problem, thanks @ottlngr

ottlngr commented 6 years ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

Functionality

Final approval (post-review)

Estimated hours spent reviewing: 3,5


Review Comments

First of all I have to state that infx is implemented in an exemplary manner. Though I have absolutely no biological background and having a hard time to imagine what the data infx gives access to actually is, I could easily follow the idea and structure of the package and the procedures it provides.

With detailed documentation and several vignettes, the potential user is guided from the very first steps until mastering the package.

While I let the check marks in the reviewer template speak for themselves, I will elaborate on some (minor) issues I came across.

Documentation

infx primarily seems to address academic users. The academic flair of the package is further emphasized by the fact the development of the package was funded. Nevertheless I would have enjoyed one or two sentences explaining purpose of the package

Surely, this is the opinion of someone fulfilling both these criteria -but maybe it is an valuable input for the author.

Running devtools::spell_check() returns some, even if few typos and different spellings of certain terms.

Functionality

As stated earlier, I think infx accomplishes the promised tasks in a remarkable manner.

Only one thing, which I always try to eliminate when writing a package or functions dealing with APIs, attracted my attention:

token <- login_openbis()
projects <- list_projects(token)
adeno_exps <- list_experiments(token, projects[[1L]])

As a user I find it redundant to supply the token to list_experiments(), when already supplying projects which was received using the token. This might be a cosmetic issue, but maybe there is a way to store the token within the respective objects, such as projects.

Furthermore infx uses curl to handle HTTP requests. Maybe the httr package can be used to meet the recommendation of rOpenSci.

I congratulate @nbenn on a valuable contribution to the R community and hope, my review is felt as, even if from a non-specialist, valuable input.

sckott commented 6 years ago

thanks for your review @ottlngr !

@tdjames1 friendly reminder your review due in a little over 1 week

nbenn commented 6 years ago

@ottlngr: thank you for your kind words.

I fixed the missing community guidelines and repo badges and added a codemeta description. I also fixed some spelling errors using devtools::spell_check(). I didn't know about this, thank you for the tip!

As for the ImageMagick dependency, I added it as SystemRequirements field in the DESCRIPTION file. But is this really what one should do? After all, it is not a dependency of infx, but of magick. The same goes for a configure script: I'm happy to copy the relevant parts from magick but this only leads to the same checks being run twice. @sckott, how do you feel about this? Is it customary to list and/or check for downstream system dependencies?

I tried to give a some more context to this package in the beginning of the readme. Does this help a bit for people coming from other fields?

I did consider using httr, but as far as I understand, httr doesn't support async requests (r-lib/httr#271). This is why I opted for curl instead.

Finally, I like your suggestion regarding token handling. I could easily attach the used token as token attribute to objects returned from the API. I'd need some additional logic to handle subsetting, concatenating etc. of objects. Also expired tokens need to be handled. I'll think about this.

Thank you for your feedback @ottlngr.

ottlngr commented 6 years ago

As for the ImageMagick dependency, I added it as SystemRequirements field in the DESCRIPTION file. But is this really what one should do? After all, it is not a dependency of infx, but of magick. The same goes for a configure script: I'm happy to copy the relevant parts from magick but this only leads to the same checks being run twice.

Indeed. The imported magick actually does handle the system dependency very well.

I tried to give a some more context to this package in the beginning of the readme. Does this help a bit for people coming from other fields?

It does, yes!

I did consider using httr, but as far as I understand, httr doesn't support async requests (r-lib/httr#271). This is why I opted for curl instead.

That is totally fine as long there's a specific reason.

Finally, I like your suggestion regarding token handling. I could easily attach the used token as token attribute to objects returned from the API. I'd need some additional logic to handle subsetting, concatenating etc. of objects. Also expired tokens need to be handled. I'll think about this.

I think that would be cool, but I also see the necessary effort. Maybe just keep it in mind for a future release.

I added the final check marks in the reviewer template!

sckott commented 6 years ago

@nbenn we meant that part about system dependencies only for the package that is using that dependency (e.g. package foo), and not intended for packages using package foo

tdjames1 commented 6 years ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

For packages co-submitting to JOSS

The package contains a paper.md matching JOSS's requirements with:

  • [ ] A short summary describing the high-level functionality of the software
  • [ ] Authors: A list of authors with their affiliations
  • [ ] A statement of need clearly stating problems the software is designed to solve and its target audience.
  • [ ] References: with DOIs for all those that have one (e.g. papers, datasets, software).

Functionality

Final approval (post-review)

Estimated hours spent reviewing: 8


Review Comments

infx is an interface enabling access to openBIS structured experimental data collected as part of the InfectX project. At the core of the package is an interface to the JSON based openBIS API. Further functionality facilitates access to data structures used in the InfectX experimental data. As noted in documentation, the use of the package is not restricted to the InfectX openBIS instance, but the functionality/API coverage is geared up to that instance.

I'm not familiar with how other instances of the openBIS differ from the InfectX instance. I wondered if there may be a case to split this into two packages, one containing core functionality and another to provide the specific functionality required for the InfectX project data. However, if the InfectX functionality would be useful in other openBIS instances then a single package that can be extended probably makes more sense.

Overall, the package is well designed and all functions seem to work robustly when tested using the examples provided in the README and vignettes (I'm not familiar with the data structures so I have not tested the functionality much beyond this). Caveats, such as openBIS functionality not covered or inconsistencies in the openBIS API, are well documented in function documentation and openBIS API vignette.

The following comments are for the most part clarifications to documentation and notes on ROpenSci packaging guidelines. I started my review prior to the commits made in response to @ottlngr's review so apologies if I've made a note of anything already fixed!

Tests

Tests ran successfully but goodpractice::gp() complains about coverage:

── GP infx ─────────────────────────────────────────────────────────────────────

It is good practice to

  ✖ write unit tests for all functions, and all package code
    in general. 56% of code lines are covered by test cases.

    R/dataset.R:160:NA
    R/dataset.R:161:NA
    R/dataset.R:162:NA
    R/dataset.R:163:NA
    R/dataset.R:164:NA
    ... and 798 more lines

The highlighted lines are within S3 methods e.g. list_datastore_urls.character. I have not been able to find any advice about what to do in the case of testing S3 methods so I don't know if you need to duplicate tests for all overloaded methods. Perhaps @sckott can advise here.

README

The amendments to the README that @nbenn made in response to @ottlngr's review are a great improvement.

Vignettes

N.B. I reviewed the online vignettes (https://nbenn.github.io/infx/articles/infx-intro.html etc).

I found the vignettes extremely helpful in guiding me through the functionality of the package as it relates to the data structures accessed through openBIS.

Introduction to infx

OpenBIS API coverage

JSON object handling

Function documentation

All functions should document the type of object returned under the @return heading.

We recommend using the @family tag in the documentation of functions to allow their grouping in the documentation of the installed package.

infx

As is the case for a README, top-level documentation or vignettes may be the first point of entry for users. If your package connects to a data source or online service, or wraps other software, it should provide enough information for users to understand the nature of the data, service, or software, and provide links to other relevant data and documentation.

print.json_class

json_vec

list_datasets

list_files

list_image_metadata

list_materials

list_samples

make_requests

search

list_datastores

CRAN gotchas from ROpenSci package guidelines

sckott commented 6 years ago

thanks so so much @tdjames1 ! Do you have an estimate on number of hours it took to review?

sckott commented 6 years ago

@nbenn reviews are both in now, continue discussion here. let me know if you have any questions about process

tdjames1 commented 6 years ago

@sckott quite a few... I've updated my review comment with an estimate.

sckott commented 6 years ago

thanks @tdjames1 - we're keeping track of this so we can make review times shorter for reviewers moving forward

nbenn commented 6 years ago

@tdjames thank you very much for for the time you took for your in-depth feedback. I fixed most of the issues you found, except for the following, where I either disagree, or do not understand.

Re Review Comments

I considered advertising my package more as an openBIS API package, rather than an InfectX data access package, but as the InfectX data set is (afaik) the only instance of large-scale, publicly accessible data under openBIS, I find the data access more attractive than the API capabilities per-se. In principle, nothing in the package is specific to InfectX (with exception of the default values in login_openbis() and examples used in docs). Personally, I feel the focus on InfectX data is ok, as long as no other openBIS instances of comparable scope start showing up. Or do you have strong feelings otherwise?

Re Tests

Test coverage is 98%, when I run tests locally or on Travis (also see sckott's gp output). I don't understand why your gp output reports only 56% coverage. However I don't think low test coverage is an issue here. If anything, I have too many tests that result in lengthy runtimes. Therefore the problem I'd rather look into, is reducing test runtime using @scott's mocking infrastructure. Unfortunately I haven't gotten to this yet.

Re README

Re Vignettes

Re Function documentation

Re CRAN gotchas from ROpenSci package guidelines

sckott commented 6 years ago
nbenn commented 6 years ago

@sckott, sorry for the delay. I started writing up a lengthy answer, explaining what I meant regrading the @return tags. In the process of doing so, I decided on simply adding the @return sections instead. I'll get back to you when this is done.

tdjames1 commented 6 years ago

Hi @nbenn, apologies for the delay in responding to your queries/comments. Please find my responses below.

Re Review Comments

Personally, I feel the focus on InfectX data is ok, as long as no other openBIS instances of comparable scope start showing up. Or do you have strong feelings otherwise?

That's fine, I felt that I should raise it as an issue but I don't feel strongly that it is a big problem.

Re Tests

Test coverage is 98%, when I run tests locally or on Travis (also see sckott's gp output). I don't understand why your gp output reports only 56% coverage.

I think perhaps something is not set up correctly on my local machine. 98% seems pretty decent coverage!

Re README

typhimurium is romanized on purpose, as typhimurium does not specify a species but a serovar

OK! I didn't realise that.

In light of what I tried to elaborate a bit above, what do you think needs changing here?

"This R package provides access to the openBIS JSON-RPC API..." - append sentence to the effect that it's specifically applied to InfectX data?

Implementation wise, nothing is specific to InfectX data, it is more that the package presentation is built around this dataset.

I think I got a different impression when I was reading some of the documentation, i.e. that in InfectX data is structured in a certain way which could be different in other instances. You could perhaps append something like "with specific reference to the InfectX dataset" to clarify the relationship between the package, openBIS and InfectX.

link "[1]" to footnote text not active

I tried to fix this, but unfortunately I think it's not possible, as there is no proper support for footnotes in gfm

That's a shame!

Re Vignettes

A login token is created for demonstrating make_request() and make_requests(). This login token is destroyed because the following example is a demonstration of how to access a non-InfecX openBIS instance. Are you suggesting to create individual login tokens for both the make_request() and make_requests() examples?

Yes, and I apologise for not being more explicit about this. In the vignette, each code block has a little "copy to clipboard" helper to make it easy for a reader to grab the code and paste it into R. But that also makes it easy to copy the first block which demonstrates make_request() but doesn't include the logout_openbis() call. That's why I thought it might be better if each block was complete in itself.

Re Function documentation

For me, both on the pkgdown website and in the R Documentation pane of RStudio, list_plates(), etc. in ?list_samples are clickable links. I think that under roxygen2 with markdown support enabled, the correct syntax is [list_plates()] which is rendered to \code{\link[=list_plates]{list_plates()}}. Am I missing something here?

It's displaying correctly for me now, so I'm not sure what happened there. Possibly another local configuration issue.

Re CRAN gotchas from ROpenSci package guidelines

Do you have reason to believe my check_skip() function in tests/testthat/setup-functions.R will not work properly on cran?

Apologies, I missed the check_skip() function. Looks good.

nbenn commented 6 years ago

Due to a hardware failure, the InfectX openBIS server is currently down. I was promised to be informed of any developments but also told, it may take some time to get things back online. I'm sorry for the delay this causes.

sckott commented 6 years ago

thanks for the heads up

nbenn commented 6 years ago

Ok, the InfectX openBIS service is back online.

@sckott I added @return tags to the docs.

@tdjames1 I think, I now incorporated all the changes you suggested.

tdjames1 commented 6 years ago

@nbenn Thanks, I've checked the remaining boxes on my review post (cc @sckott).

sckott commented 6 years ago

Great, i'll take a final editor's check

sckott commented 6 years ago

@nbenn is master branch the one I should look at?

nbenn commented 6 years ago

yes.

Sent from my iPad

On 2 Aug 2018, at 19:44, Scott Chamberlain notifications@github.com wrote:

@nbenn is master branch the one I should look at?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

sckott commented 6 years ago

Approved! Thanks again for your submission @nbenn !


nbenn commented 6 years ago

@sckott @tdjames1 @ottlngr Thank you for taking the time to review and help me improve infx.