ropensci / software-review

rOpenSci Software Peer Review.
291 stars 104 forks source link

rebi - an R-Client for Europe PMC #29

Closed njahn82 closed 8 years ago

njahn82 commented 8 years ago

rebi gives access to Europe PubMed Central, an indexing service for life- science publications that is provided by the European Bioinformatics Institute (EBI). This client can be used to search metadata and full-texts, retrieve reference sections, citations, text-mined terms, and links to other EBI databases or external sources like the Wikipedia.

Package: rebi
Version: 0.0.999
Maintainer: 'Najko Jahn' <najko.jahn@uni-bielefeld.de>
Author: Najko Jahn
License: GPL-3
Title: rebi -- R Interface for Europe PMC RESTful Web Service
URL: http://github.com/njahn82/rebi/
BugReports: http://github.com/njahn82/rebi/issues
Description: R Interface to Europe PMC RESTful Web Service. Europe PMC covers
    life science literature and it gives access to open access full texts.
    Coverage is not only restricted to Europe, but articles and abstracts are
    indexed from all over the world. As a partner in the PMC International
    (PMCi), Europe PMC ingests all PubMed content and extends its index with
    other sources, including Agricola, a bibliographic database of citations to
    the agricultural literature, or Biological Patents. In addition to
    bibliographic data, rebi gives automated access to citations and references
    that were indexed by Europe PMC. Links between life-science literature and
    other EBI databases, including ENA, PDB or ChEMBL are also accessible.
    External links provide a mechanism to link out to external providers such as
    Wikipedia or research data repositories.
LazyLoad: yes
LazyData: yes
VignetteBuilder: knitr
Depends:
  R (>= 3.00)
Imports:
    httr,
    jsonlite,
    plyr,
    xml2
RoxygenNote: 5.0.1
Suggests:
    testthat,
    knitr,
    rmarkdown

https://github.com/njahn82/rebi

It only works with Europe PMC's Articles RESTful API. https://europepmc.org/RestfulWebService

Life-science researchers and students, scholars that study the life sciences, librarians.

To my knowledge, this is the only package that implements Europe PMC's Articles RESTful API. Since there is a considerable overlap with PubMed/PubMed Central, the rentrez package could be used as well to fetch bibliographic information. fulltext package gives access to supplementary material deposited in Europe PMC. oai package could be used to retrieve metadata from Europe PMC via its OAI-PMH interface.

This package has been available through rOpenSci since summer 2013. Because I made major upgrades in the last days that reflect the rOpenSci guidance on how to write packages, I thought it might be helpful to re-submit this package.

See also the discussion in the forum: https://discuss.ropensci.org/t/major-package-update-of-rebi-and-question-regarding-re-submission/333/

noamross commented 8 years ago

Reviewers: @toph-allen

sckott commented 8 years ago

@toph-allen - hey there, it's been 16 days, please get your review in by Apr 01, thanks :smiley_cat: (ropensci-bot)

toph-allen commented 8 years ago

Here's my review. N.B. According to the submission above, the package has been available through rOpenSci since 2013, but has recently made major upgrades that reflect rOpenSci guidance on package authorship, so it probably is already sufficient. Let me know if you have any questions, or any areas where you'd like me to clarify more.

Installation

General comments on code

Comments on specific functions

Improvements to code style?

Comments on documentation?

Do tests pass locally?

Documentation and examples run without issue

ROpenSci Packaging Guidelines

(I spent about 1.5–2 hours on the review).

noamross commented 8 years ago

Thanks for the review @toph-allen!

@njahn82 Let us know when you've incorporated / responded to @toph-allen's comments and we can merge the updated version of rebi into the rOpenSci repo.

njahn82 commented 8 years ago

Thank you for the review. I will incorporate the suggested improvements in the next days.

njahn82 commented 8 years ago

Hi @toph-allen, sorry for coming back to you after such a long time. Thank you again for your helpful comments and suggestions to improve my package. I have a new version ready that include your remarks. I'll go over them in the following.

Please let me know if you have any further comments.

epmc_ftxt() includes a data_src = "pmc" argument, but the function is hard-coded to always use PMC as a source--this argument is only used to throw an error if anything other than PMC is used. Perhaps better to remove this argument and clarify in documentation, as using args(epmc_ftxt) might sugget to a user that they can use another source.

As suggested, I removed the data_src = "pmc" argument from the function epmc_ftxt()

https://github.com/njahn82/rebi/issues/2

There are a few local package variables in use: supported_dbs and supported_semantic_types are used in one function each, and are defined in that function's file; supported_data_src is used in many functions, but defined in epmc_details.r. I'm not sure if there's an accepted best practice for this, but it seems it'd make more sense to either put these inside functions or put them in their own file, say supported.r.

Agreed. I moved functions that are commonly used to utils.r. Thanks to your suggestion, I also started to dry out my code by

While code itself is self-explanatory enough that I don't think this is a major issue, sometimes comments are a little sparse or terse (i.e. just a word or two).

Agreed. Added more comprehensive explanations in some places.

The top-level package help file is rebi-package.r. Seems like it would be better to have this as rebi.r and change @name rebi-package to @name rebi, so that ?rebi brings up that document.

I moved rebi-package.r to rebi.r

There are one or two places where documentation could use a touch of clarifying. E.g. epmc_citations.r could be clearer as to whether it fetches works cited by a publication vs. works citing a publication.

As suggested, I clarified both functions. In the documentation of epmc_refs() it says now:

#'  Get references for a given publication
#'
#'  This function retrieves all the works listed in the bibliography of a given
#'  article.

epmc_citations()

#' Get citations for a given publication
#'
#' The function finds works that cite a given publication.

Package naming: It's a fine name, though I don't know its provenance.

It comes from the acronym European Bioinformatics Institute (EBI), which hosts Europe PMC. Do you think europepmc as package name would be more appropriate since the EBI offers many other services that are not covered by this package?

Function/variable naming All functions are prefixed with "epmc_", which seems unnecessary?

I followed the example of other rOpenSci packages like rcrossref or aRxiv. Is there any best practise available if and when to prefix functions?

Again, many thanks for your helpful comments and suggestions.

noamross commented 8 years ago

Quick editor's comment about function naming: We recommend prefixing (e.g, epmc_) to prevent conflicts and ease autocompletion when function names are likely to conflict with other packages. In this case, this is certainly the case, as search(), ref(), etc., will conflict with other packages and even the base R namespace.

noamross commented 8 years ago

@toph-allen - Please let us know if you have responses to the changes, thanks!

toph-allen commented 8 years ago

I'm so sorry I took even longer to respond to your changes—I had been budgeting more time to go through it than it actually took.

Reading through your responses to my suggestions, and looking through the accompanying changes to the code, I don't think I have any further issues.

Regarding the naming of your packages:

Do you think europepmc as package name would be more appropriate since the EBI offers many other services that are not covered by this package?

I think that renaming the package europepmc or epmc would make its name more obviously connected with its main functions, and with the function prefixes. Those would be concrete benefits over the current name. It might be a bit of a hassle (or it might not be more than a "find and replace" and some touch-up)—and, you might just be fond of the current name. I'll leave that decision up to you.

All seems in order to me!

njahn82 commented 8 years ago

Great, I would prefer to rename the package to europepmc as I have not initially submitted it to CRAN yet, but plan to do this within the next days.

Many thanks for reviewing this package!

noamross commented 8 years ago

Thanks, @toph-allen for reviewing and @njahn82 for keeping up this package!

@njahn82 I am getting one local error for a test that is skipped on Travis:

1. Failure (at test_epmc_citations.r#22): epmc_citations returns ---------------
epmc_citations("13814508") does not match 'No citing documents found'. Actual value: "Error in rebi_GET(path = path) : Not Found (HTTP 404).\n"

Once that's fixed, and you've made all the internal changes for changing the name, I think we're good to go. You should be able to apply your changes to the package in the ropensci repo via a pull request. I'm not sure about whether you have the ability to change that repo's name. @sckott: heads up, we may need to see that a name change populates through our systems properly.

njahn82 commented 8 years ago

Thanks for letting me know. I realized that the Europe PMC API has started to give a 404 when no citations were found, which is not a nice behavior.

Compare http://www.ebi.ac.uk/europepmc/webservices/rest/MED/13814508/citations with http://www.ebi.ac.uk/europepmc/webservices/rest/MED/9843981/citations

I will fix the error messages and corresponding tests.

sckott commented 8 years ago

we may need to see that a name change populates through our systems properly.

okay, not sure how to handle this when there's a fork and original version - I'll look into this

njahn82 commented 8 years ago

I prepared the package for name change.

@sckott To get rid of the fork, I would prefer to transfer repository ownership to ropensci first. Then we could do the repository name change.

What do you think?

sckott commented 8 years ago

Okay, sounds good!

sckott commented 8 years ago

name changed, https://github.com/ropensci/europepmc

njahn82 commented 8 years ago

Great, all references to the former package name should now be changed to europepmc.

Many thanks for all your comments and suggested improvements, I really appreciate your help.

ropensci-review-bot commented 2 years ago

Checks for riem (v0.3.0.9000)

git hash: a4501ec8

Important: All failing checks above must be addressed prior to proceeding

Package License: GPL (>= 2)


1. Statistical Properties

This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.

Details of statistical properties (click to open)

The package has: - code in R (100% in 4 files) and - 1 authors - 2 vignettes - no internal data file - 7 imported packages - 3 exported functions (median 15 lines of code) - 9 non-exported functions in R (median 9 lines of code) --- Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages The following terminology is used: - `loc` = "Lines of Code" - `fn` = "function" - `exp`/`not_exp` = exported / not exported The final measure (`fn_call_network_size`) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile. |measure | value| percentile|noteworthy | |:------------------------|-----:|----------:|:----------| |files_R | 4| 28.3| | |files_vignettes | 2| 85.7| | |files_tests | 12| 92.5| | |loc_R | 107| 12.0| | |loc_vignettes | 120| 31.3| | |loc_tests | 1984| 94.1| | |num_vignettes | 2| 89.2| | |n_fns_r | 12| 16.1| | |n_fns_r_exported | 3| 12.9| | |n_fns_r_not_exported | 9| 20.1| | |n_fns_per_file_r | 2| 25.6| | |num_params_per_fn | 1| 1.6|TRUE | |loc_per_fn_r | 12| 35.4| | |loc_per_fn_r_exp | 15| 35.6| | |loc_per_fn_r_not_exp | 9| 27.1| | |rel_whitespace_R | 26| 22.6| | |rel_whitespace_vignettes | 66| 55.5| | |rel_whitespace_tests | 1| 28.1| | |doclines_per_fn_exp | 21| 15.7| | |doclines_per_fn_not_exp | 0| 0.0|TRUE | |fn_call_network_size | 7| 27.1| | ---

1a. Network visualisation

Click to see the interactive network visualisation of calls between objects in package


2. goodpractice and other checks

Details of goodpractice and other checks (click to open)

#### 3a. Continuous Integration Badges [![R-CMD-check](https://github.com/ropensci/riem/workflows/R-CMD-check/badge.svg)](https://github.com/ropensci/riem/actions) **GitHub Workflow Results** |name |conclusion |sha |date | |:-----------|:----------|:------|:----------| |R-CMD-check |success |a4501e |2022-02-08 | --- #### 3b. `goodpractice` results #### `R CMD check` with [rcmdcheck](https://r-lib.github.io/rcmdcheck/) R CMD check generated the following note: 1. checking package dependencies ... NOTE Package suggested but not available for checking: ‘forecast’ #### Test coverage with [covr](https://covr.r-lib.org/) Package coverage: 100 #### Cyclocomplexity with [cyclocomp](https://github.com/MangoTheCat/cyclocomp) No functions have cyclocomplexity >= 15 #### Static code analyses with [lintr](https://github.com/jimhester/lintr) [lintr](https://github.com/jimhester/lintr) found the following 17 potential issues: message | number of times --- | --- Lines should not be more than 80 characters. | 17


Package Versions

|package |version | |:--------|:---------| |pkgstats |0.0.3.88 | |pkgcheck |0.0.2.237 |


Editor-in-Chief Instructions:

Processing may not proceed until the items marked with :heavy_multiplication_x: have been resolved.