ropensci / software-review

rOpenSci Software Peer Review.
292 stars 104 forks source link

Submission: rromeo #285

Closed Rekyt closed 5 years ago

Rekyt commented 5 years ago

Submitting Author: Matthias Grenié (@Rekyt)
Repository: https://github.com/Rekyt/rromeo Version submitted: 0.1.0 Editor: @sckott
Reviewer 1: @ottlngr
Reviewer 2: @brunaw
Archive: TBD
Version accepted: TBD


Package: rromeo
Type: Package
Title: An R Client for SHERPA/RoMEO API
Version: 0.1.0
Authors@R: c(
  person("Matthias", "Grenié", email = REDACTED,
         role = c("aut", "cre"), comment = c(ORCID = "0000-0002-4659-7522")),
  person("Hugo", "Gruson",
         role = "aut", comment = c(ORCID = "0000-0002-4094-1476"))
  )
Description: Provides an R Client for SHERPA/RoMEO API
  <http://www.sherpa.ac.uk/romeo/apimanual.php> which indexes policies of
  journal regarding the archival of scientific manuscripts before and/or after
  peer-review as well as formatted manuscripts.
License: GPL-3
Encoding: UTF-8
LazyData: true
Imports: 
    httr,
    xml2
RoxygenNote: 6.1.1
Roxygen: list(markdown = TRUE)
Suggests:
    ggplot2,
    knitr,
    rmarkdown,
    testthat,
    vcr (>= 0.2.6),
    ISOcodes,
    mockery
Language: en-US
VignetteBuilder: knitr
URL: https://github.com/Rekyt/rromeo
BugReports: https://github.com/Rekyt/rromeo/issues

Scope

This package is an R wrapper to the SHERPA/RoMEO API available online. It provides direct access to the API through R functions.

Specifically bibliometricians interested in retrieving data on archival policies of different journals. rromeo can be use to draw a quick portrait of the archival policies of a given scientific field, or to show differences among publisher practices.

To our knowledge there are no R packages providing access to this API.

https://github.com/ropensci/software-review/issues/273

Technical checks

Confirm each of the following by checking the box. This package:

Publication options

JOSS Options - [ ] The package has an **obvious research application** according to [JOSS's definition](https://joss.readthedocs.io/en/latest/submitting.html#submission-requirements). - [ ] The package contains a `paper.md` matching [JOSS's requirements](https://joss.readthedocs.io/en/latest/submitting.html#what-should-my-paper-contain) with a high-level description in the package root or in `inst/`. - [ ] The package is deposited in a long-term repository with the DOI: - (*Do not submit your package separately to JOSS*)
MEE Options - [ ] The package is novel and will be of interest to the broad readership of the journal. - [ ] The manuscript describing the package is no longer than 3000 words. - [ ] You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see [MEE's Policy on Publishing Code](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/journal-resources/policy-on-publishing-code.html)) - (*Scope: Do consider MEE's [Aims and Scope](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/aims-and-scope/read-full-aims-and-scope.html) for your manuscript. We make no guarantee that your manuscript will be within MEE scope.*) - (*Although not required, we strongly recommend having a full manuscript prepared when you submit here.*) - (*Please do not submit your package separately to Methods in Ecology and Evolution*)

Code of conduct

sckott commented 5 years ago

thanks for your submission @Rekyt assigning an editor soon

sckott commented 5 years ago

Editor checks:


Editor comments

Thanks for your submission @Rekyt !

Here's the output from goodpractice. If you haven't used goodpractice it's an R package that checks a number of things with another package - most of which we agree with and want authors to follow. You don't need to address these now, it's info for reviewers to use to get started. The test coverage is already high, nice job.

── GP rromeo ───────

It is good practice to

  ✖ write unit tests for all functions, and all package code in general. 96% of code lines are covered by test
    cases.

    R/rromeo_publishers.R:199:NA
    R/rromeo_publishers.R:200:NA
    R/rromeo_publishers.R:202:NA
    R/rromeo_publishers.R:265:NA
    R/rromeo_publishers.R:267:NA
    ... and 3 more lines

  ✖ use '<-' for assignment instead of '='. '<-' is the standard, and R users and developers are used it and
    it is easier to read your code for them if you use '<-'.

    R/rromeo_base.R:3:13
    R/rromeo_base.R:7:13
    R/rromeo_base.R:20:16
    R/rromeo_base.R:21:12
    R/rromeo_journals.R:20:17
    ... and 119 more lines

  ✖ avoid long code lines, it is bad for readability. Also, many people prefer editor windows that are about
    80 characters wide. Try make your lines shorter than 80 characters

    R/utils.R:98:1

Seeking reviewers now 🕐


Could you please add the rOpenSci under review badge to your README?

[![](https://badges.ropensci.org/285_status.svg)](https://github.com/ropensci/onboarding/issues/285)

Reviewers:

sckott commented 5 years ago

reviewers assigned

ottlngr commented 5 years ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

For packages co-submitting to JOSS

The package contains a paper.md matching JOSS's requirements with:

  • [ ] A short summary describing the high-level functionality of the software
  • [ ] Authors: A list of authors with their affiliations
  • [ ] A statement of need clearly stating problems the software is designed to solve and its target audience.
  • [ ] References: with DOIs for all those that have one (e.g. papers, datasets, software).

Functionality

Final approval (post-review)

Estimated hours spent reviewing: 6


Review Comments

First of all, rromeo is a well-made package and very close to a positive review.

When reviewing a package, I always start with the vignette. The vignette that comes along with rromeo made a positive impression on me, as it summarizes the major usage of the package and offers detailed examples on the different query options as well as plotting examples. Sole point of criticism: Some of the outputs are too long and should be shortened for the rendered version of the vignette.

Packaging guidelines

Why I did not tick off **Packaging guidelines**: The package conforms to the rOpenSci packaging guidelines yet:

Miscellaneous
Usage

I run the examples from the vignette and worked through the package this way. Here a few things I want to address explicitly:

Conclusion

As soon as the author has improved the output of goodpractice, all formal requirements for a positive review are fulfilled.

Nevertheless, I hope that I could provide some helpful input on the package, especially from a user's sight. Regarding my quite specific "change requests" #1 and #2, I'm curious about the author's opinion.

sckott commented 5 years ago

thx for your review @ottlngr

@brunaw reminder that your review is due next Wed

brunaw commented 5 years ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

Functionality

Final approval (post-review)

Estimated hours spent reviewing: 5


Review Comments

Error in rromeo:::parse_generic(api_answer) : object 'parsed' not found

I opened the source code in the function and found that my current configuration was:

apicontrol [1] "all" outcome [1] "publisherFound" hits [1] "3007"

This error was probably generated by the lack of an else statement in the function to define what happens when the previous conditionals were not satisfied. parse_publisher() worked for this configuration, but the way the conditionals are written doesn't allow it to return a result.

Q: Does the code comply with general principles in the Mozilla reviewing guide? A: Yes.

Q: Does the package comply with the rOpenSci packaging guide? A: Yes.

Q: Are there improvements that could be made to the code style? A: Reviewing the way the if-else statements are organized. One idea is using more %in% to avoid extensive | conditions and make sure there are conditions for all the possible cases.

Q: Is there code duplication in the package that should be reduced? A: Only documentation duplication (the API key part).

Q: Are there user interface improvements that could be made? A: No, the package looks very user-friendly and has plenty of useful messages.

Q: Are there performance improvements that could be made? A: No.

Q: Is the documentation (installation instructions/vignettes/examples/demos) clear and sufficient? Does it use the principle of multiple points of entry i.e. takes into account the fact that any piece of documentation may be the first encounter the user has with the package and/or the tool/data it wraps?

A: Yes.

Q: Were functions and arguments named to work together to form a common, logical programming API that is easy to read, and autocomplete?

A: Yes.

Conclusion

rromeo looks like a good package and will be even better with some small improvements. I hope my comments are clear, so let me know if they aren't.

(thanks for the reminder, @sckott!)

sckott commented 5 years ago

thanks for your review @brunaw

all reviews in

Rekyt commented 5 years ago

We would like to thank @ottlngr and @brunaw for the time they took reviewing rromeo as well as their useful comments and suggestions.

As noted by both reviewers and the editor @sckott, rromeo didn't pass the checks from goodpractice it does now with Rekyt/rromeo@1ce7107 we've replaced all the = with <- as it is a more standard practice in the R community.

We'll first respond to @ottlngr comments:

the missing or incorrect top-level documentation for rromeo which should be addressable by ?rromeo. See http://r-pkgs.had.co.nz/man.html for a reference documentation.

rromeo now has a good package-level documentation Rekyt/rromeo@184c955

It's not an official guideline, but when implementing an API package, I always consider Hadley Wickham's best practices. One of them is to set a User Agent which refers to the use of your package. No necessity, but worth consideration.

We hadn't thought of adding a user-agent which is good practice and we thank @ottlngr for this great suggestion now implemented in Rekyt/rromeo@3b2afd5 The user-agent is now added as header to all queries.

Regarding the API key: You provide the key argument in functions when needed, while not recommending to use that way to provide the API key. My opinion: If you do not recommend this way, do not offer it ;) And I would also not recommend it, and therefore I would recommend you to remove the key argument in all functions. Instead I would (my personal preference) reconstruct rromeo to always access the environment variable SHERPAROMEO_KEY. For abstraction and not forcing users to set the environment variable manually, I would introduce some kind of rr_auth(key) function, that stores the key in an environment variable. (#1)

For development reasons we still want to keep a key argument in all functions. However we now made an entire vignette dedicated to setting up the API key and the various means to achieve it. Rekyt/rromeo@967322a to Rekyt/rromeo@b45f218 see addition with https://github.com/Rekyt/rromeo/compare/58fb140..b45f218

We also now created an rr_auth() function which helps the user defining the API key. see Rekyt/rromeo@3c87b22

When using jj_journal_issn() with a malformed ISSN, I get the cryptic error Error in FUN(X[[i]], ...) : ISSN is invalid, please check the format and not the expected Error in jj_journal_issn("foo") : ISSN is invalid, please check the format. On the fly, I could not figure out why this happens.

There was a missing call. = FALSE in the call to stop it is now fixed Rekyt/rromeo@b7b81c4

When using rr_journal_name("Biostatistics", qtype = "contains") I get a message stating that 17 journals match my query terms, that's fine. But I also get the warning Select one journal from the provided list or enable multiple = TRUE which, for me as a user, is incomprehensible. Because in my understanding, the actual warning is not that one should select one journal from the list, but that additional columns got dropped during the query due to performance.

When using rr_journal_name("Biostatistics", qtype = "contains", multiple = TRUE) I get confusing messages, one saying that 17 journals match my query terms and the other saying that 2 journals match my query terms. But what is more relevant for me: By setting multiple = TRUE the behaviour of rr_journal_name changes completely. With multiple = FALSE, rr_journal_name() is actually some kind of a rr_journal_find() or rr_journal_search(). But with multiple = TRUE it is actually rr_journal_issn(). So my impression is, that the current family of rrjournal*() functions is not consistent. I encourage the author to redefine the current rr_journal_name() with multiple = FALSE to a more generic rr_journal_find() or the like. This functions could be the entry point and provides ISSNs to rr_journal_issn() to get all the metadata for the journals. (#2)

This was a very useful suggestion and the package now has two separate functions: rr_journal_find() which returns only the title and the ISSN of the journal and rr_journal_name() which always returns all the information on archiving policies of all journals. We thus removed the multiple argument. Rekyt/rromeo@b69df66 to Rekyt/rromeo@80a3aa8

Instead of a warning, there is now a message suggesting the usage of rr_journal_name() instead of rr_journal_find() when the user wants more information.

r_publisher_all() runs into an error: Error in parse_generic(api_answer) : Object 'parsed' not found

There was an error in the GET() call and the function now works. Rekyt/rromeo@59a7b40

Regarding @brunaw comments:

I haven't been able to find a statement of need or an indication fo the target audience of the package.

We now specify better the target audience of the package in the beginning of the README file. Rekyt/rromeo@c86dca8

For the rr_journal_name() function, it would be a good idea to not use the word Warning when printing the multiple = TRUE recommendation. A warning can be scary for new users as it looks like something went wrong.

As explained above the function now gives a message instead of a warning.

I find it a good practice to mention the existence of an API key before the examples, because this could be one of the first errors a user would encounter. Besides that, there aren't many explanations about how to set an APY key. You made recommendations, such as putting it in the .Renviron or .Rprofile, but it might not be obvious for the user how to do so. Some guidelines on this point can be helpful, maybe write a vignette just to describe how to find, set and what are the best practices when using the API key. This will also help you avoid having an extra 'Details' section in most of the documentation files.

Thanks for this very nice suggestion! We now include a full vignette regarding the way to set up an API key Rekyt/rromeo@967322a to Rekyt/rromeo@b45f218 see addition with https://github.com/Rekyt/rromeo/compare/58fb140..b45f218 We furthermore reordered the README to mention the API key at first. Rekyt/rromeo@75fbe2c

The current identation in the rr_journal_name() documentation makes it look like you repeated the first two returns. A sub-item indentation after the second item might help make clear that the next following items are about the multiple = TRUE option.

The function is now split between rr_journal_find() (which returns only title and ISSN) and rr_journal_name() (which returns all the information) thus the documentation is now split between functions.

In the documentation of rr_romeo_colour() you point to the 'Details' section for the API key details, but the section it's not there. I reinforce the idea of having a separate vignette for the API key configuration.

We now systematically point to the API vignette to setup the API key.

I got an error with rr_publisher_all

Thank you for the detailed report of the error, it really helped to fix the bug. It is now fixed Rekyt/rromeo@59a7b40

I found the Contributor Code of Conduct, but that's different from a Contribution Guidelines. While the first specifies how a contributor should behave, the second provides instructions about the steps to contributing to the package. I would advise you to include it in the README or a CONTRIBUTING file, that could also contain the Code of Conduct.

We now include a CONTRIBUTING file. Rekyt/rromeo@78fbe2c

Q: Are there improvements that could be made to the code style? A: Reviewing the way the if-else statements are organized. One idea is using more %in% to avoid extensive | conditions and make sure there are conditions for all the possible cases.

Thank you for this thoughtful comment, it greatly improved the readability of our internal functions Rekyt/rromeo@0990b6f and Rekyt/rromeo@526c502

We hope we managed to answer and take into account all the comments and we would like to thank again both reviewers @brunaw and @ottlngr as well as the editor @sckott for their useful comments and suggestions!

sckott commented 5 years ago

Thanks for your detailed reply @Rekyt including links to commits, etc.

reviewers @brunaw and @ottlngr : are you happy with the responses and changes made? Any further questions/suggestions?

brunaw commented 5 years ago

For me, yes! I reinstalled and tested it, everything is checking. The improvements on the documentation are great as well. Thanks for the package @Rekyt!

sckott commented 5 years ago

thanks @brunaw !

ottlngr commented 5 years ago

Sorry for the late reply, @Rekyt @sckott

Thanks @Rekyt for the detailed comments on the reviews. I appreciate your changes on rromeo and therefore recommend to admit rromeo to rOpenSci :)

sckott commented 5 years ago

Great, thanks for the thumbs up reviewers. I'm taking a final look ...

sckott commented 5 years ago

Approved! Thanks again for your submission @Rekyt ! And thanks for your reviews @ottlngr and @brunaw 👌

To-dos:


We've started putting together a bookdown with our best practice and tips, this chapter starts the 3d section that's about guidance for after onboarding. Please tell us what could be improved. The repo is at https://github.com/ropensci/dev_guide

Are you interested in doing a blog post for our blog https://ropensci.org/blog/ ? either a short-form intro to it (https://ropensci.org/technotes/) or long-form post with more narrative about its development (https://ropensci.org/blog/). If so, we'll have our community manager @stefaniebutland get in touch with you on that

Rekyt commented 5 years ago

Hi @sckott thank you so much, @Bisaloo and I would be interested in doing a blog post on our lessons learned developing our first API package.

stefaniebutland commented 5 years ago

@Rekyt @Bisaloo 😄 Great to hear you would like to contribute a post about rromeo.

Here are examples of blog posts by authors of peer-reviewed packages so you can get an idea of the style and length you prefer: https://ropensci.org/tags/software-peer-review/. Tech Notes (shorter, focusing on a single example) are here: https://ropensci.org/technotes/.

Here are some technical and editorial guidelines for contributing a post: https://github.com/ropensci/roweb2#contributing-a-blog-post. Please submit your draft post via pull request a week before the planned publication date so we can give you some feedback. Right now I have Tues May 21 or Tues June 4 available as publication dates.

Happy to answer any questions.

sckott commented 5 years ago

@Rekyt gave you admin access on the repo now

Rekyt commented 5 years ago

Hi, we would like to thank @brunaw and @ottlngr for checking the package again and recommending it!

@stefaniebutland, we would be interested in writing a regular blog post with the narrative about how we decided to build rromeo and Tues June 4 would be perfect for us! We'll submit a PR with the draft of the post.

Many thanks!

stefaniebutland commented 5 years ago

@Rekyt @Bisaloo Checking in. Is Tues June 4 still a good publication date for you? If so, please submit a draft May 28 or as soon as possible after that so we have time for any refinements

Bisaloo commented 5 years ago

@stefaniebutland, yes, we have been working on a draft on our fork. It should be ready by tonight for a first look on your side! :pray: We will submit a PR as soon as we can.