ropensci / software-review

rOpenSci Software Peer Review.
291 stars 104 forks source link

chemspiderapi #329

Closed RaoulWolf closed 2 years ago

RaoulWolf commented 5 years ago

Submitting Author Name: Raoul Wolf Submitting Author Github Handle: !--author1-->@RaoulWolf<!--end-author1-- Repository: https://github.com/RaoulWolf/chemspiderapi Version submitted: 0.0.2 2021-03-15 Editor: !--editor-->@jooolia<!--end-editor-- Reviewers: @rajarshi, @yufree, @data-datum

Due date for @rajarshi: 2021-03-15 Due date for @yufree: 2021-03-15 Due date for @data-datum: 2021-03-15

Archive: TBD
Version accepted: TBD


Package: chemspiderapi
Type: Package
Title: R Wrapper for ChemSpider's API Services
Version: 0.0.2
Authors@R: person("Raoul", "Wolf", email = "raoul.wolf@niva.no", role = c("aut", "cre"))
Description: ChemSpider has announced a fundamental change to the syntax of their API services in late 2018.
    This package provides convenience wrappers for the new API functionalities of ChemSpider, as well as complementary functions.
License: MIT + file LICENSE
Depends: R (>= 3.5.0)
Imports: 
    curl, 
    jsonlite
Suggests: 
    covr,
    keyring,
    knitr,
    magick,
    memoise,
    ratelimitr,
    rmarkdown,
    testthat
URL: https://github.com/RaoulWolf/chemspiderapi
Encoding: UTF-8
LazyData: true
ByteCompile: true
RoxygenNote: 6.1.1
VignetteBuilder: knitr

Scope

The chemspiderapi package is an easy-to-use R interface to use all new ChemSpider API functionalities, as introduced in ChemSpiders complete redesign of its API structure late 2018.

Researchers and citizen scientists who work on anything chemistry-related and need to routinely query against any of ChemSpider's API services.

The rOpenSci-maintained package webchem aims at offering (currently outdated / not working) functionality for accessing ChemSpider's API services. However, the redesign of ChemSpider's API structures in late 2018 broke all available functionalities.

Pre-submission enquiry: https://github.com/ropensci/software-review/issues/294 @melvidoni

Technical checks

Confirm each of the following by checking the box. This package:

Publication options

JOSS Options - [x] The package has an **obvious research application** according to [JOSS's definition](https://joss.readthedocs.io/en/latest/submitting.html#submission-requirements). - [x] The package contains a `paper.md` matching [JOSS's requirements](https://joss.readthedocs.io/en/latest/submitting.html#what-should-my-paper-contain) with a high-level description in the package root or in `inst/`. - [ ] The package is deposited in a long-term repository with the DOI: - (*Do not submit your package separately to JOSS*)
MEE Options - [ ] The package is novel and will be of interest to the broad readership of the journal. - [ ] The manuscript describing the package is no longer than 3000 words. - [ ] You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see [MEE's Policy on Publishing Code](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/journal-resources/policy-on-publishing-code.html)) - (*Scope: Do consider MEE's [Aims and Scope](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/aims-and-scope/read-full-aims-and-scope.html) for your manuscript. We make no guarantee that your manuscript will be within MEE scope.*) - (*Although not required, we strongly recommend having a full manuscript prepared when you submit here.*) - (*Please do not submit your package separately to Methods in Ecology and Evolution*)

Code of conduct

maelle commented 3 years ago

Where is that URL hard-coded? Could you use an internal function to define the API URL instead as in https://books.ropensci.org/http-testing/webfakes.html#actual-testing-2?

Reg real requests we've added this chapter to the book https://books.ropensci.org/http-testing/real-requests-chapter.html

RaoulWolf commented 3 years ago

The respective URLs are hard-coded in the respective functions, for example here:

url <- "https://api.rsc.org/compounds/v1/filter/inchikey" 

This URL (and all others) are then used in curl::curl_fetch_memory(). Most of the API functionalities come with dedicated URLs, so this way of hard-coding allows to easier keep track of the URLs, in case there are any future changes.

Additionally, it is sometimes necessary to modify the hard-coded URL depending on user-defined input, such as here, where the minimum requirement is to provide a queryId, and start and count are optional:

if (is.null(start) && is.null(count)) {
    url <- paste0("https://api.rsc.org/compounds/v1/filter/", queryId, "/results")
  }

  if (!is.null(start) && is.null(count)) {
    url <- paste0("https://api.rsc.org/compounds/v1/filter/", queryId, "/results?start=", start)
  }

  if (is.null(start) && !is.null(count)) {
    url <- paste0("https://api.rsc.org/compounds/v1/filter/", queryId, "/results?count=", count)
  }

  if (!is.null(start) && !is.null(count)) {
    url <- paste0("https://api.rsc.org/compounds/v1/filter/", queryId, "/results?start=", start, "&count=", count)
  }

I'll take a look into the "Making real requests" chapter of the book asap! This makes me wonder if there's a sensible way to run "real" API tests without exceeding rate-limitations; possibly a brute-force use of Sys.sleep() could help? Then again, if the tests are excluded from running on CI systems (skip_on_ci()), the testing coverage will not increase either...

maelle commented 3 years ago

Instead of having

url <- "https://api.rsc.org/compounds/v1/filter/inchikey"

you could have

url <- get_compounds_inchikey_url()

where get_compounds_inchikey_url() is

get_compounds_inchikey_url <- function() {
  api_url <- Sys.getenv("COMPOUNDS_INCHIKEY_URL")

  if (nzchar(api_url)) {
    return(api_url)
  }

  "https://api.rsc.org/compounds/v1/filter/inchikey"
}

This way you can override the URL by setting the environment variable to the URL of a webfakes app process.

maelle commented 3 years ago

Same for your other snippet, you'd replace "https://api.rsc.org/compounds/v1/filter/" with get_compounds_filter_url(). Does it make sense?

maelle commented 3 years ago

This makes me wonder if there's a sensible way to run "real" API tests without exceeding rate-limitations; possibly a brute-force use of Sys.sleep() could help? Then again, if the tests are excluded from running on CI systems (skip_on_ci()), the testing coverage will not increase either...

You'd make sure not to run them too often. Another thing that might help, or not, is asking the web API providers if you can get some sort of special developer account (I mention this in the book, for some APIs it can help as package maintainers are helping spread their product/API, after all).

True about test coverage, hopefully by making it possible to override the API URLs you can use more webfakes apps.

maelle commented 3 years ago

Also, if you don't mind my asking: why do you use curl by the way? (rather than httr or crul)

RaoulWolf commented 3 years ago

Lot's to unpack here, thank you for the detailed recommendations @maelle! Let's start from the end:

Also, if you don't mind my asking: why do you use curl by the way? (rather than httr or crul)

when I first started working on the package, {curl} was the only package that would allow to run all different flavors of API queries that were necessary. For the record, in the beginning I tried implementing the functionalities with {httr} and {crul}, but not all API queries were possible. Apart from that {curl} comes with no dependencies, which I personally prefer (that's also the reason why I use {jsonlite} in the package).

Another thing that might help, or not, is asking the web API providers if you can get some sort of special developer account

...let's just say we should explore other options first ;)

This way you can override the URL by setting the environment variable to the URL of a webfakes app process.

I like the idea of using environmental variables, {webchem} uses it as well if I'm not mistaken. My worry is that the readability (and, after all, maintenance) of the code becomes more difficult with these constructs. To me this seems like a trade-off situation between code simpleness/tidiness on one hand and increasing test coverage on the other hand. But I think these thoughts are a bit bigger than my humble package :) I'll try to see if this approach works!

maelle commented 3 years ago

A more compact version would be

baseurl <- function() {
  Sys.getenv("COMPOUNDS_INCHIKEY_URL", "https://api.rsc.org/compounds/v1/filter/inchikey")
}

Reg https://github.com/NIVANorge/chemspiderapi/blob/master/R/FILTERING-get_queryId_results.R?rgh-link-date=2020-12-08T14%3A32%3A34Z you might want to create a helper function that returns the URL depending on the variables (to have less if/else)?

RaoulWolf commented 3 years ago

I now implemented {webfakes} for most API calls and the test coverage is around 97%. CI is now fully switched to Github Actions. Everything is live at the repository: https://github.com/NIVANorge/chemspiderapi

Anything else needed?

jooolia commented 3 years ago

Great @RaoulWolf! I will have a closer look tomorrow evening. Thanks for the perseverance.

jooolia commented 3 years ago

Thanks for all the additions and updates @RaoulWolf. Indeed the test coverage now looks good. I will move forward with looking for reviewers. Thanks!

jooolia commented 3 years ago

Reviewer: @rajarshi Reviewer: @yufree Reviewer: @data-datum Due date: 2021-03-15

Thanks for agreeing to review. The due date for your reviews is March 15th. Thanks! Julia

yufree commented 3 years ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

For packages co-submitting to JOSS

The package contains a paper.md matching JOSS's requirements with:

  • [ ] A short summary describing the high-level functionality of the software
  • [ ] Authors: A list of authors with their affiliations
  • [ ] A statement of need clearly stating problems the software is designed to solve and its target audience.
  • [ ] References: with DOIs for all those that have one (e.g. papers, datasets, software).

Functionality

Estimated hours spent reviewing: 7 hours


Review Comments

Overall it's a useful package for researchers to retrieve information from ChemSpider API. I tested this package as a chemical researcher and software developer. As a researcher, I might need the example of every function could return results without errors. As a software developer, I would check if the package have the proper style and readable code for other R users. At current stage, this package still need some improvements.

Information about Chemspider API

It would be helpful to mention that free account will have up to 1000 API requests per month in the readme.

goodpractice and spelling check

For goodpractice check, lots of code lines were longer than 80 characters. I checked those lines and found some long code lines might need to format in a shorter way for the reader. Developer can consider Reformat Code function in RStudio to improve the codes. Another issues were about the test coverage, which looks fine for me.

For spelling check, most of the results were terms in Cheminformatics, which should be fine. However, in Rate-Limiting_API_Queries.Rmd:21, 'recomendations' should be 'recommendations'.

inst folder

I found the files in the inst folder and I wonder if the developer could mention this file somewhere for the user as a short introduction of this package. Or the developer could consider to move the contents to <package-name>.R and use roxygen2 package to generate the Rd file for the whole package.

Function example

I think something might changed during the past year and some examples of function will not work. Because most of examples were protected by do-not-run with API token, regular test will not find those issues. I hope the developer could test those examples in the revision.

vignette: Rate-Limiting API Queries

Line 91. "All queries need to report the status Complete before proceeding." and unlist(y)[1] only return one element as 1. The third element is the 'Complete' and that will give the chemspider ID. I wonder if this issue is from the change of API? Meanwhile, the returned object should be a vector of chemspider ID. This line could be changed to

demo_chemicals$id = mapply(FUN = function(x, y) sleepy_get_queryId_results(queryId = x, status = unlist(y)[3], apikey = apikey, coerce = T)$results, x = demo_chemicals$queryId, y = demo_chemicals$status)

Line 151. Similar issue as line 91

Line 189. Here the code should be

mutate(queryId = map_chr(InChIKey, ~ slowly_post_inchikey(inchikey = .x, apikey = apikey)$queryId))

Otherwise user will get the error "Error: Problem with mutate() input queryId. x Can't coerce element 1 from a list to a character ℹ Input queryId is map_chr(InChIKey, ~slowly_post_inchikey(inchikey = .x, apikey = apikey))."

Line 200. The code should be

mutate(status = map(queryId, ~ unlist(slowly_get_queryId_status(queryId = .x, apikey = apikey)[3]))) %>% 

Line 212. The code should be

mutate(id = map2_int(queryId, status, ~ slowly_get_queryId_results(queryId = .x, status = .y, apikey = apikey)$results))

vignette: Remembering API Queries

This is an empty vignette. Maybe remove this till it's finished.

vignette: Saving MOL and SDF Files

Line 32-34. Maybe remove this chunk.

Line 44. Should be writeLines(caffeine_mol$sdf, con = con)

Line 54. Should be queryId <- post_mass(mass = 194, range = 0.002, apikey = apikey)$queryId

Line 66. Should be caffeine_sdf <- get_queryId_results_sdf(queryId = queryId, status = status$status, apikey = apikey)

vignette: Saving PNG Images

Actually png package could do this work. Maybe we don't need magick package unless we need extra features.

caffeine_image_raw <- get_recordId_image(recordId = 2424L, apikey = apikey,decode = T)
png::writePNG(png::readPNG(caffeine_image_raw$image), "mypng.png")

In general, it's nice to see a package for ChemSpider API. As discussed above, webchem might have overlap functions while chemspiderapi give access to all of the APIs provided by ChemSpider. I hope those mentioned issues could be fixed.

Hope you will find the comments useful.

Thanks for the review invitation.

Miao

RaoulWolf commented 3 years ago

Thanks for the detailed review and comments @yufree!

I'll work through your comments and critiques during the next two weeks (busy times...) and will post the responses here. Once again thank you for the review!

jooolia commented 3 years ago

Thanks for the review @yufree. Just a friendly reminder @rajarshi and @data-datum that the review due date is today. Thanks! Julia

rajarshi commented 3 years ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

For packages co-submitting to JOSS

The package contains a paper.md matching JOSS's requirements with:

  • [ ] A short summary describing the high-level functionality of the software
  • [ ] Authors: A list of authors with their affiliations
  • [ ] A statement of need clearly stating problems the software is designed to solve and its target audience.
  • [ ] References: with DOIs for all those that have one (e.g. papers, datasets, software).

Functionality

Estimated hours spent reviewing: 2


Review Comments

The package provide a consistent way to access Chemspider API endpoints. As already noted, there is overlap with webchem, and given the more expansive nature of this package, it may be worth considering merging this into webchem so that users can go to one place to access ChemSpider from R

It would be useful to note in the README a link to the RSC dev pages where a user can get an API key

I'm not a fan of the combination of camelCase and underscore's in function names. For example get_recordId_mol could be get_record_id_mol or getRecordIdMol (with the former being preferred).

The Remembering API Queries vignette seems to be empty

The rate limiting vignette is a great example of responsibly interacting with web APIs

Having to specify the api key with every call seems inefficient. One way to avoid this would be to have an env that is not exported, so that a user could set the key at the beginning of a session, which would add the key to the env and then functions would internally get the value of the key from the env. It would be good practice to have the apikey argument still present but with a default of NULL, so that if the user wanted to temporarily use a different key, they could.

Overall a well written package, that provides a comprehensive interface to the ChemSpider API

RaoulWolf commented 3 years ago

just a heads-up that the revisions are a bit delayed until after Easter

thank you again @yufree and @rajarshi for the reviews, greatly appreciated!

data-datum commented 3 years ago

Package review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

For packages co-submitting to JOSS

The package has an obvious research application according to JOSS's definition

The package contains a paper.md matching JOSS's requirements with:

A short summary describing the high-level functionality of the software Authors: A list of authors with their affiliations A statement of need clearly stating problems the software is designed to solve and its target audience. References: with DOIs for all those that have one (e.g. papers, datasets, software).

Functionality

Estimated hours spent reviewing: 4

Review Comments

jooolia commented 3 years ago

Thanks @yufree, @rajarshi and @data-datum for the reviews and thank @RaoulWolf for the update on when we will expect revisions.

jooolia commented 3 years ago

Hi @RaoulWolf, Have you had time to work on any of the comments from the reviewers? Thanks, Julia

RaoulWolf commented 3 years ago

Hi @jooolia, still working on it! I had to postpone the revision process quite dramatically but hopefully by the end of June everything should be in order 👍

RaoulWolf commented 3 years ago

Hi again @jooolia , and sorry for the delay. I've started working through the comments of the reviewers and have addressed most of the issues already - I'll update this thread once the revision is complete.

Meanwhile, the main repository for the package will be RaoulWolf/chemspiderapi after all (and not the previous repository); from August on I will not have access to the previous repository anymore and it will likely become private. Should the "new" repository (RaoulWolf/chemspiderapi) be a fork of the old one (like it is now) or rather a fresh repository?

jooolia commented 3 years ago

Hi @RaoulWolf , Thanks for keeping us updated on your progress.

I do not see any reason that the fork repository would be problematic. I will update the information at the top of this issue with this new repo. After the review process is completed we will transfer it to the rOpenSci organization. @maelle , do you think there would be any problems if we need to transfer a repository that is a fork (instead of the original)?

jooolia commented 3 years ago

Hi @RaoulWolf, Do you have any updates on your revisions? If you are not able to work on it right now we could put the "holding" label here. https://devguide.ropensci.org/policies.html?q=hold#policiesreviewprocess

Thanks! Julia

jooolia commented 2 years ago

Hi @RaoulWolf, do you have any updates on your progress? You can request to add a "holding" label to the issue or if I do not get a reply by November 5th 2021 I will close this issue. Thanks, Julia

jooolia commented 2 years ago

Email sent to raoul.wolf@hotmail.de and raoul.wolf@niva.no asking for an update. Otherwise I will close this issue if we do not receive a response in a week( by 8 November 2021).

jooolia commented 2 years ago

Response to email received today:

Hi Julia,

thanks for reaching out! I've been quite busy the past few months (new job...) but finished most of the requested changes already, except a revised vignette. I hope to be able to dedicate some more time to it before Christmas, so a "holding" label would be fantastic.

Once again, thank you for reaching out and have a great day!

Thus I will add the holding label until 2022 and then we can revisit.

jooolia commented 2 years ago

Hi @RaoulWolf, Any updates here? Thanks, Julia

jooolia commented 2 years ago

Email sent to raoul.wolf@hotmail.de

Hi Raoul, I am reaching out again regarding chemspiderapi. Do you have any updates? I think I will close the issue and if you have time to complete it you could reopen the issue and ping me on it.

Thanks, Julia

With that I will close this issue. Cheers, Julia