Hello @RaoulWolf, just in the process of assigning an editor. In the meantime, just checking, did you mean to submit your fork of the repository?
Hi @annakrystalli, yes I submitted my fork on purpose. I couldn't get an OK to incorporate AppVeyor, CodeCov and Travis CI on the other repository, so I forked it into my own account and added the functionality.
OK, that's fine. I was just wondering whether further development would be done in the upstream fork or yours?
Also, we require test coverage of at least 75% before review. Your codecov badge indicates only 10% test coverage. Is there a reason for that?
Further development would be at the upstream fork, yes. It looks like I might be able to get Circle CI and CodeCov up and running on the upstream fork. Would that be preferable then?
Coverage is an issue indeed, but comes with the nature of the package. I've increased the coverage considerably (>30%), but I'm not sure how to increase it further. The main issue is that I do not want the tests to run the actual API queries because there's a rate limit on the queries. Any recommendations are more than welcome :)
EDIT: the upstream fork (https://github.com/NIVANorge/chemspiderapi) is now updated with Circle CI and CodeCov. Just let me know if you want me to change the details in the OP.
Hey @RaoulWolf.
Thanks for the updates and your efforts with improving test coverage!
Regarding testing the API, have you had a look at package httptest? This should help with what you're trying to achieve.
Hey @annakrystalli,
thanks for the heads up! I'll give `httptest` a try asap :)
@RaoulWolf Another option for caching tests (not doing real requests) is the vcr package - but it will only work if you use crul or httr instead of curl
Thanks @sckott for the heads up on `vcr`.
As you indicated, `vcr` (currently?) only works for `crul` or `httr`; also `httptest` does not work for `curl`. The reason I chose `curl` for the package was the relative flexibility over `httr` when assembling headers and data fields. In some instances I was not able to set up working API calls using `httr`.
I will take a further look into `crul` - of course I would prefer a solution where `httptest` or `vcr` would offer support for `curl` 😃
In the meantime, the upstream fork at NIVANorge (https://github.com/NIVANorge/chemspiderapi) has support for AppVeyor, CircleCi, Travis CI, and CodeCov. I assume it's fair game to change the repo address in the OP?
yeah `curl` is a great choice. jeroen is considering integration for webmockr (and therefore vcr) https://github.com/jeroen/curl/pull/174 but not sure if it will happen
Unfortunately I did not get `crul` to work for all possible functionalities; I have thus decided to stick with `curl`.
Otherwise I was able to bump up the test coverage to over 40% without using the actual API query functionalities 👍 I'm still eager to improve coverage, but I'm not sure how to at this moment...? Any help/recommendation is highly appreciated!
I will also update the OP to link to the upstream repository (https://github.com/NIVANorge/chemspiderapi).
There's also https://github.com/nealrichardson/httptest
:wave: @RaoulWolf! any update?
@RaoulWolf, you can ask any question reg testing on https://discuss.ropensci.org :-)
⚠️⚠️⚠️⚠️⚠️
In the interest of reducing load on reviewers and editors as we manage the COVID-19 crisis, rOpenSci is temporarily pausing new submissions for software peer review for 30 days (and possibly longer). Please check back here again after 17 April for updates.
In this period new submissions will not be handled, nor new reviewers assigned. Reviews and responses to reviews will be handled on a 'best effort' basis, but no follow-up reminders will be sent.
Other rOpenSci community activities continue. We express our continued great appreciation for the work of our authors and reviewers. Stay healthy and take care of one another.
The rOpenSci Editorial Board
⚠️⚠️⚠️⚠️⚠️
Thank you @RaoulWolf for your work on chemspiderapi! I am one of the maintainers of webchem. I would like to note that since September 2019 webchem can also access the new ChemSpider web service (https://github.com/ropensci/webchem/issues/149), so I think there might be a package overlap issue (https://devguide.ropensci.org/policies.html#overlap). We would be very happy if this could be resolved; please contact us if you wish to discuss it in more detail.
⚠️⚠️⚠️⚠️⚠️ In the interest of reducing load on reviewers and editors as we manage the COVID-19 crisis, rOpenSci new submissions for software peer review are paused.
In this period new submissions will not be handled, nor new reviewers assigned. Reviews and responses to reviews will be handled on a 'best effort' basis, but no follow-up reminders will be sent. Other rOpenSci community activities continue.
Please check back here again after 25 May when we will be announcing plans to slowly start back up.
We express our continued great appreciation for the work of our authors and reviewers. Stay healthy and take care of one another.
The rOpenSci Editorial Board ⚠️⚠️⚠️⚠️⚠️
@RaoulWolf: I'm taking a look through to catch up on any package reviews that have stalled, and I see that we are waiting for an update from you regarding increased code coverage for tests in this package before we proceed. Our policy is to close package review issues one year after the author's last answer, so I wanted to check whether you had any updates.
It's been more than one year after the author's last answer, so closing.
@maelle Sincerest apologies for the late reply - I haven't received any notifications regarding this submission until now! All answers since November 2019 are new to me... Oh my!
I am currently on vacation but I'm still very keen on pursuing `chemspiderapi` further. We have been using it routinely in our work, and last time I checked it provided more functionality than what was (is?) available over at `webchem`.
I haven't checked the progress with regard to possible API testing (as I haven't received notifications... again apologies!), but if you'd be willing to re-open this submission I'd be happy to do so.
Oh, in this case yep I'll re-open the issue!
Hi @RaoulWolf, I am a maintainer of `webchem`. Please note `webchem` is in heavy development, we have had three releases in the past 12 months including v1.0.0. The ChemSpider API has been fixed for about a year now and I think your package has significant overlap with `webchem`, only `webchem` provides access to other webservices as well. A paper using the fixed ChemSpider API in `webchem` has recently been published (https://www.jstatsoft.org/article/view/v093i13). Happy to discuss.
:wave: @RaoulWolf! Seeing that webchem wraps the ChemSpider API again I am a bit wary of overlap. Is there anything webchem does not support?
@maelle thank you for re-opening the issue! Appreciated
@stitam very valid point. I'll get back to this in about a week from now. Thanks for the heads-up! (and for the record, I'm a `webchem` fan 😉)
:wave: @RaoulWolf
Lengthy post ahead, so watch out!
I took some time to compare the solutions of `webchem` with what is offered in `chemspiderapi`. The first step to address the potentially big overlap is to compare the available functionality, starting with everything ChemSpider offers (https://developer.rsc.org/docs/compounds-v1-trial/1/overview), and comparing the availability between `chemspiderapi` and `webchem` (I tried to be as thorough as possible, but please forgive me if I overlooked functionalities within `webchem`!):
FILTERING
ChemSpider compound API | chemspiderapi wrapper | webchem wrapper | Description |
---|---|---|---|
filter-element-post | `post_element()` | ? | Search based on an element |
filter-formula-batch-post | `post_formula_batch()` | ? | Batch search based on formulas |
filter-formula-batch-queryId-results-get | `get_formula_batch_queryId_results()` | ? | Results for formula batch search |
filter-formula-batch-queryId-status-get | `get_formula_batch_queryId_status()` | ? | Status for formula batch search |
filter-formula-post | `post_formula()` | `cs_formula_csid()` | Search based on formula |
filter-inchi-post | `post_inchi()` | `cs_inchi_csid()` | Search based on InChI string |
filter-inchikey-post | `post_inchikey()` | `cs_inchikey_csid()` | Search based on InChIKey |
filter-intrinsicproperty-post | `post_intrinsicproperty()` | ? | Search based on intrinsic property |
filter-mass-batch-post | `post_mass_batch()` | ? | Batch search based on masses |
filter-mass-batch-queryId-results-get | `get_mass_batch_queryId_results()` | ? | Results for mass batch search |
filter-mass-batch-queryId-status-get | `get_mass_batch_queryId_status()` | ? | Status for mass batch search |
filter-mass-post | `post_mass()` | ? | Search based on mass |
filter-name-post | `post_name()` | `cs_name_csid()` | Search based on name |
filter-queryId-results-get | `get_queryId_results()` | `cs_query_csid()` | Results for standard search |
filter-queryId-results-sdf-get | `get_queryId_results_sdf()` | ? | SDF results for standard search |
filter-queryId-status-get | `get_queryId_status()` | ? | Status for standard search |
filter-smiles-post | `post_smiles()` | `cs_smiles_csid()` | Search based on SMILES string |
LOOKUPS
ChemSpider compound API | chemspiderapi wrapper | webchem wrapper | Description |
---|---|---|---|
lookups-datasources-get | `get_datasources()` | `cs_datasources()` | A list of all available data sources |
RECORDS
ChemSpider compound API | chemspiderapi wrapper | webchem wrapper | Description |
---|---|---|---|
records-batch-post | `post_batch()` | ? | Data for multiple ChemSpider IDs |
records-recordId-details-get | `get_recordId_details()` | `cs_compinfo()` / `cs_extcompinfo()` | Data for a ChemSpider ID |
records-recordId-externalreferences-get | `get_recordId_externalreferences()` | ? | External references for a ChemSpider ID |
records-recordId-image-get | `get_recordId_image()` | `cs_img()` | PNG image for a ChemSpider ID |
records-recordId-mol-get | `get_recordId_mol()` | ? | MOL file for a ChemSpider ID |
TOOLS
ChemSpider compound API | chemspiderapi wrapper | webchem wrapper | Description |
---|---|---|---|
tools-convert-post | `post_convert()` | `cs_convert()` / `cs_convert_multiple()` | Conversion between chemical annotations |
tools-validate-inchikey-post | `post_validate_inchikey()` | ? | Validation of an InChIKey |
While there (quite unsurprisingly) is overlap, `chemspiderapi` offers direct access to all API functionalities of ChemSpider.
Another major difference between the solutions offered by `chemspiderapi` and `webchem` is the workflow. It's really two different philosophies in my view, so there's no right or wrong. `chemspiderapi` is very explicit (and maybe lengthy) in every step, but "forces" users to familiarize themselves with the desired API use workflow of ChemSpider; additionally, `chemspiderapi` offers several vignettes to guide users on how to store their API token(s), rate-limit queries, memoise queries, and save .mol, .sdf, or .png files. `webchem` has an elegant "under the hood" approach to the ChemSpider API functionalities it offers.
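As a rough illustration of the rate-limiting and memoising ideas mentioned above, here is a minimal Python sketch (Python is used only because the idea is language-agnostic; the vignettes themselves are written for R). Every name in it (`RateLimiter`, `fake_api_lookup`, `lookup`) and the 0.05 second interval are assumptions made for the example, not part of `chemspiderapi` or of the ChemSpider API.

```python
import functools
import time


class RateLimiter:
    """Enforce a minimum interval between consecutive calls."""

    def __init__(self, min_interval_s):
        self.min_interval_s = min_interval_s
        self._last_call = None

    def wait(self):
        # Sleep just long enough to respect the minimum interval.
        now = time.monotonic()
        if self._last_call is not None:
            remaining = self.min_interval_s - (now - self._last_call)
            if remaining > 0:
                time.sleep(remaining)
        self._last_call = time.monotonic()


limiter = RateLimiter(min_interval_s=0.05)  # hypothetical limit, not ChemSpider's
api_calls = []  # records every simulated request, for demonstration only


def fake_api_lookup(key):
    """Hypothetical stand-in for a real API request."""
    api_calls.append(key)
    return {"key": key, "record_id": len(api_calls)}


@functools.lru_cache(maxsize=None)
def lookup(key):
    """Rate-limited, memoised lookup: a repeated key is served from the cache."""
    limiter.wait()
    return fake_api_lookup(key)
```

Calling `lookup()` twice with the same key triggers only one simulated request, so repeated inputs do not count against a quota, while distinct keys are spaced out by the limiter.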
Maybe less interesting, but also worth noting, is the unsurprising difference in dependencies. `chemspiderapi` has two dependencies, and `webchem` has 13 dependencies. `chemspiderapi` was designed to have as few dependencies as possible, and both dependencies (`curl` and `jsonlite`) are well-established and introduce no other dependencies.
While for most users the functionalities within `webchem` are likely enough, I still think `chemspiderapi` provides a more "total" ChemSpider API experience, for anyone wishing to go down this rabbit hole.
Let's discuss!
EDIT
I forgot to mention that `chemspiderapi` also has 13 different checking functions to validate inputs (and in some cases outputs), to avoid unnecessary queries against the API. I personally found this extremely useful, e.g., to avoid wasting my quota on 5'000 queries because I accidentally used the wrong column of a data.frame as input.
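To make the "validate before you query" idea concrete, here is a minimal Python sketch of a local format check for InChIKeys. It only assumes the general InChIKey layout (14 uppercase letters, a hyphen, 10 uppercase letters, a hyphen, one uppercase letter); the function names `validate_inchikey_format()` and `validate_batch()` are hypothetical and do not mirror the package's own checking functions.

```python
import re

# General InChIKey layout: 14 letters, hyphen, 10 letters, hyphen, 1 letter.
INCHIKEY_PATTERN = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def validate_inchikey_format(value):
    """Return the value unchanged if it looks like an InChIKey, else raise."""
    if not isinstance(value, str):
        raise TypeError(f"expected a string, got {type(value).__name__}")
    if INCHIKEY_PATTERN.fullmatch(value) is None:
        raise ValueError(f"not a valid InChIKey format: {value!r}")
    return value


def validate_batch(values):
    """Check every input locally before a single request is sent."""
    return [validate_inchikey_format(v) for v in values]
```

With a check like this, passing the wrong column (say, CAS numbers such as `"64-17-5"`) fails immediately on the first element instead of spending one API request per row.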
EDIT 2 Added additional columns with a very brief description of each functionality.
Thanks a lot for your detailed answer! Could you please add a column or sentences explaining to humans what e.g. records-batch-post does i.e. the type of functionalities your package adds? I have the impression your package allows for posting information, what are the use cases for that? And what about the functionalities it adds that have nothing to do with posting?
Thanks again!
Right, apologies for not providing enough context!
The different `post_*()` and `get_*()` functionalities are named so from the HTTP methods (POST and GET, respectively).
In simple terms, all `post_*()` functions "upload" information into ChemSpider's API services (and usually get a query ID in return), and all `get_*()` functionalities "download" information from ChemSpider's API services.
In the example case you mentioned (records-batch-post), a list of up to 100 ChemSpider record IDs is POST-ed to ChemSpider's API services, and a query ID is returned.
I hope this answers your question!
Thanks, and in what kinds of workflows do you need to post information?
@stitam could you please confirm the question marks in the comparison table, i.e. that webchem does not provide that functionality?
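The principal workflows for the filtering functionalities all follow the same three-step pattern:
1. A `post_*()` function sends the query and returns a query ID.
2. The matching `get_*_status()` function checks whether the query has finished.
3. The matching `get_*_results()` function retrieves the results for that query ID.
This is also mentioned in the README of `chemspiderapi`. The lookups, records and tools functionalities work directly, i.e., without the (manual) three-step process.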
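The sequence discussed in this thread (a POST that returns a query ID, a status check, then a results request) can be sketched roughly as follows. This is a Python illustration against an in-memory stand-in, not the package's R code: `FakeFilterService`, `run_filter_query()`, the status strings, and the returned record IDs are all placeholders invented for the example.

```python
import itertools


class FakeFilterService:
    """In-memory stand-in for a query service (hypothetical, no network)."""

    def __init__(self, polls_until_complete=2):
        self._ids = itertools.count(1)
        self._queries = {}
        self._polls_until_complete = polls_until_complete

    def post_query(self, payload):
        # Step 1: submit the query and hand back a query ID.
        query_id = f"query-{next(self._ids)}"
        self._queries[query_id] = {"payload": payload, "polls": 0}
        return query_id

    def get_status(self, query_id):
        # Step 2: report whether the query has finished.
        query = self._queries[query_id]
        query["polls"] += 1
        done = query["polls"] >= self._polls_until_complete
        return "Complete" if done else "Processing"

    def get_results(self, query_id):
        # Step 3: return results, but only for a finished query.
        query = self._queries[query_id]
        if query["polls"] < self._polls_until_complete:
            raise RuntimeError("results requested before the query completed")
        return [1, 2, 3]  # made-up record IDs


def run_filter_query(service, payload, max_polls=10):
    """Chain the three steps: post, poll the status, then fetch results."""
    query_id = service.post_query(payload)
    for _ in range(max_polls):
        if service.get_status(query_id) == "Complete":
            return service.get_results(query_id)
    raise TimeoutError(f"{query_id} did not complete after {max_polls} polls")
```

Keeping the three calls separate, as `chemspiderapi` does, exposes each step to the user; wrapping them in one helper like `run_filter_query()` is closer to the "under the hood" style described for `webchem`.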
Hi All! @RaoulWolf thanks for your detailed answer to @maelle's questions.
@RaoulWolf is right in saying that the two packages follow different philosophies and support different types of workflows. My understanding is that `chemspiderapi` aims to develop an R function for each ChemSpider API. On the other end, `webchem` focuses on the user experience, and aims to distance the user from the API, so instead of being "process" focused it is more "outcome" focused. I agree with @RaoulWolf in that there is no good or bad, these are two different approaches.
It follows from the difference in philosophies that `webchem` doesn't implement all APIs. The beginning of a `webchem` workflow is that a user is looking for a "specific" compound or set of compounds. Therefore we skipped a few APIs like `filter-mass-post` where the user would specify a molecular weight range and the API (actually 2-3 APIs called after each other) would return a list of CSIDs for compounds with molecular weights falling in that range. At this point we didn't find a good enough use case for that functionality and our users haven't asked for it either.
We also didn't implement functions that process batch requests. We do see the benefit of batch requests, but not all requests have a batch alternative at the moment, so it was difficult to bind them all into a simple and easy to use user facing function. We did implement the non-batch alternatives for these queries however. ChemSpider APIs seem to be under development so once batch is available for all the queries we need (it's actually `filter-name-batch-post` we are missing) then we'll add the batch options as well to make our queries more efficient.
Finally a few notes on the table above, in terms of user experience, in `webchem` all ChemSpider APIs that ultimately return CSIDs are actually pooled into `get_csid()`, that is the exported function. Also `get_queryId_status` is called within the functions to query the status before requesting the response itself. Other than these, yes @maelle, I can confirm the question marks, those APIs are currently not implemented in `webchem` so there is no overlap there at the moment.
Editor comments:
Hi @RaoulWolf
Thank you for all of the comments and answers to our questions that you have provided. We will move ahead with the onboarding process for this package. I will be the editor handling the process going forward.
We have been happy to see the discussion with the maintainers of {webchem} and think that it would be useful to mention webchem in your documentation. It is also a hope that these packages could work well together in the future (i.e. perhaps chemspiderapi could become a dependency for webchem or some other scenario).
Here are the results of some of the issues flagged with some of our initial editor checks:
── GP chemspiderapi
It is good practice to
✖ write unit tests for all functions, and all package code in general. 41% of code lines are covered by test cases.
R/FILTERING-get_formula_batch_queryId_results.R:29:NA
R/FILTERING-get_formula_batch_queryId_results.R:31:NA
R/FILTERING-get_formula_batch_queryId_results.R:33:NA
R/FILTERING-get_formula_batch_queryId_results.R:35:NA
R/FILTERING-get_formula_batch_queryId_results.R:37:NA
... and 414 more lines
- [x] increase code coverage of unit tests
✖ add a "BugReports" field to DESCRIPTION, and point it to a bug tracker. Many online code hosting services provide
bug trackers for free, https://github.com, https://gitlab.com, etc.
- [x] add a "BugReports" field to description
✖ avoid long code lines, it is bad for readability. Also, many people prefer editor windows that are about 80 characters wide. Try make your lines shorter than 80 characters
R/CHECKING-check_apikey.R:8:1
R/CHECKING-check_apikey.R:12:1
R/CHECKING-check_apikey.R:16:1
R/CHECKING-check_complexity.R:8:1
R/CHECKING-check_elements.R:12:1
... and 683 more lines
- [ ] check code line length (where possible)
I will look for package reviewers once these issues have been addressed. Please let me know if you have any questions or if I can clarify anything.
Thanks! Julia
Hi @jooolia, thanks for the update! I'll try to fix the easier issues (bug reports and code line length) this week, and tackle code coverage next week. I haven't checked `presser` yet, thanks for the heads up! I'll let you know once the issues are addressed.
Thanks for the effort! Raoul
The `BugReports` field was added to the description, the `README` now mentions `{webchem}`, and I tried to minimize the line lengths as much as possible.
All changes are live at https://github.com/NIVANorge/chemspiderapi
Code coverage hasn't been addressed yet.
Thanks for keeping us posted on your updates @RaoulWolf!
A small update regarding test coverage.
I tried implementing API tests with both {httptest} and {vcr}, but with no luck. {vcr} does not seem to support {curl} just yet (the introduction hints at future support), and {httptest} simply runs the actual queries without mocking.
I am now taking a look at {presser} and will let you know how far I get.
Thanks for the update @RaoulWolf. If you get really stuck let us know.
{presser} supports curl indeed and that's the only HTTP testing package that does. :-)
Small update: I finally managed to wrap my head around {presser} and the first tests are running (see here, line 77). I'll now try to extend the testing functionality to the other functions. Baby steps 😃
Meanwhile, I've also added GitHub Actions as CI (trying to substitute CircleCI and Travis CI). Unfortunately the {curl} requirement keeps breaking the installation on Linux machines. My initial approach was adding the following chunk to `R-CMD-check.yaml`:
    - name: Install libcurl
      if: runner.os == 'Linux'
      run: |
        sudo apt-get install -y libcurl4-openssl-dev
The test then fails when trying to install {chemspiderapi}:
Error in dyn.load(file, DLLpath = DLLpath, ...) :
unable to load shared object '/home/runner/work/_temp/Library/curl/libs/curl.so':
/usr/lib/x86_64-linux-gnu/libcurl.so.4: version `CURL_OPENSSL_3' not found (required by /home/runner/work/_temp/Library/curl/libs/curl.so)
I also tried with `libcurl4-gnutls-dev`, but no luck either. Any tips warmly welcome!
Untested tip, I'd look at what other curl dependencies have in their workflows e.g. https://github.com/r-lib/httr/blob/cb4e20c9e0b38c0c020a8756db8db7a882288eaf/.github/workflows/R-CMD-check.yaml#L60
Turns out the "trick" is to not use Ubuntu 20.xx at all, but good ol' 16.04. Now all tests pass 👍
Thanks for keeping us updated @RaoulWolf. Have you run into any other roadblocks or is it looking like {presser} will help you with the testing?
Hi @jooolia, at first it looked like {presser} would help, and I can definitely set up mock APIs with it (hurray!). But I still struggle to increase coverage. I suppose it's because the API request itself happens within {chemspiderapi} functions, but I'm not sure. An example can be seen here, line 77. The tests pass, but the coverage is not increased, which is slightly frustrating. On the positive side I now have over 500 tests for the package that pass 😅
For the CI I was wondering if GitHub actions would be enough? I guess there's no need for Travis, AppVeyor or CircleCI?
Commenting on this again, hope it's fine. The presser package now has a function with which you don't need to do setup/teardown
Hi @RaoulWolf, Yes the CI using GitHub actions should be sufficient as you have it implemented.
I am also trying to wrap my head around webfakes (previously presser, now at https://r-lib.github.io/webfakes/ ) so I cannot help a lot with that aspect (but I will try as I am curious about how to do this sort of testing). However, one thing that I see when I run `covr::report()` is that there are many arguments in the functions that are not exercised by the tests, so these lines are never run and thus not covered. The same is true of the API testing: since none of the functions from the package are called in the mock API, the coverage does not increase.
I will try to look a bit more at webfakes over the next few days and see if I can help.
@maelle your inputs and wisdom are always welcome. :)
(Just putting this here for reference, there is a nice book by @maelle about http testing: https://books.ropensci.org/http-testing/packages-for-http-testing.html)
Thanks @jooolia :blue_heart: Demo of webfakes at https://github.com/ropensci-books/http-testing/pull/47 with demos of other packages.
So it seems the low coverage is not a webfakes problem, correct? I'm happy to help if needed.
Wonderful @maelle! I think the demos will be very helpful! Thanks for pointing us to this material.
Hi @jooolia and @maelle, very cool to see {presser} become {webfakes}! I'll keep an eye out on the book 😃
I'll try a few more variants to increase coverage with {webfakes} over the weekend; maybe there's a way I haven't thought about...
I tried now several other ways to code tests so it can run with "mock" APIs using {webfakes}, but I simply cannot find a way to emulate a (hardcoded URL) API from outside the function which defines the API itself.
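One common way around a hardcoded URL is to let the function read its base URL from a setting that tests can override, so that the real package function runs (and counts towards coverage) while talking to a local mock server. The following Python sketch only illustrates that idea and is not how {chemspiderapi} or {webfakes} work: the environment variable `EXAMPLE_API_BASE_URL`, the placeholder default URL, the request path, and the `dataSources` response field are all assumptions made for the example.

```python
import json
import os
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

DEFAULT_BASE_URL = "https://api.example.org"  # placeholder, not a real endpoint
BASE_URL_ENV_VAR = "EXAMPLE_API_BASE_URL"  # hypothetical variable name


def base_url():
    """Use the override if set, otherwise the production default."""
    return os.environ.get(BASE_URL_ENV_VAR, DEFAULT_BASE_URL)


def get_datasources():
    """The 'package' function: it no longer hardcodes the host."""
    # Ignore proxy settings so the sketch behaves the same everywhere.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(base_url() + "/lookups/datasources") as response:
        return json.loads(response.read().decode("utf-8"))["dataSources"]


class MockHandler(BaseHTTPRequestHandler):
    """Answers every GET with a canned JSON body."""

    def do_GET(self):
        body = json.dumps({"dataSources": ["Source A", "Source B"]}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep test output quiet


def call_against_mock(func):
    """Run func() while the base URL points at a local mock server."""
    server = HTTPServer(("127.0.0.1", 0), MockHandler)  # port 0: pick a free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    previous = os.environ.get(BASE_URL_ENV_VAR)
    os.environ[BASE_URL_ENV_VAR] = f"http://127.0.0.1:{server.server_port}"
    try:
        return func()
    finally:
        # Restore the original setting and stop the server.
        if previous is None:
            os.environ.pop(BASE_URL_ENV_VAR, None)
        else:
            os.environ[BASE_URL_ENV_VAR] = previous
        server.shutdown()
        server.server_close()
```

The design point is that the test calls the unmodified `get_datasources()`; only the base URL changes, so no real queries are spent and the function body is still executed.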
All other tests work well, but this seems very complicated to put in place at this point. I looked into {webchem}'s approach to this, and it seems they - apart from the very reasonable `skip_on_cran()` and `skip_on_ci()` controls - actually run real queries.
Given my current subscription with ChemSpider I'd very much like to avoid running a lot of queries every time the package is updated/revised.
How do we proceed now? I'm very much open to give {webfakes} another try with some help, but otherwise I seem to be stuck in a dead-end when it comes to increasing test coverage.
Submitting Author Name: Raoul Wolf
Submitting Author Github Handle: @RaoulWolf
Repository: https://github.com/RaoulWolf/chemspiderapi
Version submitted: 0.0.2 2021-03-15
Editor: @jooolia
Reviewers: @rajarshi, @yufree, @data-datum
Due date for @rajarshi: 2021-03-15
Due date for @yufree: 2021-03-15
Due date for @data-datum: 2021-03-15
Archive: TBD
Version accepted: TBD
Scope
Please indicate which category or categories from our package fit policies this package falls under: (Please check an appropriate box below. If you are unsure, we suggest you make a pre-submission inquiry.):
Explain how and why the package falls under these categories (briefly, 1-2 sentences):
The `chemspiderapi` package is an easy-to-use R interface to all new ChemSpider API functionalities, as introduced in ChemSpider's complete redesign of its API structure in late 2018.
Researchers and citizen scientists who work on anything chemistry-related and need to routinely query against any of ChemSpider's API services.
The rOpenSci-maintained package webchem aims at offering (currently outdated / not working) functionality for accessing ChemSpider's API services. However, the redesign of ChemSpider's API structures in late 2018 broke all available functionalities.
Pre-submission enquiry: https://github.com/ropensci/software-review/issues/294 @melvidoni
Technical checks
Confirm each of the following by checking the box. This package:
Publication options
JOSS Options
- [x] The package has an **obvious research application** according to [JOSS's definition](https://joss.readthedocs.io/en/latest/submitting.html#submission-requirements).
- [x] The package contains a `paper.md` matching [JOSS's requirements](https://joss.readthedocs.io/en/latest/submitting.html#what-should-my-paper-contain) with a high-level description in the package root or in `inst/`.
- [ ] The package is deposited in a long-term repository with the DOI:
- (*Do not submit your package separately to JOSS*)
MEE Options
- [ ] The package is novel and will be of interest to the broad readership of the journal.
- [ ] The manuscript describing the package is no longer than 3000 words.
- [ ] You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see [MEE's Policy on Publishing Code](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/journal-resources/policy-on-publishing-code.html))
- (*Scope: Do consider MEE's [Aims and Scope](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/aims-and-scope/read-full-aims-and-scope.html) for your manuscript. We make no guarantee that your manuscript will be within MEE scope.*)
- (*Although not required, we strongly recommend having a full manuscript prepared when you submit here.*)
- (*Please do not submit your package separately to Methods in Ecology and Evolution*)
Code of conduct