ropensci / software-review

rOpenSci Software Peer Review.

submission: awardFindR #432

Closed — mccallc closed this issue 3 years ago

mccallc commented 3 years ago

Submitting Author: Michael McCall (@mccallc)
Other Authors: Sebastian Karcher (@adam3smith)
Repository: https://github.com/PESData/awardFindR
Version submitted: 0.1.0
Editor: @annakrystalli
Reviewers: @karawoo, @zambujo

Due date for @karawoo: 2021-06-01
Due date for @zambujo: 2021-06-01

Archive: TBD
Version accepted: TBD


Package: awardFindR
Type: Package
Title: QDR awardFindR
Version: 0.1.0
Authors@R: c(person("Michael", "McCall", email = "mimccall@syr.edu",
                  role = c("aut", "cre"), comment = c(ORCID="0000-0002-4668-4212")),
                  person("Sebastian", "Karcher", email = "karcher@u.northwestern.edu",
                  role = c("ctb"), comment = c(ORCID = "0000-0001-8249-7388")),
                  person("Qualitative Data", "Repository", email = "qdr@syr.edu",
                  role = c("cph")),
                  person("Sloan", "Foundation", email = "technology@sloan.org",
                  role = c("fnd")))
Author: Michael C. McCall
Maintainer: Michael McCall <mimccall@syr.edu>
Description: Queries a number of scientific awards databases.
    Collects relevant results based on keyword and date parameters,
    returns list of projects that fit those criteria as a data frame.
    Sources include: Arnold Ventures, Carnegie Corp, Federal RePORTER,
    Gates Foundation, MacArthur Foundation, Mellon Foundation, NEH, NIH,
    NSF, Open Philanthropy, Open Society Foundations, Rockefeller Foundation,
    Russell Sage Foundation, Robert Wood Johnson Foundation, 
    Sloan Foundation, Social Science Research Council, John Templeton Foundation,
    USASpending.gov
URL: https://github.com/PESData/awardFindR
BugReports: https://github.com/PESData/awardFindR/issues
License: MIT + file LICENSE
Encoding: UTF-8
LazyData: true
Imports:
    rvest,
    xml2,
    readr,
    httr
RoxygenNote: 7.1.1
Suggests: 
    knitr,
    rmarkdown,
    testthat (>= 3.0.0),
    vcr,
    covr
VignetteBuilder: knitr
Depends: 
    R (>= 2.10)
Config/testthat/edition: 3

Scope

Acts as an interface to web-based APIs and search engines.

Anyone interested in trends in academic funding, or researchers looking for a suitable funder based on previously funded research. The package can be used for scientometrics generally, or for narrower, field-specific lines of inquiry.

The target audience is primarily US-based researchers, and the sources of governmental research funding are all covered to reasonable expectations through the combination of NSF, NEH, NIH, Federal RePORTER and USAspending. Private funders are more difficult to evaluate, but the sources included are the most prominent in the social sciences.

Nothing that is functionally similar. Some source-specific packages exist, but none that aggregate across sources.

nihexporter provides search functionality for the NIH grant database, but is limited to an existing scrape of NIH data from 2016. awardsBot dynamically scrapes NSF awards, but lacks search functionality and is primarily designed for tracking and emailing PIs regarding their obligations. fedreporter is not actively maintained and only offers search functionality for an antiquated and limited version of the Federal RePORTER API that we implement here.

This package offers a unified, simple search functionality across these federal grant databases, and also covers many private funders not included in any existing package.

Yes

Technical checks

Confirm each of the following by checking the box.

This package:

Publication options

MEE Options

- [ ] The package is novel and will be of interest to the broad readership of the journal.
- [ ] The manuscript describing the package is no longer than 3000 words.
- [ ] You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see [MEE's Policy on Publishing Code](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/journal-resources/policy-on-publishing-code.html))
- (*Scope: Do consider MEE's [Aims and Scope](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/aims-and-scope/read-full-aims-and-scope.html) for your manuscript. We make no guarantee that your manuscript will be within MEE scope.*)
- (*Although not required, we strongly recommend having a full manuscript prepared when you submit here.*)
- (*Please do not submit your package separately to Methods in Ecology and Evolution*)

Code of conduct

melvidoni commented 3 years ago

Hello @mccallc, thanks for submitting. We were discussing this submission with the Associate Editors, but we need more information.

In particular, we want to know "how much this package solves the problem". Say, does it cover all possible sources? The top-10 sources? The most prestigious sources? Also, it would be nice to know not just that other packages cover some sources, but more precisely what proportion of the covered sources are already provided by other packages. If all sources are already considered, maybe the benefit of this package is that it might provide a consistent interface across sources?

Please, edit your answer on the submission post, and let me know once it is done.

mccallc commented 3 years ago

Hello @melvidoni, thank you so much for your consideration. I believe I have now addressed your questions in the original post.

melvidoni commented 3 years ago

Thanks @mccallc. @annakrystalli will be your Handling Editor. She will run the initial checks soon.

annakrystalli commented 3 years ago

@ropensci-review-bot assign @annakrystalli as editor

ropensci-review-bot commented 3 years ago

Assigned! @annakrystalli is now the editor

annakrystalli commented 3 years ago

Editor checks:


Editor comments

Dear @mccallc

Thanks for submitting this super interesting package! I've now completed initial editor checks and there are a few issues to sort out before we proceed.

goodpractice

There are a number of issues returned from running goodpractice::gp()

It is good practice to

Tests

A bit more detail on the single failing test when running devtools::test() (which also turns up in devtools::check()):

Error (test-full.R:2:3): Search that should return all empty
Error: 

================================================================================
An HTTP request has been made that vcr does not know how to handle:
GET https://api.nsf.gov/services/v1/awards.json?keyword="foobar"&cfdaNumber=47.041,47.050,47.049,47.078,47.083,47.079,47.070,47.074,47.076,47.075&printFields=id,date,startDate,expDate,title,awardeeName,piFirstName,piLastName,piEmail,cfdaNumber,fundsObligatedAmt,fundProgramName&dateStart=01/01/2019&dateEnd=03/22/2021&offset=1
vcr is currently using the following cassette:
  - ../fixtures/empty.yml
    - record_mode: once
    - match_requests_on: method, uri
Run `vcr::vcr_last_error()` for more verbose errors
If you're not sure what to do, open an issue https://github.com/ropensci/vcr/issues
& see https://books.ropensci.org/http-testing
================================================================================

Backtrace:
  1. base::suppressMessages(...) test-full.R:2:2
  5. awardFindR::awardFindR("foobar") test-full.R:3:4
  9. awardFindR::.nsf_standardize(keywords, from_date, to_date)
 10. base::lapply(keywords, nsf_get, from_date, to_date) /Users/Anna/Documents/workflows/rOpenSci/editorials/awardFindR/R/nsf.R:49:2
 11. awardFindR:::FUN(X[[i]], ...)
 12. awardFindR::request(paste0(query_url, "&offset=", offset), "get") /Users/Anna/Documents/workflows/rOpenSci/editorials/awardFindR/R/nsf.R:24:2
 13. httr::GET(url) /Users/Anna/Documents/workflows/rOpenSci/editorials/awardFindR/R/utils.R:30:4
 14. httr:::request_perform(req, hu$handle$handle)
 15. httr:::perform_callback("request", req = req)
 16. webmockr:::callback(...)
 17. webmockr::HttrAdapter$new()$handle_request(req)
 18. private$request_handler(req)$handle()
 19. eval(parse(text = req_type_fun))(self$request)
 20. err$run()
 21. self$construct_message()
──────────────────────────────────────────────────────────────────────────
✓ |   1       | http [0.3 s]                                              
✓ |   1       | neh [1.7 s]                                               
✓ |   1       | ophil [0.2 s]                                             
✓ |   1       | osociety [0.4 s]                                          
✓ |   2       | rockefeller [0.4 s]                                       
✓ |   1       | ssrc [0.9 s]                                              
✓ |   1       | templeton [3.7 s]                                         
✓ |   1       | usaspend [0.2 s]                                          

══ Results ═══════════════════════════════════════════════════════════════
Duration: 89.6 s

Test failure means I can't run covr::package_coverage() to get a breakdown of test coverage.

Spellcheck

Only a single typo was thrown up by devtools::spell_check()

Documentation question

In your docs, you only demonstrate the use of the awardFindR() function, yet the reference section lists many more exported functions, including functions like request() that seem to be back-end utilities. This is a good time to review your exported functions so that reviewers can focus on the functions relevant to the end user. Also, for the functions you do decide to export, while it isn't necessary to introduce every single one in the docs (that's what examples and the reference section are for, after all!), it would be good to mention in the docs that other exported functions exist, with an example of how a user might typically use one of them.
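One common way to handle this, assuming roxygen2 generates the NAMESPACE, is to drop the @export tag from back-end helpers and mark them @keywords internal. The sketch below uses request() from the package as the example, but the signature and parameter names shown here are illustrative, not taken from the package's source:

```r
# Sketch: keeping a back-end helper out of the user-facing API.
# Omitting @export means roxygen2 will not add the function to the
# NAMESPACE; @keywords internal hides it from the package index
# while keeping its documentation reachable via ?request.

#' Perform an HTTP request (internal helper)
#'
#' @param url Full query URL.
#' @param method Either "get" or "post" (illustrative signature).
#' @keywords internal
request <- function(url, method = "get") {
  # ... implementation unchanged ...
}
```

Internal callers can still reach the helper with `awardFindR:::request()` if needed, and tests can access it via `:::` as well, so nothing user-facing is lost by un-exporting it.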


@mccallc, could you also please add the rOpenSci under review badge to your README?

[![](https://badges.ropensci.org/432_status.svg)](https://github.com/ropensci/software-review/issues/432)
mccallc commented 3 years ago

Hello @annakrystalli,

I believe I have addressed each of your points regarding good practice, tests, spellcheck, documentation, and adding the rOpenSci badge. These are all reflected in the latest commit to the repository.

goodpractice::gp() still reports a few lines missing tests, though these are mainly edge cases in error handling that can reasonably be expected to work. Over 99% of the code is covered. If a full 100% is needed, please let me know.

mccallc commented 3 years ago

Hello @annakrystalli,

Hope I'm not bothering you, but I'm not sure whether I disrupted any automated processes when I accidentally closed this ticket prematurely. I believe I have addressed your previous comments, and I wanted to confirm that I'm still interested in continuing the review process.

annakrystalli commented 3 years ago

Hello @mccallc ! Thanks for confirming and for addressing the initial comments. 99% coverage is absolutely fine.

I'll start looking for reviewers.

annakrystalli commented 3 years ago

@ropensci-review-bot seeking reviewers

ropensci-review-bot commented 3 years ago

Please add this badge to the README of your package repository:

[![Status at rOpenSci Software Peer Review](https://badges.ropensci.org/432_status.svg)](https://github.com/ropensci/software-review/issues/432)

Furthermore, if your package does not have a NEWS.md file yet, please create one to capture the changes made during the review process. See https://devguide.ropensci.org/releasing.html#news

annakrystalli commented 3 years ago

@ropensci-review-bot add @karawoo to reviewers

ropensci-review-bot commented 3 years ago

@karawoo added to the reviewers list. Review due date is 2021-06-01. Thanks @karawoo for accepting to review! Please refer to our reviewer guide.

annakrystalli commented 3 years ago

@ropensci-review-bot add @zambujo to reviewers

ropensci-review-bot commented 3 years ago

@zambujo added to the reviewers list. Review due date is 2021-06-01. Thanks @zambujo for accepting to review! Please refer to our reviewer guide.

annakrystalli commented 3 years ago

Dear @karawoo & @zambujo, just a reminder that the deadline for review is approaching. If you have any questions, please feel free to reach out!

karawoo commented 3 years ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

Functionality

Estimated hours spent reviewing: 3


Review Comments

This is a really nice and well put together package. It was very easy to get started with it and the functions are quite intuitive. I thought the contributing guide was also particularly well written.

While testing the examples I found that one example threw an error, and a few others didn't error but returned different output than I expected:

library("awardFindR")

## Throws an error
specific <- awardFindR(keywords=c("ethnography", "case studies"))
#> ... [removed long curl output here] ...
#> Error in data.frame(grantee = info[2], date, amount, program = info[4],  : 
#>   arguments imply differing number of rows: 1, 0

## Returns NULL
(arnold <- arnold_get("qualitative data", 2016, 2017))
#> POST https://pyj9b8sltv-dsn.algolia.net/1/indexes/*/queries?x-algolia-agent=Algolia%20for%20JavaScript%20(4.5.1)%3B%20Browser%20(lite)%3B%20instantsearch.js%20(4.8.3)%3B%20JS%20Helper%20(3.2.2)&x-algolia-api-key=bec9ead5977b11ae383f4df272c2f106&x-algolia-application-id=PYJ9B8SLTV ... OK (HTTP 200).
#> NULL

## Succeeds with a warning
sloan <- sloan_get(c("qualitative data", "case studies"), 2018, 2020)
#> GET https://sloan.org/grants-database?dynamic=1&order_by=approved_at&order_by_direction=desc&limit=3000 ... OK (HTTP 200).
#> Warning in grepl(keyword, descriptions, ignore.case = TRUE): argument 'pattern'
#> has length > 1 and only the first element will be used

## Returns a list instead of a data frame
nsf <- nsf_get("ethnography", "2020-01-01", "2020-02-01")
#> GET https://api.nsf.gov/services/v1/awards.json?keyword="ethnography"&printFields=id,date,startDate,expDate,title,awardeeName,piFirstName,piLastName,piEmail,cfdaNumber,fundsObligatedAmt,fundProgramName,abstractText,awardeeCounty&dateStart=01/01/2020&dateEnd=02/01/2020&offset=1 ... OK (HTTP 200).
class(nsf)
#> [1] "list"

Created on 2021-05-28 by the reprex package (v0.3.0.9001)

I noticed that awardFindR() sets the stringsAsFactors option to FALSE, and this persists for the rest of the user's R session. While FALSE is the default as of R 4.0.0, I think it would be more polite to set the option within the function and restore the user's original setting when the function exits:

opt <- options(stringsAsFactors = FALSE)
on.exit(options(opt))
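Spelled out in full, the save-and-restore pattern looks like the sketch below. It is illustrated with the scipen option rather than stringsAsFactors purely so the example runs cleanly on any R version; the mechanism is identical, since options() returns the previous values of whatever it sets:

```r
# Sketch of the options()/on.exit() pattern: options() returns the
# old values, so passing that result back restores the caller's
# settings when the function exits -- even if an error occurs.
with_large_scipen <- function(code) {
  opt <- options(scipen = 999)  # set locally, capture old value
  on.exit(options(opt))         # restore on exit, error or not
  force(code)
}

old <- getOption("scipen")
with_large_scipen(format(1e6))           # runs with scipen = 999
identical(getOption("scipen"), old)      # caller's setting restored
```

Because on.exit() handlers run on errors as well as normal returns, the user's session state is safe even if a request inside the function fails.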

In general the functions in this package are quite consistent; however, some functions (neh_get(), for example) provide a message when no results are found, whereas others do not. Personally I find the messages helpful (more so than the HTTP requests that are printed).

Other minor comments

A few thoughts/ideas that I'll leave up to the package author whether they want to incorporate:

zambujo commented 3 years ago

Package Review

Documentation

The package includes all the following forms of documentation:

Functionality

Estimated hours spent reviewing: 3


Review Comments

{awardFindR} is a well-organised, well-tested package providing a common interface to scattered data on grants and awards across a series of APIs and websites.

I could not find any issues other than those pointed out in the previous reviews. I will nevertheless add a few minor points, mostly about design and styling, whose implementation is not required and is left to the consideration of the authors.

annakrystalli commented 3 years ago

Thank you @karawoo & @zambujo for your thoughtful and timely reviews!

Over to you @mccallc 😊. Any questions, just let me know.

mccallc commented 3 years ago

Hello!

Thank you very much @karawoo and @zambujo for your reviews and extensive comments. I really appreciate them. I've implemented changes that should address each individual comment, including:

Based on review from @karawoo:

Based on review from @zambujo:

Again, thank you all so much. These comments really helped push this package in the right direction.

Let me know what's next in the process, @annakrystalli !

annakrystalli commented 3 years ago

Thanks for the response @mccallc !

@karawoo & @zambujo, please let us know if you feel the responses to your comments address the issues you raised or whether you have further comments.

karawoo commented 3 years ago

All of my comments have been addressed and I don't have any more to add. Nice work!

zambujo commented 3 years ago

All comments have been addressed from my side as well. No further comments other than: well done! And apologies for the delayed reply.

annakrystalli commented 3 years ago

Thank you again @karawoo and @zambujo for your reviews and @mccallc for submitting and the subsequent work on the package! 🎉

annakrystalli commented 3 years ago

@ropensci-review-bot approve

ropensci-review-bot commented 3 years ago

Approved! Thanks @mccallc for submitting and @karawoo, @zambujo for your reviews! :grin:

To-dos:

Should you want to acknowledge your reviewers in your package DESCRIPTION, you can do so by making them "rev"-type contributors in the Authors@R field (with their consent). More info on this here.

Welcome aboard! We'd love to host a post about your package - either a short introduction to it with an example for a technical audience or a longer post with some narrative about its development or something you learned, and an example of its use for a broader readership. If you are interested, consult the blog guide, and tag @stefaniebutland in your reply. She will get in touch about timing and can answer any questions.

We've put together an online book with our best practices and tips; this chapter starts the third section, which covers guidance for after onboarding. Please tell us what could be improved; the corresponding repo is here.

Last but not least, you can volunteer as a reviewer by filling in a short form.