ropensci / software-review

rOpenSci Software Peer Review.

Submission: ruODK, Client for the ODK Central API #335

Closed: florianm closed this issue 3 years ago

florianm commented 4 years ago

Submitting Author: Florian Mayer (@florianm)
Repository: https://github.com/dbca-wa/ruODK
Version submitted: 0.6.6.9025 (Jun 2020) (originally 0.6.0)
Editor: @maelle
Reviewer 1: @karissawhiting
Reviewer 2: @jmt2080ad
Archive: 2020-02-11
Version accepted: TBD


Type: Package
Package: ruODK
Title: An R Client for the ODK Central API
Version: 0.6.6.9025
Authors@R: 
    c(person(given = c("Florian", "W."),
             family = "Mayer",
             role = c("aut", "cre"),
             email = "Florian.Mayer@dbca.wa.gov.au",
             comment = c(ORCID = "0000-0003-4269-4242")),
      person(given = "Maëlle",
             family = "Salmon",
             role = "rev",
             email = "maelle.salmon@yahoo.se",
             comment = c(ORCID = "0000-0002-2815-0399")),
      person(given = "Karissa",
             family = "Whiting",
             role = "rev",
             comment = c(ORCID = "0000-0002-4683-1868")),
      person(given = "Jason",
             family = "Taylor",
             role = "rev"),
      person(given = "DBCA",
             role = c("cph", "fnd")),
      person(given = "NWSFTCP",
             role = "fnd"))
Description: Utilities to access and tidy up data from ODK
    Central's API.  ODK Central is OpenDataKit's clearinghouse for
    digitally captured data <https://docs.opendatakit.org/central-intro/>.
    ODK Central's API is documented at
    <https://odkcentral.docs.apiary.io/>.
License: GPL-3
URL: https://dbca-wa.github.io/ruODK/,
    https://github.com/dbca-wa/ruODK
BugReports: https://github.com/dbca-wa/ruODK/issues
Depends: 
    R (>= 3.4)
Imports: 
    clisymbols (>= 1.2.0),
    crayon (>= 1.3.4),
    dplyr (>= 0.8.5),
    fs (>= 1.4.1),
    glue (>= 1.4.0),
    httr (>= 1.4.1),
    janitor (>= 2.0.1),
    lifecycle (>= 0.1.0),
    lubridate (>= 1.7.8),
    magrittr (>= 1.5),
    purrr (>= 0.3.4),
    readr (>= 1.3.1),
    rlang (>= 0.4.5),
    rlist (>= 0.4.6.1),
    stringr (>= 1.4.0),
    tibble (>= 2.1.3),
    tidyr (>= 1.0.3),
    tidyselect (>= 1.0.0),
    xml2 (>= 1.2.2)
Suggests: 
    covr (>= 3.4.0),
    DT (>= 0.9),
    ggplot2 (>= 3.2.1),
    knitr (>= 1.26),
    leaflet (>= 2.0.3),
    listviewer (>= 3.0.0),
    rmarkdown (>= 1.17),
    roxygen2 (>= 7.1.0),
    testthat (>= 2.3.2),
    usethis (>= 1.6.0),
    vcr (>= 0.5.4)
VignetteBuilder: 
    knitr
RdMacros: 
    lifecycle
Encoding: UTF-8
Language: en_AU
LazyData: true
RoxygenNote: 7.1.0
X-schema.org-applicationCategory: Data Access
X-schema.org-keywords: database, open-data, opendatakit, odk, api, data, dataset

Scope

Data retrieval: ruODK retrieves data from ODK Central, a data clearinghouse containing data which have been digitally captured by the data collection app ODK Collect using XForms.

As mentioned by @annakrystalli, the categories below are touched on by ruODK, but don't apply as its primary scope:

Data extraction or munging: ruODK transforms and sanitises the data from ODK Central from the original format (which parses to nested lists in R) into tibbles.

Reproducibility: ruODK allows users to script and repeat the data extraction step, which is the main use case it is being written for.

Geospatial data: while ODK allows users to capture location data (points, lines, polygons), and ruODK extracts these values, ruODK is not primarily a spatial package.

(Scope and use cases are also mentioned in the README.)

Any organisation collecting data with the OpenDataKit suite will need to extract the data from the data clearinghouses, ODK Aggregate (outgoing) or ODK Central (new). Some may want to analyse the data straight out of ODK Central; others may need to transfer the data into another data warehouse for further post-processing, QA, and integration with other data sources.

For both use cases, ruODK bridges the gap between the data sitting in ODK Central and the data being a tibble in R, ready for further processing.

In a nutshell, ruODK aims to be to ODK Central what ckanr is to CKAN.

According to the release notes for ODK Central 0.6, ruODK is now, among the other access options, the recommended package for accessing ODK Central data in R.

Pre-submission inquiry: https://github.com/ropensci/software-review/issues/328 (thanks to @annakrystalli for feedback on the pre-submission). Changes since then: added the remaining functions.

Technical checks

Confirm each of the following by checking the box. This package:

Publication options

Note: I would like to submit a paper about the package in a few weeks, but haven't got the manuscript ready and approved for publication just yet.

JOSS Options
- [ ] The package has an **obvious research application** according to [JOSS's definition](https://joss.readthedocs.io/en/latest/submitting.html#submission-requirements).
- [ ] The package contains a `paper.md` matching [JOSS's requirements](https://joss.readthedocs.io/en/latest/submitting.html#what-should-my-paper-contain) with a high-level description in the package root or in `inst/`.
- [ ] The package is deposited in a long-term repository with the DOI:
- (*Do not submit your package separately to JOSS*)

MEE Options
- [ ] The package is novel and will be of interest to the broad readership of the journal.
- [ ] The manuscript describing the package is no longer than 3000 words.
- [ ] You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see [MEE's Policy on Publishing Code](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/journal-resources/policy-on-publishing-code.html))
- (*Scope: Do consider MEE's [Aims and Scope](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/aims-and-scope/read-full-aims-and-scope.html) for your manuscript. We make no guarantee that your manuscript will be within MEE scope.*)
- (*Although not required, we strongly recommend having a full manuscript prepared when you submit here.*)
- (*Please do not submit your package separately to Methods in Ecology and Evolution*)

Code of conduct

Update 31 Aug: pasted new DESCRIPTION with version 0.6.1 (addressing all comments below).
Update 14 Sept: 0.6.3 after tidyr 1.0.0 moves from dev to stable dependency.
Update 18 Sept: 0.6.4 uses new example data for tests and vignettes.
Update 23 Sept: 0.6.5 uses lifecycle badges.
Update 25 Sept: 0.6.6 odata_submission_get() parses data, dates, and downloads attachments.
Update 10-18 Oct: met ODK developers and users; met @jmt2080ad, met sckott (Scott Chamberlain); held four ruODK workshops (Perth, Seattle, Portland); got constructive feedback.
Update 27 Nov: 0.6.6.900* addressing reviewer comments, preparing for ODK Central 0.7 release.
Update 12 Feb 2020: reviewer comments addressed, some improvements added, ready for reviewer response.
Update 15 Jun 2020: reviewer response addressed, parsers added for geotrace and geoshape.

maelle commented 4 years ago

Editor checks:


Editor comments

:wave: @florianm! Thanks a lot for your submission! :grin: I have a few questions/asks I'd like to see tackled before I start looking for reviewers. I am also waiting to run the local checks because I don't have credentials yet, see my forum post.

The following item is rather a suggestion.


Reviewers: Due date:

florianm commented 4 years ago

Very good points, thanks! I'll handle them as issues at https://github.com/dbca-wa/ruODK/milestone/3 (apologies in advance for @-bombing you, @maelle).

maelle commented 4 years ago

No problem, I was glad to see the issues. However we strive to keep the discussion in this submission thread (similar to what happens in journals). This does not prevent opening issues (which I personally would also do for project management), but answers should also be here. So when all items are settled it'd be nice if you can write a comment here with quotes of the items and your answers. 🙂

I will answer about publication processes in the next few days.

Can you please explain to me how to finish my credentials setup to run the tests? Thanks.

florianm commented 4 years ago

No probs, when done I'll go through the comments here and explain how my issues address them. Give me a day or two to address the larger issues!

Test credentials as promised: https://github.com/dbca-wa/ruODK/blob/master/CONTRIBUTING.md#test https://dbca-wa.github.io/ruODK/CONTRIBUTING.html#test

maelle commented 4 years ago

Thanks! I've just realized you recommend using .Rprofile, but that file is code run at the start of each session; environment variables should go in .Renviron. I'm editing mine via usethis::edit_r_environ() right now.

maelle commented 4 years ago

cf https://whattheyforgot.org/r-startup.html (maybe useful to link from docs?)

maelle commented 4 years ago

I can run tests now but not code such as project_list() (e.g. knitting api.Rmd fails), how do I do that, more setup steps I guess? Can I use the same credentials again, changing their names?

florianm commented 4 years ago

Aw crap. The vignette might need settings for our internal ODK Central server. Will send settings in ca 3h, afk just now

florianm commented 4 years ago

You can always use explicit url, un, pw as per vignette Setup:

project_list(url=Sys.getenv("ODKC_TEST_URL"), un=..., pw=...)

maelle commented 4 years ago

Does the vignette use a real project or is it a sandbox project too? I'm a bit lost. :-)

florianm commented 4 years ago

The test settings (sandbox) are real forms with test data. The tests are read-only and non-destructive, so no worries there.

The server used in the vignettes uses real forms with (so far) only a few test records. Both servers run ODK Central v0.6.

maelle commented 4 years ago

And you don't want random contributors to have access to the real forms from the vignette, correct? (I might be missing something obvious) A discuss.ropensci.org forum post about the general issue (using interesting examples in the vignette, without giving access to contributors, and with R CMD check still passing) might make sense?

florianm commented 4 years ago

Not a problem at all to scale actually! Currently I'm using the few forms/records I have so far. Once significant production data comes in, I might create a separate testing project and give both selected/invited web users (ruODK un/pw) and app users (to submit data via ODK Collect) access to just that. I think I'll get away with a manual invitation process for prospective contributors at least for the time being. How does that sound?

Also, invite to my server coming in an hour!

maelle commented 4 years ago

Cool so I'll then just have to add ODKC_URL / other variables to my .Renviron? Thank you!

florianm commented 4 years ago

> Cool so I'll then just have to add ODKC_URL / other variables to my .Renviron? Thank you!

This should make way more sense now: https://dbca-wa.github.io/ruODK/CONTRIBUTING.html#test Also check your email for working settings!

maelle commented 4 years ago

Awesome, thank you! Got the email, will devtools::check() soon. I will be ODK now 😉

I think the instructions should not contain Sys.getenv in order to copy paste more easily to .Renviron. 🙂

florianm commented 4 years ago

Glad to see that urODK now :-D

> I think the instructions should not contain Sys.getenv in order to copy paste more easily to .Renviron.

Happy to adjust, which instructions do you mean? Sorry, brain frazzled now.

Re authentication and settings, should I add abstraction levels like ckanr does?

maelle commented 4 years ago

I mean in https://dbca-wa.github.io/ruODK/CONTRIBUTING.html#test

ODKC_TEST_URL="https://sandbox.central.opendatakit.org"
ODKC_TEST_PID=14
ODKC_TEST_FID="build_Flora-Quadrat-0-2_1558575936"
ODKC_TEST_UN="your@email.com"
ODKC_TEST_PW="..."
ODKC_URL="https://odkcentral.dbca.wa.gov.au"
ODKC_UN="your@email.com"
ODKC_PW="..."

Those are the lines for .Renviron.

Regarding your second question, better to ping ckanr's maintainer, who's an editor too :-)
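For illustration, a minimal sketch (base R only) of why these lines belong in .Renviron rather than .Rprofile: R reads a user or project .Renviron at startup as plain NAME=value pairs, strips the outer quotes, and code then picks the values up with Sys.getenv(), with no source() call needed. Here a temporary file stands in for ~/.Renviron and readRenviron() loads it the same way; the values are placeholders, not working credentials.

```r
# A temporary file stands in for ~/.Renviron (placeholder values only).
renviron <- tempfile(fileext = ".Renviron")
writeLines(c(
  'ODKC_TEST_URL="https://sandbox.central.opendatakit.org"',
  "ODKC_TEST_PID=14",
  'ODKC_TEST_UN="your@email.com"'
), renviron)

# R does this automatically for .Renviron when a session starts;
# readRenviron() re-reads a file on demand, e.g. after editing it.
readRenviron(renviron)

# Package code and tests then read the settings as plain strings.
url <- Sys.getenv("ODKC_TEST_URL")
pid <- as.integer(Sys.getenv("ODKC_TEST_PID"))
```

Because environment variables are always strings, numeric settings such as the project ID need an explicit conversion like the as.integer() call above.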

florianm commented 4 years ago

Ah, thanks! Fixed and pushed now.

I'll work on the remaining issues https://github.com/dbca-wa/ruODK/issues/16 and https://github.com/dbca-wa/ruODK/issues/19 tomorrow. ckanr_settings look oddly familiar :-D

maelle commented 4 years ago

Oops I had missed your involvement there, sorry! 😁

maelle commented 4 years ago

Nothing to report from automatic tests, apart from this compliment from goodpractice:

♥ Hurrah! Mathematical package! Keep up the delightful work!

so as soon as you have tackled the remaining issues and written answers here, and as I answer regarding MEE&rOpenSci publication process, we can move forward. :-)

florianm commented 4 years ago

I think I've addressed all issues now, will write up summary here as soon as I get a moment.

Last question: is there a best practice on importing packages? In ruODK I managed to keep goodpractice() and check() happy by:

https://kbroman.org/pkg_primer/pages/depends.html reads like I can just drop all @importFrom - should I do that?

maelle commented 4 years ago

You could drop the @importFrom https://r-pkgs.org/namespace.html#imports but you don't have to.🙂
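For readers weighing the two options, a minimal sketch of both styles, using utils::head() as an illustrative stand-in for any imported function (not code from ruODK). Inside a package, the roxygen2 tag `@importFrom utils head` makes roxygen2 write `importFrom(utils,head)` to NAMESPACE so the bare call resolves; the alternative is to list the package under Imports in DESCRIPTION and qualify each call with `::`. Outside a package the roxygen comment is inert, so the snippet runs as a plain script.

```r
# Style 1: roxygen2 tag. When the package is documented, this line adds
# importFrom(utils,head) to NAMESPACE, so head() can be called unqualified.
#' @importFrom utils head
first_rows_imported <- function(df, n = 2L) {
  head(df, n)
}

# Style 2: no @importFrom; every call names its package explicitly.
# The package still has to be listed under Imports in DESCRIPTION.
first_rows_qualified <- function(df, n = 2L) {
  utils::head(df, n)
}

# Both styles return the same rows.
identical(first_rows_imported(mtcars), first_rows_qualified(mtcars))
```

The `::` style costs a few keystrokes but makes the origin of each call visible at the call site, which some reviewers find easier to read.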

florianm commented 4 years ago

Update: issues tackled, write-up coming tomorrow.

maelle commented 4 years ago

Awesome, looking forward to it!

florianm commented 4 years ago

Hi @maelle, thanks for the editor comments and further questions. I've included all talking points in the thread, so this answer should be comprehensively addressing all issues, and as mentioned, previous messages can be hidden for clarity.

Editor comments

I see a few open issues in your repo. Could you clarify here and in their description whether you're seeking advice on the functionalities implementation/usefulness? Why not solve some of them before the reviews of the package? It'd be better for the functionality to be implemented so the reviewers can review them too. It is fine if we put this issue on hold for a bit in order for you to have time to work on the enhancements if needed, and you could ask for help in rOpenSci channels.

Reg "Note: I would like to submit a paper about the package in a few weeks, but haven't got the manuscript ready and approved for publication just yet." This is also a good reason to put the issue on hold since the reviewers would read the short JOSS paper. What do you mean by approved?

I do not understand "Especially in these trying times, it is important to ask: “ruODK?”" and " u r ODK!" in the README, could you add footnotes or so there? Unless it is very obvious 🙈 (play on words with OK? to my defence and the defence of my suggestion, my not being a native speaker might have made this less clear 😂)

Likewise I do not understand the use of %<% when describing use cases.

There is a link missing " [ODK forum]". How should the future reviewers best ask for credentials if they're new to the service? I wasn't too sure about my own post.
The current error messages are not informative in the absence of credentials, i.e. I get Error in curl::curl_fetch_memory(url, handle = handle) : <url> malformed if I run project_list() now without credentials. If there's no system variables set and no credentials passed either, the functions should fail with an informative error message.
Do not use the Author and Maintainer fields in DESCRIPTION, they are automatically generated from Authors@R.
In the example in the README you source .Rprofile. This should not be done, in your code in at least one function I see you use Sys.getenv("NAMEOFTHEVARIABLE") to access the credentials which is the best practice.
The reference section of the pkgdown website could use some grouping. (ignore the CI advice, outdated, cf https://ropensci.org/technotes/2019/06/07/ropensci-docs/). I see you used "usethis goodies", 1) CONTRIBUTING.md still mentions the tidyverse team at least once, 2) For info here is our guidance for accepted packages.

There's info about contributing both in CONTRIBUTING.md and in the README (release steps), could you consolidate those? Can you please explain to me how to finish my credentials setup to run the tests? Thanks.
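A minimal sketch of the fail-early behaviour requested above, assuming the settings live in environment variables such as ODKC_URL, ODKC_UN and ODKC_PW. The helper `odkc_setting()` and the wrapper `list_projects_sketch()` are hypothetical names, not part of ruODK's API: the helper prefers an explicitly passed value, falls back to the environment variable, and otherwise stops with a message saying what to set, instead of letting curl report a malformed URL.

```r
# Hypothetical helper (not ruODK's API): resolve one connection setting.
odkc_setting <- function(value = NULL, var) {
  # Fall back to the environment variable when no explicit value is given.
  if (is.null(value) || !nzchar(value)) {
    value <- Sys.getenv(var, unset = "")
  }
  # Fail with an actionable message instead of sending a malformed request.
  if (!nzchar(value)) {
    stop(
      sprintf(
        paste0(
          "No value supplied and environment variable %s is not set. ",
          "Pass it explicitly or add %s=... to .Renviron ",
          "(e.g. via usethis::edit_r_environ())."
        ),
        var, var
      ),
      call. = FALSE
    )
  }
  value
}

# Hypothetical wrapper showing where the checks sit, before any HTTP call.
list_projects_sketch <- function(url = NULL, un = NULL, pw = NULL) {
  url <- odkc_setting(url, "ODKC_URL")
  un <- odkc_setting(un, "ODKC_UN")
  pw <- odkc_setting(pw, "ODKC_PW")
  # A real client would now send the authenticated request to `url`.
  invisible(list(url = url, un = un))
}
```

Checking all settings up front means a user without credentials sees one clear message naming the missing variable, which also helps new reviewers work out what to ask for.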

The following item is rather a suggestion.

Is there any diagram of ODK services/a workflow with ODK that you could use/adapt to explain visually where your R package fits, as a complement of the written part of the README?

Editor's further requests in thread

You could drop the @importFrom https://r-pkgs.org/namespace.html#imports but you don't have to.

And you don't want random contributors to have access to the real forms from the vignette, correct? (I might be missing something obvious) A discuss.ropensci.org forum post about the general issue (using interesting examples in the vignette, without giving access to contributors, and with R CMD check still passing) might make sense?

Other improvements

Output from goodpractice:

(no issues or suggestions)

I think that's it! New things I've learned:

maelle commented 4 years ago

Thanks a lot @florianm! I'll look for reviewers now. Note that once reviewers are recruited, we give them 3 weeks to complete their review, then there's time for your changes, then for their final response (and a continuing discussion) so I think your package will not be approved before the mentioned meetings, however you already get to display an rOpenSci "in review" badge, orange for now:

[![](https://badges.ropensci.org/<issue_id>_status.svg)](https://github.com/ropensci/software-review/issues/<issue_id>)

Edited to add: the badge is still grey but it'll become orange soon, the badges server will detect the status change.

I'll hide part of our discussion to make the issue thread easier to catch up with for reviewers.

maelle commented 4 years ago

@florianm A further request actually, sorry, I had not googled "Open Data Kit R" before now because of the existence of the pre-submission inquiry. Doing so I found odkr, could you please compare the functionalities of your package and that package? Feel free to also include the other packages that are somehow related to ODK. An ideal comparison would include prose and a table. Examples of comparison tables: rtweet and CoordinateCleaner. I'd like to know how ruODK is "best in class", and how the other packages either overlap or complement its functionality. Thank you!

annakrystalli commented 4 years ago

Sorry about this mixup both. This was totally my omission. Should have done a better job at the presubmission stage.

maelle commented 4 years ago

Nope, my omission too!

florianm commented 4 years ago

> @florianm A further request actually, sorry, I had not googled "Open Data Kit R" before now because of the existence of the pre-submission inquiry. Doing so I found odkr, could you please compare the functionalities of your package and that package? Feel free to also include the other packages that are somehow related to ODK. An ideal comparison would include prose and a table. Examples of comparison tables: rtweet and CoordinateCleaner. I'd like to know how ruODK is "best in class", and how the other packages either overlap or complement its functionality. Thank you!

Fixed :-) (update: some more wrangling of md tables.) No trouble at all; actually a great time to search and review again, as two more packages (cbs) joined the fun since I last googled. Comparison added to the bottom of the README. TLDR: ruODK is the only R package targeted specifically at ODK Central, without any other heavyweight dependencies, the only one (need to re-check) that works with ODK Central's OData dialect, and the only one with worked end-to-end examples. While I give the other OData packages another whirl, which other criteria could I add to the review?

maelle commented 4 years ago

Wow awesome work, thank you! I can't think of other criteria right now. I'll now look for reviewers!

sckott commented 4 years ago

> Once ODK Central solidifies around v1.0, we can probably turn to using webmockr & vcr for caching test results.

I've written a section in the HTTP testing book to help folks think through when to use webmockr vs. vcr vs. fakes vs. real requests: https://ropenscilabs.github.io/http-testing-book/testing-considerations.html

florianm commented 4 years ago

@sckott cheers, I'll read those again to figure out whether/how I can prevent testing against possibly outdated API responses in case ODK Central changes.

florianm commented 4 years ago

> In addition to intermittent server-side issues, your tests may be performing queries with cached (vcr) or mocked (webmockr) responses that are no longer valid with the current state of the remote service. It may be harmless: for example, the response to some query now returns no data because the data for that entity was removed. But it could be more serious in that the remote service changed their API such that an API route is no longer available or the route name has changed, or similar.

The ODK Central API will change, and could bring breaking changes, until ca. release 1.0. Therefore I want tests to run against the real deal (the test server is ODK's own public sandbox, which always runs the latest master), so that ruODK has to keep up with ODK Central.

> Testing real HTTP interactions should be the slowest option, but has the benefit of not adding any (permanent) files to your package. Mocking tests can be very lightweight, though you can include very heavy responses.

The one thing making my tests slow is the repeated download of a ca. 50 MB zip file, used both for testing its contents (which requires some form complexity, as discussed above, and could change) and for testing the repeated download/skip logic. As I read the above, that response, if cached by vcr, would be added to the installed package, which will probably make R CMD check very unhappy. The best approach for ruODK might be to add a second test project with a smaller export size (maybe a few records and only one small file attachment) purely to speed up those tests. ETA for that is early next week. I hope this does not hold up the submission process.
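The download/skip logic mentioned above can be tested without touching the network if the download step is injectable. A minimal sketch, assuming a hypothetical `download_once()` helper (not ruODK's actual function) that accepts the downloader as an argument, so a test can pass a fake that writes a tiny file and counts its calls instead of fetching a 50 MB export.

```r
# Hypothetical helper (not ruODK's API): download only if the file is missing
# or overwrite = TRUE; the downloader is injectable for offline tests.
download_once <- function(url, dest, overwrite = FALSE,
                          downloader = function(url, destfile) {
                            utils::download.file(url, destfile, mode = "wb")
                          }) {
  if (file.exists(dest) && !overwrite) {
    message("Keeping existing file ", dest)
    return(invisible(dest))
  }
  downloader(url, dest)
  invisible(dest)
}

# Fake downloader for tests: writes a small file and counts calls, no network.
calls <- 0L
fake_downloader <- function(url, destfile) {
  calls <<- calls + 1L
  writeLines("fake export", destfile)
}

dest <- file.path(tempdir(), "export.zip")
unlink(dest)
# First call downloads, second call skips, third call forces a re-download.
download_once("https://example.org/export.zip", dest, downloader = fake_downloader)
download_once("https://example.org/export.zip", dest, downloader = fake_downloader)
download_once("https://example.org/export.zip", dest, overwrite = TRUE,
              downloader = fake_downloader)
```

With this split, only the tests that check the export's contents need the real (or a smaller) file; the skip/overwrite behaviour is covered by the fake.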

sckott commented 4 years ago

> Therefore I want tests to run against the real deal

makes sense

> which will probably make CMD check very unhappy.

I don't think the large size of the cassettes (cached HTTP responses) affects R CMD check; e.g., the cassettes directory in this package https://github.com/ropensci/rgbif/ is 32 MB, but running check on it throws no notes/warnings/etc. But I could be wrong.

If the size of the cassettes is a concern, one option is to skip on CRAN the tests whose cassettes are too large to include in the package, and to add those cassette files to .Rbuildignore.
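A minimal sketch of the skipping approach, assuming testthat as the test framework and the ODKC_TEST_* variable names used earlier in this thread. Both helper names are hypothetical; `testthat::skip_on_cran()` and `testthat::skip_if()` are standard testthat functions. Large cassette files would additionally be listed in .Rbuildignore so they stay out of the built package.

```r
# Hypothetical predicate: are all sandbox credentials available?
odkc_test_creds_available <- function(
    vars = c("ODKC_TEST_URL", "ODKC_TEST_UN", "ODKC_TEST_PW")) {
  # Sys.getenv() returns "" for unset variables.
  all(nzchar(Sys.getenv(vars)))
}

# Hypothetical helper to call at the top of network-dependent tests.
skip_if_no_odkc <- function() {
  testthat::skip_on_cran()
  testthat::skip_if(
    !odkc_test_creds_available(),
    "ODK Central test credentials are not set"
  )
}

# Usage inside tests/testthat/test-*.R (not run here):
# test_that("project_list returns a tibble", {
#   skip_if_no_odkc()
#   # ... assertions against the sandbox ...
# })
```

Keeping the credential check in a plain predicate makes it testable on its own and lets contributors without sandbox accounts run the rest of the suite.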

maelle commented 4 years ago

First reviewer recruited, thanks a lot @karissawhiting! :smiley_cat: Reviewer guide.

@florianm no big changes planned apart from this improvement of tests, correct?

florianm commented 4 years ago

@maelle nothing breaking! Two minor ideas:

maelle commented 4 years ago

Ok, but master will be stable/unchanged for the reviews? 🙂

florianm commented 4 years ago

Absolutely! The Rmd templates would be nice starting points, e.g. for a workshop. Just pushed an update and added details to my main reply above. Update 16 Sep: added one Rmd template, see the last paragraph of the "odata" vignette. No changes to code.

@karissawhiting thanks for reviewing ruODK! If you want to run tests and build vignettes, you'll need an account at the public ODK Central sandbox. If you DM me your preferred email for that purpose on the rOpenSci Slack, I'd be happy to create an account for you on the ODK Central sandbox.

florianm commented 4 years ago

@maelle @karissawhiting also soliciting feedback from the ODK community in the ODK Forum here.

florianm commented 4 years ago

Quick note on failing check: My vignette "api" fails to build on 2 out of 3 envs. Methinks the file attachments on ODK Central's sandbox might have suffered damage. This is a good moment to switch over to a similar but newer example form, and populate that form with submissions with smaller file attachments (set the data collection device camera to take 640px photos).

TLDR https://github.com/dbca-wa/ruODK/issues/23 will likely fix tests, ETA later today if things go to plan.

Update 18 Sep: https://github.com/dbca-wa/ruODK/issues/23 did indeed fix tests. v0.6.4 has all-new example data, which fixes tests (spurious errors from the older example data) and reduces package size (removing the R CMD check note on package size). @maelle you'll need a new ODKC_TEST_FID="build_Flora-Quadrat-0-4_1564384341". @karissawhiting when you're ready to run tests and build vignettes, get in touch to receive test env vars and an account on the ODK Central sandbox.

maelle commented 4 years ago

Second reviewer recruited, thanks @jmt2080ad! :tada: Reviewer guide

@florianm can you point both reviewers to instructions for testing to keep things smooth? Thanks!

maelle commented 4 years ago

Reviews are due on 2019-10-10 :smile_cat:

florianm commented 4 years ago

@maelle thanks heaps! @karissawhiting @jmt2080ad I've DM'd you both on the rOpenSci Slack re test credentials.

florianm commented 4 years ago

@karissawhiting @jmt2080ad @maelle Update, have added lifecycle badges to all functions. The resulting NOTE is a bug upstream of the lifecycle package, see https://github.com/r-lib/lifecycle/issues/22. I've added an explanation to my cran-comments.md.

Jason, feel free to contact me for test credentials on the rOpenSci Slack when you're ready to commence the review. Your involvement is very appreciated!

florianm commented 4 years ago

Update, just pushed a minor clean-up following advice from Jenny B on the rOpenSci Slack. Removed some now obsolete helpers (map_chr_hack and friends), rebuilt docs.

florianm commented 4 years ago

@karissawhiting @jmt2080ad @maelle apologies for the impending code churn - I found a way to make ruODK way easier to drive: odata_submission_parse() now automatically parses the raw submission data into tibbles, parses dates, and downloads attachments. This will make the "OData" vignette a bit more concise and address https://github.com/dbca-wa/ruODK/issues/6

update: version 0.6.6 simplifies the main function odata_submission_get()

maelle commented 4 years ago

@jmt2080ad @karissawhiting 👋 Friendly reminder that your reviews are due tomorrow on 2019-10-10. 🙂

karissawhiting commented 4 years ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide.

Documentation

The package includes all the following forms of documentation:

Functionality

Final approval (post-review)

Estimated hours spent reviewing: 4


Review Comments

ruODK provides a much-needed link between the ODK data collection/management software suite and R. The package is modular and well-structured, and the code is tidy, styled and readable. Documentation and vignettes are informative and make the project accessible through concise descriptions and links to resources as needed.

For context, this review is coming from the perspective of someone who has experience with the ODK software suite, but little experience with R API wrapper packages.

Build

README

Tests and CONTRIBUTING

NAMESPACE

Vignettes

Notes on Code

submissions <- tibble::tibble(
  pid = ruODK::get_test_pid(),
  fid = ruODK::get_test_fid(),
  iid = sl$instance_id,  # this is a vector of multiple instance_ids
  url = ruODK::get_test_url(),
  un = ruODK::get_test_un(),
  pw = ruODK::get_test_pw()
) %>% 
  purrr::pmap(ruODK::submission_get)
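The snippet above works because tibble() recycles the length-one columns to the number of instance IDs, and purrr::pmap() then calls submission_get() once per row, matching columns to arguments by name. A minimal offline sketch of the same row-wise pattern in base R, with a hypothetical stand-in for submission_get() and placeholder instance IDs, so that no server or credentials are needed:

```r
# Hypothetical stand-in for ruODK::submission_get(): echoes its inputs
# instead of calling ODK Central.
fake_submission_get <- function(pid, fid, iid, url, un, pw) {
  list(pid = pid, fid = fid, iid = iid)
}

# Placeholder instance IDs; the scalar columns are recycled to length 3.
submissions <- data.frame(
  pid = 14L,
  fid = "build_Flora-Quadrat-0-4_1564384341",
  iid = c("uuid:aaa", "uuid:bbb", "uuid:ccc"),
  url = "https://sandbox.central.opendatakit.org",
  un = "your@email.com",
  pw = "placeholder",
  stringsAsFactors = FALSE
)

# Row-wise call, matching columns to arguments by name (as purrr::pmap() does).
results <- do.call(Map, c(list(f = fake_submission_get), submissions))
```

Each element of `results` corresponds to one instance ID, which is the same shape the pmap() call above returns from the real server.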

Thank you for sharing this great package!!