Closed florianm closed 4 years ago
Thank you @karissawhiting for your in-depth review! 😸
Thanks @karissawhiting for the review! I'll create issues from the review, and once all are closed, will summarise them here.
The package includes all the following forms of documentation:
[X] A statement of need clearly stating problems the software is designed to solve and its target audience in README
[X] Installation instructions: for the development version of package and any non-standard dependencies in README
[x] Vignette(s) demonstrating major functionality that runs successfully locally
devtools::install(build_vignettes = TRUE) does not install for me. I get permission errors that I do not get when I run the vignette chunks on their own.
[X] Function Documentation: for all exported functions in R help
[X] Examples for all exported functions in R Help that run successfully locally
[X] Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).
[X] Installation: Installation succeeds as documented.
[X] Functionality: Any functional claims of the software been confirmed.
[X] Performance: Any performance claims of the software been confirmed.
[ ] Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
test-attachment_get.R:80: error: get_one_attachment handles repeat
download and NA filenames
unused argument (check = TRUE)
1: .handleSimpleError(function (e)
{
handled <<- TRUE
test_error <<- e
options(expressions = expressions_opt_new)
on.exit(options(expressions = expressions_opt), add = TRUE)
e$expectation_calls <- frame_calls(11, 2)
test_error <<- e
register_expectation(e)
e$handled <- TRUE
test_error <<- e
}, "unused argument (check = TRUE)", quote(tempdir(check = TRUE)))
at /home/....../test-attachment_get.R:80
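As an aside, the `unused argument (check = TRUE)` failure above is the signature of running under R older than 3.5.0, where tempdir() gained its check argument. A version-guarded call (a sketch, not ruODK's actual code) sidesteps it:

```r
# tempdir(check = TRUE) exists only from R 3.5.0 onwards; on older R
# the extra argument raises "unused argument (check = TRUE)".
tmp <- if (getRversion() >= "3.5.0") tempdir(check = TRUE) else tempdir()
```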
[x] Packaging guidelines: The package conforms to the rOpenSci packaging guidelines
Masking readr's parse_datetime() is discouraged by the guidelines.
Estimated hours spent reviewing:
Building vignettes requires a properly set-up account. Even with a set-up account, devtools::install(build_vignettes = TRUE) fails.
When attempting to build vignettes via devtools fails, the user password is printed to the console. ru_settings() also prints the password to the console. I would feel better if that was hidden from the user.
parse_datetime() appears to parse any column name with the text "time" in it by default, or a user-defined string. This could potentially try to parse unintended columns containing words like "candidate" or "centimeter". I think this functionality might be best left to the user to define for themselves.
tibble::tibble(
  centimeter = 1:10,
  millimeter = 1:10,
  candidate = sample(c("Jack", "Jill"), 10, replace = TRUE),
  rally = sample(c("Home", "Away"), 10, replace = TRUE)
) %>%
  ruODK::parse_datetime(.) %>%
  ruODK::parse_datetime(col_contains = "date")
parse_datetime() is a readr function. During my review, those namespaces became confused in a couple of the examples and I was no longer able to run ruODK vignette code. Would a different name be better suited?
This error message should quote the email address: Not Found (HTTP 404). Failed to get desired response from server https://sandbox.central.opendatakit.org/ as user "myemail@email.com".
In the odata vignette you have an expression fq_svc <- odata_service_get(). Everywhere else you define the namespace for an ruODK function, i.e. ruODK::odata_service_get(). For consistency I would edit this to include the namespace here and elsewhere where it is missing.
Also in the odata vignette, in the section "OData service document", there appears to be a network call to ODK Central to assign the fc_svc variable. The next comment however indicates that the canned data is used. It doesn't appear that the datasets associated with this package that are loaded early in this vignette are used.
In the odata vignette, in the section "Summarising data", the text is a bit odd as it implies the previous leaflet examples performed data transformations. Consider revising the language.
The rmarkdown template suggestion is great and the template would make a great quick guide.
Thanks a ton for your review @jmt2080ad! And how fun that you and @florianm met!
@florianm Now both reviews are in 😸
Thanks @jmt2080ad - as we discussed during the beer review process, all of that makes wild sense, some typos and wrinkles to iron out, nothing breaking.
@karissawhiting - submission_get() will include the pmap step to get multiple submissions; I'll rename the existing submission_get() to get_one_submission(). You are right, this makes better sense in the context of the other functions.
re vignettes not building - will rewrite vignettes to 1. use canned data but also 2. show but not evaluate the real functions, as discussed with Jason. Might need some help from Scott tomorrow on that. (Great to be in PDX)
re parse_datetime - will rename to avoid the namespace conflict. While the default is to parse any column containing "date" (common denominator of datetime and date - thought that was clever), it is actually only used explicitly on date and dateTime columns as per form_schema, so the "candidate" fields will never be parsed automatically.
re explicit parameters for credentials - I don't think this will promote bad code; it is meant to be used with Sys.getenv() or other variables. I will amend the vignette "setup" to show this.
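A minimal sketch of that pattern (the variable names ODKC_UN / ODKC_PW are illustrative; they would be set in ~/.Renviron, never in the script itself):

```r
# Read credentials from environment variables instead of hard-coding
# them; Sys.getenv() returns "" if a variable is unset.
un <- Sys.getenv("ODKC_UN")
pw <- Sys.getenv("ODKC_PW")
# They can then be handed to ruODK explicitly, e.g. (not run):
# ruODK::ru_setup(un = un, pw = pw)
```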
First few fixes are done but will not be pushed until after the PDX and Seattle workshops. I hope to get around to the full list by the end of next week (work travel with spotty internet until Thu 24 Oct).
For the public record: Both reviewers have given me permission to attribute them as "rev" in the DESCRIPTION through the ROpenSci Slack chat.
Any ETA on your response to reviewers @florianm? The author guide states "We ask that you respond to reviewers’ comments within 2 weeks of the last-submitted review" (but I know you've been traveling 🙂).
(and moving house) thanks for the patience! Can I get back next week (ending Nov 15)?
Ok, good luck with the move and see you in this thread next week! 🙂
:wave: @florianm, any update? Thanks! :smile_cat:
Thanks for the patience, getting stuck into ruODK now. ODK are preparing to release v0.7 of ODK Central, I'll have to work my way through their API updates too.
👋 @florianm, any update? 🙂
Hi @maelle - apologies for the delay, hoping to finish this issue this week. Turtle season is in full swing!
Remaining work:
odata_submission_parse.
Happy new year @florianm. Leaving a note here so you can let me know (after turtles, and ruODK completion 😄) when you're ready to dig into a blog post.
Update:
I'll address the review comments here shortly.
@maelle @karissawhiting @jmt2080ad thanks for your patience and the excellent reviews!
I've addressed all comments, added some more features, tested ruODK in production over the last few months and believe we're ready for the next step.
Answers to your comments below! I've linked to the GH issues for brevity.
I wasn't able to build without first specifying ODK credentials (because of api.Rmd and odata.Rmd vignettes). I’m not sure if there are established best practices with API packages and this issue, but @noamross pointed out this rcites package vignette as an example of one way this is addressed in API packages. Vignettes decide whether or not to eval code chunks based on an environment variable (in this case NOT_CRAN) and both ways are tested in the Travis-CI testing grid.
The vignettes now use canned data, but show the code to use live data.
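One way to sketch the chunk-gating idea from the rcites example (assuming knitr; the chunk-option call is shown as a comment since it only makes sense inside a vignette):

```r
# Evaluate live-API chunks only when NOT_CRAN is set, e.g. on a CI
# runner that has credentials; everywhere else fall back to canned data.
can_eval <- identical(Sys.getenv("NOT_CRAN"), "true")
# In a vignette setup chunk you would then gate evaluation globally:
# knitr::opts_chunk$set(eval = can_eval)
```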
https://github.com/dbca-wa/ruODK/issues/33 https://twitter.com/fistful_of_bass/status/1186422684504092673
You could add a bit more description about ODK Central Sandbox in this doc to help with multiple points of entry. New prospective users may be looking for the quickest way to test and play around with ODK/ruODK to see if it fits the needs of their project, and it seems this would be via a Sandbox account. To this point, you could adjust the "ODK Central Setup" section to say something like “Create a web user account on an ODK Central instance, or request an account on Sandbox..."
Rewrote https://dbca-wa.github.io/ruODK/index.html#access-to-an-odk-central-instance to lower the barrier for entry :-)
Detailed summary on documentation improvements at https://github.com/dbca-wa/ruODK/issues/34
Put “(ruODK)” in backticks for consistency (in CONTRIBUTING doc as well)
`ruODK` backticked.
Typo in link “Publish the formto ODK Central”.
Typo fixed. I'm also running a spellcheck during the release process.
There is a barrier to entry in contributing to the package and running tests because of the need for testing credentials. That being said, the CONTRIBUTING document and the new github form you set up to request credentials definitely does help with this. Is there a way to set up a public set of ODK credentials using a dummy account? If not, consider me a +1 vote for setting up cached responses in the future when the ODK API is more stable if it will provide a way for tests to work out of the box :)
Until all tests are converted to vcr (the CRAN 5MB package size limit might blow a fuse), the only way to contribute without compromising the ODK Central sandbox's security is to request an account, either through the account request issue template https://github.com/dbca-wa/ruODK/issues/new/choose or the ODK forum.
Also added friendly instructions to the README https://dbca-wa.github.io/ruODK/index.html#access-to-an-odk-central-instance.
Title of CONTRIBUTING is NA.
Fixed: https://dbca-wa.github.io/ruODK/CONTRIBUTING.html
You still have some leftover language from the usethis contributing doc. For example: “Before you do a pull request, you should always file an issue and make sure someone from the tidyverse team…”
Usethis boilerplate cleaned up.
"To run tests, you’ll need access to the ODK Central sandbox instance…" I think this should go right at the top or even in the “prerequisites” section of technical contributions.
Restructured the contributing guide and added an account request issue template.
Also https://github.com/dbca-wa/ruODK/issues/20
Not all exported functions have examples, but some internal helper functions are exported that may not need to be (e.g. prepend_uuid(), yell_if_missing()).
Added examples for all exported functions. Helpers are still exported, as they might be useful for manual parsing. Helpers are grouped in docs as family "utilities".
Do you need utils-tidy-eval.R, and these rlang functions exported?
Some helpers use NSE, so I need those rlang functions. I used the usethis boilerplate to have the full set of rlang functions available to make future upgrades and maintenance easier.
I’m having some trouble distinguishing between the OData API and RESTful API functionalities. Do these workflows mirror each other? Does one workflow have functionality available that the other does not? I think some additional delineation between these could be helpful. Here are a few ideas on how to address this (suggestions only):
If these are parallel workflows, consider having the vignette structures mirror each other, and add the same blurb at the top of each (like “three ways to happiness") with links to the other vignette. If there are any large benefits or drawbacks of one method or the other, call them out there or in the README.
Does your preferred workflow consist of a mixture of both OData and RESTful calls? If so, consider creating just a typical workflow vignette that goes through a case study using both kinds of functions.
Vignettes are re-written and restructured and put into context of each other both at their introductory paragraphs and in the README.
While there is no preferred workflow, the most elegant workflow is the OData path, which is shown both in the README's screencast and part of the Rmd template.
https://github.com/dbca-wa/ruODK/issues/34
Consider renaming vignette file names to odata-api, and restful-api
Done, also vignette titles are adjusted. https://github.com/dbca-wa/ruODK/issues/34
I see several functions have . <- NULL. I'm assuming this is to avoid "no visible bindings for global variables". One solution is to add utils::globalVariables(".") in ruODK.R instead of specifying this in each function.
Done, and shamelessly stolen into my other packages. Thanks for the useful tip! https://github.com/dbca-wa/ruODK/issues/35
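For reference, the one-liner looks like this (inside package code the package argument is inferred and can be dropped; it is spelled out here so the snippet runs standalone):

```r
# Register "." once for the whole package so R CMD check stops
# flagging "no visible binding for global variable '.'" in every
# function that uses magrittr's dot placeholder.
globals <- utils::globalVariables(".", package = "ruODK")
```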
Do you need yell_if_missing() in functions that have the get functions as their default args? In those cases, it looks like if you are missing credentials the get_default*() functions already do the yelling, but I may be missing some edge case here.
Retained for the use case of directly specified, non-default credentials: https://github.com/dbca-wa/ruODK/issues/36#issuecomment-558040212
ru_setup() I was expecting a confirmation message of some sort. Maybe it could print ru_settings() when you run it by default? Is there a way to actually check the authentication of your username/project/form credentials here or in another function? ggmap::register_google() has an optional argument write = TRUE which will write credentials to your .Renviron. Just wanted to point this out as a cool enhancement option.
Made settings great again: helpfully verbose, but not Wikileaks-level verbose. https://github.com/dbca-wa/ruODK/issues/37
@examples sections are formatted as R code so you can take out \code{ } tags (e.g. ru_setup())
Fixed and included in https://github.com/dbca-wa/ruODK/issues/34
odata_submission_get() It could be helpful to separate arguments for parsing the data and getting attachments in this function. It looks like odata_submission_get() returns all submissions, but the equivalent function for the RESTful workflow submission_get() returns one submission at a time. I expected these functions to mirror each other in terms of outputs. Perhaps it would make sense to turn this code chunk you have in the vignette into a function? This would mirror the attachment_get() / get_one_attachment() functions too.
Mostly addressed in https://github.com/dbca-wa/ruODK/issues/38 Parse but don't download: https://github.com/dbca-wa/ruODK/issues/49
Discussed in person with Jason during rOpenSci beer review (apologies to Karissa I didn't make it to NY!)
Building vignettes requires a properly set-up account. Even with a set-up account, devtools::install(build_vignettes = TRUE) fails.
See KW's comment: fixed by using canned data under the bonnet, but showing the code to use live data.
When attempting to build vignettes via devtools fails, the user password is printed to the console. ru_settings() also prints the password to the console. I would feel better if that was hidden from the user.
Addressed in https://github.com/dbca-wa/ruODK/issues/37
Why provide a 'hard way' to pass credentials via a function call? My sense is that this will encourage bad practices in writing code, as this would give users a way of hard coding passwords in workflows, which is typically discouraged.
Re-wrote the vignette "setup" to discuss the preferred method (.Renviron) and warn against bad practices.
Use cases of "the hard way" are the tests, where test functions call different projects and forms explicitly.
parse_datetime() appears to parse any column name with the text "time" in it by default or a user defined string. This could potentially try to parse unintended columns containing words like "candidate" or "centimeter". I think this functionality might be best left to the user to define for themselves.
We now use the form schema to detect date and datetime columns.
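As an illustration of the schema-driven approach (the column and type names here are hypothetical; in ruODK the schema comes from the form itself):

```r
# Select columns to parse as date/datetime from the form schema's
# declared types, never from the column names themselves.
schema <- data.frame(
  name = c("observation_start_time", "dbh_centimeter"),
  type = c("dateTime", "decimal"),
  stringsAsFactors = FALSE
)
dt_cols <- schema$name[schema$type %in% c("date", "dateTime")]
# dt_cols contains only "observation_start_time"; "dbh_centimeter"
# is never touched, regardless of its name.
```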
Also, parse_datetime() is a readr function. During my review, those namespaces became confused in a couple of the examples and I was no longer able to run ruODK vignette code. Would a different name be better suited?
Renamed to ru_datetime: https://github.com/dbca-wa/ruODK/issues/43
This error message should quote email address. Not Found (HTTP 404). Failed to get desired response from server https://sandbox.central.opendatakit.org/ as user "myemail@email.com".
Settings now print a more intelligible error message.
https://github.com/dbca-wa/ruODK/issues/37
https://github.com/dbca-wa/ruODK/issues/16
In the odata vignette you have an expression fq_svc <- odata_service_get(). Everywhere else you define the namespace for an ruODK function, i.e. ruODK::odata_service_get(). For consistency I would edit this to include the namespace here and elsewhere where that is missing.
Namespaced all the things!
Also in the odata vignette, in the section "OData service document", there appears to be a network call to ODK Central to assign the fc_svc variable. The next comment however indicates that the canned data is used. It doesn't appear that the datasets associated with this package that are loaded early in this vignette are used.
Vignettes should only use canned data, but show the code for live data https://github.com/dbca-wa/ruODK/issues/33
In the odata vignette, in the section "Summarising data", the text is a bit odd as it implies the previous leaflet examples performed data transformations. Consider revising the language.
I think I've revised the language there.
The rmarkdown template suggestion is great and the template would make a great quick guide.
Done and battle-tested in four workshops! Also included a binderhub link in the README.
Thanks @florianm!
@karissawhiting @jmt2080ad Do @florianm's answers satisfy you?
I was able to take a look at these responses and they seem to address my comments. I would like to take one last look at the code over this next week and give a more affirmative confirmation.
We now use the form schema to detect date and datetime columns.
ru_datetime still parses fields by sub-string. My concern here is that running ru_datetime will try to parse columns that have matching sub-strings but are not date/time. I understand this is a convenience function and works for particular cases. I'll leave it up to you to decide if the convenience outweighs the risk of inadvertently converting fields like in the example below.
df <- data.frame(
dbh_centimeter = runif(10) * 100,
timedate = format(Sys.time(), "%H:%M:%S %Y/%m/%d"),
datetime = format(Sys.time(), "%Y/%m/%d %H:%M:%S"),
posixtime = as.POSIXct(Sys.time(), tz = "UTC")
)
df %>% ru_datetime(tz = Sys.timezone())
In your vignette, you use this function on data read in using readr::read_csv, which automatically detects and sets the columns to "POSIXct POSIXt" where appropriate. These columns are also automatically assigned to the UTC timezone, which this function then converts to "Australia/Perth". Does this package always pull time data from the ODK servers in UTC? If that cannot be guaranteed, then this could be a problem. If the server is storing time data in a local timezone, and read_csv is assigning that to UTC when reading in data to R, then converting to local time using ru_datetime would transpose the time incorrectly, wouldn't it?
Because this function calls isodt_to_local, it appears that this function has two purposes: to convert between UTC and local, and to set the date type where a specific sub-string (default "time") is present in the column name.
Keeping in mind my comment above, if you are going to do bulk timezone conversion, I think it would be safer to use type introspection instead of searching by some sub-string, e.g. where there is a "POSIXct POSIXt" type, convert the zone to the one set in the tz argument of the ru_datetime call.
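A sketch of that type-introspection idea in base R (the function name is hypothetical, not part of ruODK):

```r
# Re-zone only columns that already are POSIXct; character and
# numeric columns pass through untouched, whatever their names.
convert_tz_cols <- function(df, tz = "UTC") {
  is_dt <- vapply(df, inherits, logical(1), what = "POSIXct")
  df[is_dt] <- lapply(df[is_dt], function(x) {
    attr(x, "tzone") <- tz  # changes the display tz, not the instant
    x
  })
  df
}
```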
For bulk date/time type assignment, I think you need to make a handler for columns that are not converted correctly. In my example above, the dbh_centimeter and timedate columns are set to all NA due to conversion failure. A handler in isodt_to_local that checks for this and then returns the original data would prevent errors trickling down a given processing pipeline that uses these functions.
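One possible shape for such a guard (a sketch, not ruODK's actual isodt_to_local; the ISO 8601 format string is an assumption):

```r
# Try to parse ISO datetimes; if parsing wipes out a non-empty
# column (all values become NA), hand back the original unchanged.
safe_isodt_to_local <- function(x, tz = "UTC") {
  parsed <- as.POSIXct(as.character(x),
                       format = "%Y-%m-%dT%H:%M:%OS", tz = "UTC")
  if (all(is.na(parsed)) && !all(is.na(x))) {
    return(x)  # conversion failed wholesale: keep original values
  }
  attr(parsed, "tzone") <- tz
  parsed
}
```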
I think these are my only comments. Everything else looks good.
Thanks for the detailed feedback @jmt2080ad!
The "golden path" of accessing data from ODK Central uses the OData API.
odata_submission_get() with the default param parse = TRUE does the right thing here - it parses the form_schema for date and datetime fields. Here we won't need to touch ru_datetime() unless going down the manual "DIY parse" path.
Accessing data through the RESTful API, on the other hand (both bulk export and individual access), involves more manual parsing and rectangling, where indeed we call ru_datetime() directly.
Re the default tz: the abundant "Australia/Perth" is my Perth-flavoured go-to example timezone. The default tz (if unspecified) is UTC (as is the default export format from ODK Central), as opposed to Sys.timezone().
ru_datetime() guarantees 1. type (POSIXct or POSIXt) and 2. timezone for (valid) date/datetime fields. I assumed that any mishap (parsing "cenTIMEter" as datetime) would be picked up sooner or later by the user.
@jmt2080ad how would you feel about a refactor of ru_datetime (see also https://github.com/dbca-wa/ruODK/issues/54 - sorry, @ bombed you):
- apply ru_datetime() to all date/time columns of a submission table based on the form_schema,
- do not default tz to Sys.timezone() - think of RStudio Server hosted in other timezones (e.g. we're in Perth, our data centers are in Sydney) or analyses run by a multi-national team.
For ru_datetime I think we have a workable solution now - could I solicit your opinions?
See https://github.com/dbca-wa/ruODK/issues/54 for details - after a complete refactor of submission parsing, ru_datetime is only called internally, and superseded by the new helpers handle_ru_{datetimes, attachments, geopoints} as shown in the vignettes. These are driven by the form schema definition, so they will never mistake cenTIMEter for a time field.
If you suggest going even stricter, we could now drop the default field name component col_contains = "date" from ru_datetime without complicating everyday usage.
@jmt2080ad let me know whether that addresses your suggestion suitably, and @karissawhiting I'm ready for your feedback.
Ok, this is looking really good. The handlers are using the form schema, which is great.
I think your idea of dropping the default argument for col_contains makes sense. Also, do you see any reason for ru_datetime or isodt_to_local to be exported now that they are only used internally?
I see that there are a few places in the documentation where ru_datetime is mentioned. In the details for form_schema_parse:
form_schema_parse recursively unpacks the form and extracts the name and type of each field. This information then can be used to inform the user which columns require ru_datetime, attachment_get, or attachment_link, respectively
And I see that ru_datetime is used in the example for attachment_link.
If you decide not to export the functions that you are using only internally (totally your choice), then I think you should edit the documentation so that it does not point users to them. If you do export them, and use cases are not demonstrated in the vignettes, then they need to be well documented. Just take an extra moment to make sure that you are saying all you want to in their help.
These same points, exporting internal functions and mentioning internal functions in the documentation, should be considered for the other functions you wrote handlers for: split_geopoint and attachment_get.
I also noticed that attachment_link appears to be a placeholder for functionality you intend to bring up in the future. It isn't demonstrated in the vignettes, and it has a TODO note in it that indicates that you will be improving this function to use the form schema to find attachment fields, like you did for ru_datetime. I am not sure of the best approach here, but I think it's worth considering putting this in a development branch for now if it is not fully implemented. I might be wrong about your intentions here, so pardon me if that is the case. Would it make sense to do what this function does when a user calls handle_ru_attachments?
Here are some outdated function references I found in the vignettes:
- odata_submission_parse.
- parse_datetime.
- parse_submissions, which appears to have been superseded. I think this section needs to be updated to reflect the modifications to the workflow you have made.
- odata_submission_parse.
⚠️⚠️⚠️⚠️⚠️
In the interest of reducing load on reviewers and editors as we manage the COVID-19 crisis, rOpenSci is temporarily pausing new submissions for software peer review for 30 days (and possibly longer). Please check back here again after 17 April for updates.
In this period new submissions will not be handled, nor new reviewers assigned. Reviews and responses to reviews will be handled on a 'best effort' basis, but no follow-up reminders will be sent.
Other rOpenSci community activities continue. We express our continued great appreciation for the work of our authors and reviewers. Stay healthy and take care of one other.
The rOpenSci Editorial Board
⚠️⚠️⚠️⚠️⚠️
⚠️⚠️⚠️⚠️⚠️ In the interest of reducing load on reviewers and editors as we manage the COVID-19 crisis, rOpenSci new submissions for software peer review are paused.
In this period new submissions will not be handled, nor new reviewers assigned. Reviews and responses to reviews will be handled on a 'best effort' basis, but no follow-up reminders will be sent. Other rOpenSci community activities continue.
Please check back here again after 25 May when we will be announcing plans to slowly start back up.
We express our continued great appreciation for the work of our authors and reviewers. Stay healthy and take care of one other.
The rOpenSci Editorial Board ⚠️⚠️⚠️⚠️⚠️
We're back!
@florianm can you please provide an update here on work needed on your side / on what the reviewers should have a look at for being able to check "The author has responded to my review and made changes to my satisfaction. I recommend approving this package."? Thank you!
Great to hear! I've split the reviewers' suggestions into issues and have addressed them all. So at a top level, reviewers could revisit the ruODK issues linked here below their comments and should find good summaries of my rationale there.
A few new tests use a new form (spatial types) but use packaged data, so the additional env var is only needed to rebuild the packaged data.
Should I write a comment here to summarise how I've addressed all reviewer suggestions, or are the cross-linked issues sufficient?
Thanks! :turtle: :sparkles:
A summary here would be great once you're done with changes! :pray:
Hi all,
here's a summary of the second round (Jason's comments) of reviewer comments. I've split Jason's comments into the three issues linked under his comments, which I'll also summarise here.
TLDR Jason asked me to re-think which functions to export, and which to keep purely internal. I have changed ruODK from "export all the functions" to exporting only the top level (odata_submission_get) and the next level of helpers (handle_ru_*). This allows users to choose the level of automation:
- odata_submission_get(),
- handle_ru_{datetimes, attachments, geopoints, geotraces, geoshapes}: prefixed handle_ru_ and pluralised,
- split_geo{point, trace, shape} have examples to show the order of lon / lat / alt (coordinate order is notorious for swap errors vs. GeoJSON),
- attachment_get is mentioned in vignette odata-api > data rectangling.
The helpers below the handle_ru_* handlers then are all internal, and not mentioned in any of man / vignette / readme.
TLDR Jason found some outdated references.
All documentation should now be aligned and tell the same story. Some new functionality brought another round of sweeping changes: with split_geo{point, trace, shape}, ruODK can now parse all spatial types, GeoJSON and WKT alike. CONTRIBUTING.md has been updated to include the new test form.
TLDR Jason asked whether attachment_link should be internal or exported.
The assumption behind handle_ru_attachments is that the attachments are already downloaded and only need to be linked to the relative path of the downloaded and freshly extracted attachments.
I've just refreshed the GitHub Actions build matrix, adding more environments (11 in total), plus TravisCI (Ubuntu 16.04 with R devel/release), plus Appveyor (Windows).
@maelle does the above address the reviewer comments suitably?
Thanks @florianm!
@karissawhiting @jmt2080ad are you both now happy with @florianm's response and if so could you please check the box "The author has responded to my review and made changes to my satisfaction. I recommend approving this package." in your review? If not please comment here. It's also fine to write you don't have time to have another look. Thanks!
Meanwhile, GHA has a heisen-segfault in ubuntu-latest / R devel and R release. I'm monitoring this over the next few days hoping it will resolve, so the red GHA badge is deliberate.
:wave: here! I see @jmt2080ad checked "The author has responded to my review and made changes to my satisfaction. I recommend approving this package."
@karissawhiting do you have further comments for @florianm? If not could you please update this thread or check the box in your review? Thank you!
Thanks @karissawhiting! I'll start the last checks now!
Approved! Thanks @florianm for submitting and @karissawhiting @jmt2080ad for your reviews! :turtle: :sparkles:
To-dos:
- If you maintain a pkgdown website and are ok relying only on rOpenSci central docs building and branding, replace your pkgdown website with a redirecting page to https://docs.ropensci.org/package_name.
- Add the docs URL to the URL field of DESCRIPTION alongside the link to the GitHub repository, e.g.: URL: https://docs.ropensci.org/foobar (website) https://github.com/ropensci/foobar.
- Update badge URLs, e.g. [![AppVeyor Build Status](https://ci.appveyor.com/api/projects/status/github/ropensci/pkgname?branch=master&svg=true)](https://ci.appveyor.com/project/individualaccount/pkgname).
Should you want to acknowledge your reviewers in your package DESCRIPTION, you can do so by making them "rev"-type contributors in the Authors@R field (with their consent). More info on this.
Welcome aboard! We'd love to host a post about your package - either a short introduction to it with an example for a technical audience or a longer post with some narrative about its development or something you learned, and an example of its use for a broader readership. If you are interested, consult the blog guide, and tag @stefaniebutland in your reply. She will get in touch about timing and can answer any questions.
We've put together an online book with our best practice and tips; this chapter starts the 3rd section, which is about guidance for after onboarding. Please tell us what could be improved; the corresponding repo is here.
@jmt2080ad Could you please add a rough estimate of the time you spent reviewing in your review, near "Estimated hours spent reviewing:"? Thank you.
@maelle I've completed the TODO list and followed the "life after onboarding" chapter. GH actions are passing, codemeta and all docs are updated, and I've minted a DOI. Did I do this roughly in the correct order?
All, would you mind getting mentioned in the Zenodo record? Edit: this is a question to Jason and Karissa, not editor Maelle ;-)
I think so. What will you use the DOI for in the near future btw? codemetar doesn't use those yet.
Don't list me as an author, as written in the dev guide, "Please do not list editors as contributors. Your participation in and contribution to rOpenSci is thanks enough!" :smile_cat:
I'll use the DOI to cite ruODK as a publication in our annual report. Having peer reviewed software is a first at DBCA, so a DOI helps here.
Submitting Author: Florian Mayer (@florianm)
Repository: https://github.com/dbca-wa/ruODK
Version submitted: 0.6.6.9025 (Jun 2020) (originally 0.6.0)
Editor: @maelle
Reviewer 1: @karissawhiting
Reviewer 2: @jmt2080ad
Archive: 2020-02-11
Version accepted: TBD
Scope
Please indicate which category or categories from our package fit policies this package falls under: (Please check an appropriate box below. If you are unsure, we suggest you make a pre-submission inquiry.):
Explain how and why the package falls under these categories (briefly, 1-2 sentences):
Data retrieval: ruODK retrieves data from ODK Central, a data clearinghouse containing data which have been digitally captured by the data collection app ODK Collect using Xforms.
As mentioned by @annakrystalli, the below categories are touched by ruODK, but don't apply:
Data extraction or munging: ruODK transforms and sanitises the data from ODK Central, turning the original format (which parses to nested lists in R) into tibbles.
Reproducibility: ruODK allows users to script and repeat the data extraction step, which is the main use case it is being written for.
Geospatial data: while ODK allows capturing location data (points, lines, polygons), and ruODK extracts these values, ruODK is not primarily a spatial package.
(Scope and use cases are also mentioned in the README.)
Any organisation collecting data with the OpenDataKit suite will need to extract the data from the data clearinghouses, ODK Aggregate (outgoing) or ODK Central (new). Some may want to analyse the data straight out of ODK Central, some may need to transfer the data into another data warehouse for further post-processing, QA, and integration with other data sources.
For both use cases, ruODK bridges the gap between the data sitting in ODK Central and the data being a tibble in R, ready for further processing. In a nutshell, ruODK aims to be to ODK Central what `ckanr` is to CKAN.

As per the release notes for ODK Central 0.6, next to the other options, ruODK is now the recommended package for accessing ODK Central data in R.
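To illustrate that workflow, here is a minimal sketch of pulling ODK Central data into a tibble with ruODK. It assumes a reachable ODK Central instance and valid credentials; the service URL, project and form identifiers, and environment variable names below are hypothetical placeholders, not real endpoints:

```r
# A sketch only: all server details below are made-up placeholders.
library(ruODK)

# Point ruODK at an ODK Central OData service endpoint and authenticate.
ru_setup(
  svc = "https://central.example.org/v1/projects/1/forms/my_form.svc",
  un = Sys.getenv("ODKC_UN"),
  pw = Sys.getenv("ODKC_PW")
)

# Retrieve form submissions as a tibble, ready for further processing.
submissions <- odata_submission_get()
```

Storing credentials in environment variables (rather than in the script) keeps the extraction step scriptable and repeatable without leaking secrets.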
Are there other R packages that accomplish the same thing? If so, how does yours differ or meet our criteria for best-in-category?
If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.
https://github.com/ropensci/software-review/issues/328 Thanks to @annakrystalli for feedback on the pre-submission. Changes since then: added remaining functions.
Technical checks
Confirm each of the following by checking the box. This package:
Publication options
Note: I would like to submit a paper about the package in a few weeks, but haven't got the manuscript ready and approved for publication just yet.
JOSS Options
- [ ] The package has an **obvious research application** according to [JOSS's definition](https://joss.readthedocs.io/en/latest/submitting.html#submission-requirements).
- [ ] The package contains a `paper.md` matching [JOSS's requirements](https://joss.readthedocs.io/en/latest/submitting.html#what-should-my-paper-contain) with a high-level description in the package root or in `inst/`.
- [ ] The package is deposited in a long-term repository with the DOI:
- (*Do not submit your package separately to JOSS*)

MEE Options
- [ ] The package is novel and will be of interest to the broad readership of the journal.
- [ ] The manuscript describing the package is no longer than 3000 words.
- [ ] You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see [MEE's Policy on Publishing Code](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/journal-resources/policy-on-publishing-code.html))
- (*Scope: Do consider MEE's [Aims and Scope](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/aims-and-scope/read-full-aims-and-scope.html) for your manuscript. We make no guarantee that your manuscript will be within MEE scope.*)
- (*Although not required, we strongly recommend having a full manuscript prepared when you submit here.*)
- (*Please do not submit your package separately to Methods in Ecology and Evolution*)

Code of conduct
Update 31 Aug: pasted new DESCRIPTION with version 0.6.1 (addressing all comments below).
Update 14 Sept: 0.6.3 after tidyr 1.0.0 moves from dev to stable dependency.
Update 18 Sept: 0.6.4 uses new example data for tests and vignettes.
Update 23 Sept: 0.6.5 uses lifecycle badges.
Update 25 Sept: 0.6.6 `odata_submission_get()` parses data and dates, and downloads attachments.
Update 10-18 Oct: met ODK developers and users; met @jmt2080ad; met sckott (Scott Chamberlain); held four ruODK workshops (Perth, Seattle, Portland); got constructive feedback.
Update 27 Nov: 0.6.6.900* addressing reviewer comments, preparing for ODK Central 0.7 release.
Update 12 Feb 2020: reviewer comments addressed, some improvements added, ready for reviewer response.
Update 15 Jun 2020: reviewer response addressed, parsers added for geotrace and geoshape.