ropensci / ruODK

ruODK: An R Client for the ODK Central API
https://docs.ropensci.org/ruODK/
GNU General Public License v3.0

odata_submission_rectangle() drops columns with no data #100

Closed mtyszler closed 3 years ago

mtyszler commented 3 years ago

Problem

When using odata_submission_get(), which calls odata_submission_rectangle(), variables that are fully empty (i.e., contain only null) get dropped.

This is handy for getting rid of notes and groups, but it also eliminates fully empty variables.

This might have been done on purpose, but it might also be an issue with tidyr::unnest_wider and related to https://github.com/ropensci/ruODK/issues/46

Use case

In checking data entry and/or writing ETLs I want to be able to process expected data, instead of having to check if the variable is actually included or not. Also, in running checks, the fact a variable has never received a non-null value is important.
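As a minimal sketch of the kind of check meant here: the helper below flags expected variables that are either absent from the data or contain only NA. The name `never_filled()` and the toy inputs are hypothetical and not part of ruODK; in practice `expected_names` would come from something like `form_schema()$name`, assuming those names match the column names of the parsed data.

```r
# Hypothetical helper (not part of ruODK): for each expected variable,
# report TRUE if it is missing from `data` or holds only NA values.
never_filled <- function(data, expected_names) {
  vapply(
    expected_names,
    function(nm) !(nm %in% colnames(data)) || all(is.na(data[[nm]])),
    logical(1)
  )
}

# Toy stand-ins for the form schema names and the parsed submissions.
expected <- c("id", "barcode_test", "comment")
toy_data <- data.frame(
  id = c("a", "b"),
  comment = c(NA, NA),
  stringsAsFactors = FALSE
)

flags <- never_filled(toy_data, expected)
names(which(flags))
```

With such a check, a dropped column and an all-NA column are treated the same way, so downstream code does not have to care which of the two happened.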

Reproducible example

library(ruODK)

ru_setup(
  un = "...",
  pw = "...",
  url = "https://odkc.dbca.wa.gov.au",
  pid = 2,
  fid = "I8n_no_lang"
)

frm <- form_schema()
data <- odata_submission_get()

"barcode_test" %in% frm$name
# expect TRUE

"barcode_test" %in% colnames(data)
# expect TRUE, returns FALSE

Session Info

```{r}
# utils::sessionInfo()
R version 3.6.3 (2020-02-29)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17763)

Matrix products: default

locale:
[1] LC_COLLATE=Dutch_Netherlands.1252  LC_CTYPE=Dutch_Netherlands.1252    LC_MONETARY=Dutch_Netherlands.1252
[4] LC_NUMERIC=C                       LC_TIME=Dutch_Netherlands.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] ruODK_0.9.2.9001

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.5       rstudioapi_0.11  janitor_2.0.1    magrittr_1.5     tidyselect_1.1.0 R6_2.5.0         rlang_0.4.8
 [8] fansi_0.4.1      stringr_1.4.0    httr_1.4.2       dplyr_1.0.2      tools_3.6.3      utf8_1.1.4       cli_2.1.0
[15] ellipsis_0.3.1   assertthat_0.2.1 tibble_3.0.4     lifecycle_0.2.0  crayon_1.3.4     purrr_0.3.4      tidyr_1.1.2
[22] fs_1.5.0         clisymbols_1.2.0 vctrs_0.3.4      curl_4.3         glue_1.4.2       snakecase_0.11.0 stringi_1.5.3
[29] compiler_3.6.3   pillar_1.4.6     generics_0.1.0   jsonlite_1.7.1   lubridate_1.7.9  pkgconfig_2.0.3
```

florianm commented 3 years ago

Indeed, tidyr::unnest_wider inside odata_submission_get() drops NULL columns. The output of submission_export() however includes the empty columns (e.g. barcode_test). Therefore it would make sense for odata_submission_get() to not drop empty columns.

My suggested best practice for now is to submit at least one complete submission to each form, which ensures that all expected columns are present. While this may not always be feasible, I found it very useful to include a "this is a test record" flag in each form to enable users to send me a test submission under realistic conditions.

I have no bandwidth to address this at the moment, but I'd be grateful for any pointers how to persuade tidyr::unnest_wider not to drop all NULL columns.
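Until that is solved upstream, one possible workaround is to add the dropped columns back after parsing. The following is only a sketch under stated assumptions: `add_missing_columns()` is a hypothetical helper, not a ruODK function, and it assumes the expected names (e.g. from `form_schema()$name`) match the column names used in the parsed output, which may not hold for fields nested in groups.

```r
# Hypothetical helper (not part of ruODK): append any expected column that
# is absent from `data`, filled with NA, leaving existing columns untouched.
add_missing_columns <- function(data, expected_names) {
  missing_names <- setdiff(expected_names, colnames(data))
  for (nm in missing_names) {
    data[[nm]] <- rep(NA, nrow(data))
  }
  data
}

# Toy stand-ins: `frm` mimics a form schema, `parsed` mimics parsed
# submissions in which the all-NULL field `barcode_test` was dropped.
frm <- data.frame(
  name = c("id", "barcode_test"),
  stringsAsFactors = FALSE
)
parsed <- data.frame(id = c("a", "b"), stringsAsFactors = FALSE)

padded <- add_missing_columns(parsed, frm$name)
"barcode_test" %in% colnames(padded)
```

The added columns are plain logical NA, so they carry no type information from the form schema; any type conversion would have to be done separately.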

mtyszler commented 3 years ago

> I have no bandwidth to address this at the moment, but I'd be grateful for any pointers how to persuade tidyr::unnest_wider not to drop all NULL columns.

Hi @florianm! Totally understood. This is a difficult path, as it does not depend on ruODK alone.

> My suggested best practice for now is to submit at least one complete submission to each form

That is a good suggestion. However, it is not always possible, especially with large forms.

Suggestion

Maybe for now the only thing we can do is state in the docs that NULL columns get dropped when using odata_submission_get() with parse = TRUE, or odata_submission_rectangle().

Thanks for the quick answers.