Closed TimonWeitkamp closed 3 years ago
Hi Timon! Sorry, just returning from leave. Would you be able to upload your form to the ODK Central Sandbox and capture a record with it? Hoping we can reproduce the bug this way without compromising your own server/form.
Can you run odata_submission_get(parse=F) to see whether the coordinates are downloaded completely?
I would have hoped that my unit tests prove that geoshape parsing works and won't lose data.
Hi, no worries :)
odata_submission_get(parse=F) does download some complete polygons, and some not (just like the example above). For example, ruODK result:
$value[[245]]$Polygon
[1] "POLYGON ((33.123963782563806 -18.933762134984136 739))"
QGIS result of same polygon:
The strange thing is that for some records the complete polygons are downloaded with ruODK
...
Hm, I'd need to access a failing record to diagnose further. This is a serious issue I'd like to fix, really appreciate the report and your help.
Can you provide a reproducible failing example I can interact with? Ideally your form on the ODK Central Sandbox with a record, short of asking for access to your current ODK Central instance which I feel would be intrusive.
Timon has shared the form with me, I've uploaded the form to the ruODK package test project on the ODK Sandbox. A first submission (added polygon vertices by manual tapping) seems to work fine: https://rpubs.com/florian_mayer/ruodk_issue95
@TimonWeitkamp could you accept the invite to the ODK Central Sandbox, set up ODK Collect with the QR code for app user "timon" you can get here and collect a few records? You should be able to run the code from https://rpubs.com/florian_mayer/ruodk_issue95 with your own ODK Central credentials (un, pw).
Feel free to email me privately an RData file with the downloaded unparsed submission(s).
I tested the survey in the sandbox as well (both walking with the phone and tapping manual points), and all of the polygons seem complete.
I will email you the RData file.
Thanks for the data files from your private project, and for submitting test data to your form on the ODK Central Sandbox.
The test data on the Sandbox seem fine: https://rpubs.com/florian_mayer/ruodk_issue95
In your privately shared data, I can see the issue:
# data unparsed from odata_submission_get(wkt=TRUE):
du <- readRDS("unparsed_data.rds")
# First record shows incomplete field "Polygon" (type geoshape)
du$value[[1]]$Polygon[1]
"POLYGON ((33.66333849728107 -25.04568258649638 0))"
du$value[[1]]$`__id`
[1] "uuid:141823e2-fc16-47b0-8b31-903e557f4a85"
# Same record in CSV/ZIP export shows five vertices in field Polygon: (linebreaks for readability)
-25.04568258649638 33.66333849728107 0.0 0.0;
-25.045986943913125 33.66352893412113 0.0 0.0;
-25.045775837801138 33.66398457437754 0.0 0.0;
-25.045500336184592 33.66380486637354 0.0 0.0;
-25.04568258649638 33.66333849728107 0.0 0.0
So the unparsed data is incomplete after ruODK::odata_submission_get(parse = FALSE). On ruODK's side, the data is a nested list, coming from the httr::content() of the response to the API call "v1/projects/{pid}/forms/{URLencode(fid, reserved = TRUE)}.svc/{table}". ruODK does not modify the data in any way.
That also means ruODK::odata_submission_rectangle()/split_geoshape() are unlikely to be the cause. I mention this as the geoshape/trace/point parsing involves a regex to discard trailing commas (ODK Central versions < 0.8).
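To check whether a given record is affected, the vertex counts from the two sources can be compared. The following is a minimal sketch in base R using the record shown above; the helper names count_wkt_vertices and count_odk_vertices are made up for illustration and are not part of ruODK.

```r
# Hypothetical helpers (not part of ruODK) to compare vertex counts.

# Count the coordinate tuples in a single-ring WKT polygon string such as
# "POLYGON ((x y z, x y z, ...))".
count_wkt_vertices <- function(wkt) {
  # Strip the leading "POLYGON ((" and the trailing "))"
  inner <- gsub("^[A-Za-z ]*\\(+|\\)+[[:space:]]*$", "", wkt)
  lengths(strsplit(inner, ",", fixed = TRUE))
}

# Count the tuples in a raw ODK geoshape string ("lat lon alt acc;...").
count_odk_vertices <- function(raw) {
  parts <- strsplit(raw, ";", fixed = TRUE)
  vapply(parts, function(p) sum(nzchar(trimws(p))), integer(1))
}

# The same record as returned by OData (WKT) and by the CSV/ZIP export
wkt_odata <- "POLYGON ((33.66333849728107 -25.04568258649638 0))"
raw_csv <- paste(
  "-25.04568258649638 33.66333849728107 0.0 0.0",
  "-25.045986943913125 33.66352893412113 0.0 0.0",
  "-25.045775837801138 33.66398457437754 0.0 0.0",
  "-25.045500336184592 33.66380486637354 0.0 0.0",
  "-25.04568258649638 33.66333849728107 0.0 0.0",
  sep = "; "
)

count_wkt_vertices(wkt_odata)
count_odk_vertices(raw_csv)
```

A mismatch between the two counts for the same instance ID indicates that the OData response dropped vertices for that record.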
Next questions to @TimonWeitkamp:
What version of ODK Central are you using? I assume it's > 0.8?
Can you reproduce the missing coordinates through odata_submission_get(wkt=FALSE) which will return GeoJSON?
I'm using version 1.0 indeed.
If I download the GeoJSON, I miss coordinates as well:
Timon has provided me with access to his instance, which shows those errors in the form.
Here's an excerpt showing that OData loses some geoshape data, whereas the CSV export and the RESTful submission_get do not.
For clarity, I show the code retrieving the same data in three separate ways.
data: OData, WKT, parsed.
data_raw_wkt: OData, WKT, unparsed. ruODK's odata_submission_get(parse=TRUE) would run odata_submission_rectangle and handle_geoshape (and handle the other data types: attachments, datetimes, geopoints, geotraces) to produce the parsed data. The unparsed version is interesting as its content is untouched by any data transformation of ruODK.
data_raw_gj: OData, GeoJSON, unparsed. Again, can be rectangled and parsed into data.
data_csv: Export to ZIP > unpack > load CSV. R does some parsing and guessing of data types here, but preserves the geoshape coordinates as text.
sub_raw: Unparsed (nested lists) output of submission_get.
# Option 1: via OData
data <- ruODK::odata_submission_get(
download = FALSE, # we don't need attachments here
table = ft$url[1],
local_dir = loc,
wkt = TRUE
)
data_raw_wkt <- ruODK::odata_submission_get(
download = FALSE, # we don't need attachments here
table = ft$url[1],
local_dir = loc,
wkt = TRUE,
parse = FALSE
)
data_raw_gj <- ruODK::odata_submission_get(
download = FALSE, # we don't need attachments here
table = ft$url[1],
local_dir = loc,
wkt = FALSE,
parse = FALSE
)
# Option 2: via ZIP export, set overwrite = TRUE to refresh download
data_csv_zip <- ruODK::submission_export(overwrite = FALSE)
data_csv_extracted <- unzip(data_csv_zip)
data_csv <- readr::read_csv(data_csv_extracted[[1]])
# Option 3: via REST
sl <- ruODK::submission_list()
sub_raw <- ruODK::submission_get(sl$instance_id)
The first submission already demonstrates the bug. We'll retrieve the instanceID to prove that the five different R objects contain the same record in their first row / list element.
R> data$id[[1]]
[1] "uuid:55673ddd-bc33-4919-9f42-f61370643e4b"
R> data_raw_wkt$value[[1]]$`__id`
[1] "uuid:55673ddd-bc33-4919-9f42-f61370643e4b"
R> data_raw_gj$value[[1]]$`__id`
[1] "uuid:55673ddd-bc33-4919-9f42-f61370643e4b"
R> data_csv$KEY[[1]]
[1] "uuid:55673ddd-bc33-4919-9f42-f61370643e4b"
R> sub_raw[[1]]$meta$instanceID[[1]]
[1] "uuid:55673ddd-bc33-4919-9f42-f61370643e4b"
Now we'll look at the offending geoshape field named "Polygon".
# OData - missing data in both parsed and unparsed versions, both WKT and GeoJSON formats
R> data$polygon[[1]]
[1] "POLYGON ((33.72580546885729 -24.986248868859494 0))"
R> data_raw_wkt$value[[1]]$Polygon
[1] "POLYGON ((33.72580546885729 -24.986248868859494 0))"
R> data_raw_gj$value[[1]]$Polygon
$type
[1] "Polygon"
$coordinates
$coordinates[[1]]
$coordinates[[1]][[1]]
[1] 33.72581
$coordinates[[1]][[2]]
[1] -24.98625
$coordinates[[1]][[3]]
[1] 0
# CSV export - OK
R> data_csv$Polygon[[1]]
[1] "-24.986248868859494 33.72580546885729 0.0 0.0; -24.98602033783035 33.72611157596111 0.0 0.0; -24.985721301905958 33.72586816549301 0.0 0.0; -24.985910022833146 33.72559256851673 0.0 0.0; -24.986248868859494 33.72580546885729 0.0 0.0"
# RESTful submission_get: OK
R> sub_raw[[1]]$Polygon
[[1]]
[1] "-24.986248868859494 33.72580546885729 0.0 0.0; -24.98602033783035 33.72611157596111 0.0 0.0; -24.985721301905958 33.72586816549301 0.0 0.0; -24.985910022833146 33.72559256851673 0.0 0.0; -24.986248868859494 33.72580546885729 0.0 0.0"
The above output shows that the OData submission API returns only the first / last coordinate of the geoshape, while the other endpoints (CSV/ZIP export, RESTful submission_get) return the full record.
The fact that the coordinates are already missing in the unparsed (raw) OData response shows that ruODK::odata_submission_rectangle and ruODK::handle_geoshape do not lose data themselves. One point goes in, one point comes out.
This seems to point towards the OData submission API endpoint, unless I have overlooked something. In contrast, the same form deployed to the ODK Central Sandbox with a handful of data collected by both Timon and me does not show that problem (yet).
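Affected records can be spotted in the OData output without consulting the CSV export, because a closed polygon ring needs at least four coordinate tuples (the first repeated as the last). A minimal sketch in base R follows; flag_truncated_polygons is a hypothetical helper, not a ruODK function, and it assumes single-ring polygons in WKT.

```r
# Hypothetical helper (not part of ruODK): flag WKT polygons that cannot be
# valid because they hold fewer than four coordinate tuples.
flag_truncated_polygons <- function(wkt) {
  # In a single-ring polygon, tuples are comma separated,
  # so the number of tuples is the number of commas plus one.
  n_commas <- nchar(gsub("[^,]", "", wkt))
  n_vertices <- n_commas + 1L
  # Missing values are not flagged
  !is.na(wkt) & n_vertices < 4L
}

wkt <- c(
  # As returned by OData for the first record above
  "POLYGON ((33.72580546885729 -24.986248868859494 0))",
  # An illustrative complete ring with five tuples
  "POLYGON ((0 0 0, 0 1 0, 1 1 0, 1 0 0, 0 0 0))",
  NA_character_
)

flag_truncated_polygons(wkt)
```

Applied to the polygon column of the parsed data, the flagged rows are the ones worth cross-checking against the CSV/ZIP export.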
The discussion on the ODK Slack chat indicates that the problem is whitespace in the captured geoshapes.
Valid geoshapes contain ";" separated coordinate tuples.
"-24.986248868859494 33.72580546885729 0.0 0.0;-24.98602033783035 33.72611157596111 0.0 0.0;-24.985721301905958 33.72586816549301 0.0 0.0;-24.985910022833146 33.72559256851673 0.0 0.0;-24.986248868859494 33.72580546885729 0.0 0.0"
Invalid geoshapes have additional whitespace after the ";":
"-24.986248868859494 33.72580546885729 0.0 0.0; -24.98602033783035 33.72611157596111 0.0 0.0; -24.985721301905958 33.72586816549301 0.0 0.0; -24.985910022833146 33.72559256851673 0.0 0.0; -24.986248868859494 33.72580546885729 0.0 0.0"
ODK Central does not post-process geoshapes on CSV/ZIP export, but does post-process them on OData export. ODK Central's OData geoshape/trace parser is likely to cut off coordinates after the whitespace, explaining the "only one coordinate" geoshapes.
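For anyone working from the CSV/ZIP export in the meantime, the raw geoshape strings can be cleaned and converted on the R side. The sketch below uses base R only; normalise_geoshape and odk_geoshape_to_wkt are hypothetical helpers, not ruODK functions, and the conversion assumes every tuple holds at least latitude, longitude and altitude, in that order.

```r
# Hypothetical helpers (not part of ruODK).

# Remove the whitespace after ";" so the string matches the valid form.
normalise_geoshape <- function(raw) {
  gsub(";[[:space:]]+", ";", raw)
}

# Convert one raw ODK geoshape ("lat lon alt acc;...") to a WKT polygon.
# Tolerates whitespace around ";". Assumes each tuple has lat, lon and alt.
odk_geoshape_to_wkt <- function(raw) {
  tuples <- trimws(strsplit(raw, ";", fixed = TRUE)[[1]])
  tuples <- tuples[nzchar(tuples)]
  coords <- vapply(strsplit(tuples, "[[:space:]]+"), function(p) {
    # ODK order is lat lon alt acc; WKT order is x (lon), y (lat), z (alt)
    paste(p[2], p[1], p[3])
  }, character(1))
  paste0("POLYGON ((", paste(coords, collapse = ", "), "))")
}

invalid <- paste(
  "-24.986248868859494 33.72580546885729 0.0 0.0",
  "-24.98602033783035 33.72611157596111 0.0 0.0",
  "-24.985721301905958 33.72586816549301 0.0 0.0",
  "-24.985910022833146 33.72559256851673 0.0 0.0",
  "-24.986248868859494 33.72580546885729 0.0 0.0",
  sep = "; "
)

normalise_geoshape(invalid)
odk_geoshape_to_wkt(invalid)
```

The coordinate text is passed through unchanged, so no precision is lost; the accuracy value (fourth element) is dropped because WKT has no slot for it.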
@TimonWeitkamp as discussed via email, the CSV/ZIP export will provide you with the data in full fidelity, but will repeat the media file download. Could you provide details about the data collectors' devices and capture methods in the ODK Slack chat? Are you happy for me to close this issue, seeing as it's a bug between ODK Collect and Central?
I will provide the information in the Slack chat.
Thanks for the help, you can close this issue.
Problem
Following the ruODK spatial example, I can download the data from the ODK Central server with odata_submission_get(wkt=TRUE) and convert it to an sf object through st_as_sf(wkt="polygon column") without errors.
I want to view the polygons through leaflet() or mapview(), but I get the following errors.
If I use the ruODK data (data("geo_wkt", package = "ruODK")), all works as expected, just like in the example.
So I then took a closer look at the polygon column, and I can see the values of only the first xyz coordinate of the polygon. For the data_wkt:
For the sf:
If I download the CSV file manually from the server, and upload the file to QGIS through ODKTrace2wkt, there are no problems, I can see the polygons; so it is not an error on the data collection side. Somewhere along the download, geodata is left behind.
Reproducible example
I don't have a reproducible example, other than the two data points I also mentioned above
For the data_wkt:
For the sf:
Session Info
```{r}
> utils::sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19041)

Matrix products: default

locale:
[1] LC_COLLATE=Dutch_Netherlands.1252  LC_CTYPE=Dutch_Netherlands.1252    LC_MONETARY=Dutch_Netherlands.1252 LC_NUMERIC=C
[5] LC_TIME=Dutch_Netherlands.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] sf_0.9-5         mapview_2.9.0    ggplot2_3.3.2    leaflet_2.0.3    dplyr_1.0.2      ruODK_0.9.1.9002

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.5         pillar_1.4.6       compiler_4.0.2     base64enc_0.1-3    class_7.3-17       remotes_2.2.0      tools_4.0.2
 [8] digest_0.6.25      gtable_0.3.0       satellite_1.0.2    lifecycle_0.2.0    tibble_3.0.3       lattice_0.20-41    pkgconfig_2.0.3
[15] png_0.1-7          rlang_0.4.7        DBI_1.1.0          rstudioapi_0.11    crosstalk_1.1.0.1  e1071_1.7-3        withr_2.2.0
[22] stringr_1.4.0      httr_1.4.2         raster_3.3-13      generics_0.0.2     vctrs_0.3.4        htmlwidgets_1.5.1  webshot_0.5.2
[29] stats4_4.0.2       classInt_0.4-3     grid_4.0.2         tidyselect_1.1.0   glue_1.4.2         R6_2.4.1           sp_1.4-2
[36] purrr_0.3.4        magrittr_1.5       scales_1.1.1       codetools_0.2-16   ellipsis_0.3.1     htmltools_0.5.0    units_0.6-7
[43] colorspace_1.4-1   KernSmooth_2.23-17 stringi_1.4.6      munsell_0.5.0      leafem_0.1.3       crayon_1.3.4
```