Closed dpagendam closed 3 years ago
Hi @dpagendam, thanks for the detailed error report, and great to see ruODK used here in Oz :-) Apologies for the wording in the issue template - it should mention not to share secrets. Tracked at #115.
By and large, ruODK seems to work without big surprises. Possibly related to this issue:
{"message":"Could not find the resource you were looking for.","code":404.1}
), one is 0kB.
I know that the download crashes on the HTTP 404 behind these 76B files. With `odata_submission_get`'s default to retain already downloaded files (without any costly sanity check), eventually all "broken" files are downloaded, and `odata_submission_get` resumes without crashes.
`httr::GET(retry=3)`.
Your `retries=10` indicates that's not the cause of your issue here.
I've got one form with 40k+ submissions where some photos in a particular nested repeat sometimes don't download. These are the steps I'll try (feel free to try the same on your end and report your findings back here):
1. Download the submissions without photos using `odata_submission_get(download=FALSE)` and with photos using `odata_submission_get(download=TRUE)`.
2. Repeat a few times, see whether you get one of those 76B photos per failed run.

I've pushed a minor patch to let `attachment_get` skip downloading attachments with blank filenames.
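To check a download folder for the 76B and 0kB files described above, something like the following base R sketch may help. `find_broken_attachments()` is a hypothetical helper, not a ruODK function, and the 1024 byte cut-off is an assumption chosen to sit well below any real photo size.

```r
# Hypothetical helper (not part of ruODK): list downloaded files that are
# empty, or small and containing a JSON 404 error body instead of image data.
find_broken_attachments <- function(dir, max_bytes = 1024) {
  files <- list.files(dir, full.names = TRUE, recursive = TRUE)
  sizes <- file.size(files)
  is_broken <- vapply(
    seq_along(files),
    function(i) {
      # Anything larger than the cut-off is assumed to be a real attachment.
      if (is.na(sizes[i]) || sizes[i] > max_bytes) return(FALSE)
      # Zero-byte files are always suspicious.
      if (sizes[i] == 0) return(TRUE)
      txt <- paste(readLines(files[i], warn = FALSE), collapse = "")
      # Look for the error code seen in the saved 404 responses.
      grepl('"code":404', txt, fixed = TRUE, useBytes = TRUE)
    },
    logical(1)
  )
  data.frame(
    file = files[is_broken],
    bytes = sizes[is_broken],
    stringsAsFactors = FALSE
  )
}
```

Running `find_broken_attachments("../www/images")` after each download attempt would show whether every failed run leaves one such file behind; the path here is the one from the error message in the original report.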
I'll have to test whether attachment downloads in nested tables work as expected. I can reproduce a situation where attachments in a repeated form group do not download at all. In contrast, the vignette on OData manages to download attachments to repeated form groups (nested tables) completely fine.
Edit: There are gremlins in ruODK's handling of attachments, working on a fix. At this point, this issue looks more likely to be a bug in ruODK rather than data loss in ODK Central.
@dpagendam I've pushed a bugfix to make sure ruODK downloads all attachments from both the main "Submissions" table and any nested subtables ("Submissions.GROUP_NAME"). I'm verifying the bugfix with a full data ETL run later today.
Could you re-install ruODK from the latest main branch again (v0.9.7) and see whether you can re-create the download timeout issue? A second guess would be to increase swap and RAM for your server; maybe there's some congestion happening on the disk.
Reopening this issue. I'm getting spurious timeouts on one of my production forms on an ODK Central v0.6 instance. I'll modify the attachment download to tolerate timeouts and emit a warning message. If the timeouts come from the server rather than the attachment, re-running the download could resolve such timeouts.
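The idea of tolerating a failed download and warning instead of aborting can be sketched in base R as below. This is not ruODK's actual implementation. `download_or_warn()` and its `fetch` argument are hypothetical names, and the default fetcher uses `utils::download.file()` without the authentication an ODK Central server would require.

```r
# Hypothetical wrapper, not ruODK code: run `fetch(url, dest)` and turn any
# error (timeout, HTTP failure) into a warning so that a loop over many
# attachments keeps going. Returns `dest` on success, NA on failure.
download_or_warn <- function(
    url,
    dest,
    fetch = function(url, dest) {
      utils::download.file(url, dest, mode = "wb", quiet = TRUE)
    }) {
  tryCatch(
    {
      fetch(url, dest)
      dest
    },
    error = function(e) {
      warning("Skipping ", url, ": ", conditionMessage(e), call. = FALSE)
      NA_character_
    }
  )
}
```

Attachments that come back as `NA` can be collected and retried in a later run, which matches the observation that re-running the download may resolve server-side timeouts.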
I can see that some of my attachments do not exist on ODK Central:
~> curl --include https://odkcentral.dbca.wa.gov.au/v1/projects/1/forms/build_Site-Visit-Start-0-3_1559789550/submissions/uuid:fcf3d82a-3276-44d9-9b36-5f43ac460692/attachments/1597722025465.jpg -u EMAIL -p --output file.jpg
Enter host password for user 'EMAIL':
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    76  100    76    0     0     96      0 --:--:-- --:--:-- --:--:--    96
~> cat file.jpg
HTTP/2 404
server: nginx
date: Mon, 08 Mar 2021 09:56:08 GMT
content-type: application/json; charset=utf-8
content-length: 76
x-powered-by: Express
etag: W/"4c-ShTrjQvXJDA49Wfxw7ES7C6cQFg"
strict-transport-security: max-age=63072000
{"message":"Could not find the resource you were looking for.","code":404.1}
# This is one of my famous 76B files
~> curl --include https://odkcentral.dbca.wa.gov.au/v1/projects/1/forms/build_Site-Visit-Start-0-3_1559789550/submissions/uuid:fcf3d82a-3276-44d9-9b36-5f43ac460692/attachments/ -u EMAIL -p
Enter host password for user 'EMAIL':
HTTP/2 200
server: nginx
date: Mon, 08 Mar 2021 09:56:27 GMT
content-type: application/json; charset=utf-8
content-length: 45
x-powered-by: Express
etag: W/"2d-HVcqTtBpy7VYRhG7AXUxWyO/jck"
strict-transport-security: max-age=31536000
x-content-type-options: nosniff
x-ua-compatible: chrome=1
strict-transport-security: max-age=63072000
[{"name":"1597722025465.jpg","exists":false}]
# The riddle's solution: this file has a filename (a photo was taken in ODK Collect),
# but the file does not exist as far as ODK Central is concerned. Was this an upload error between ODK Collect and ODK Central?
The main branch of ruODK is now robust against missing attachment files without the overhead of the extra API call to test for the attachments' existence.
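For diagnosing a single submission by hand, the attachments listing shown above can be parsed to find files that have a name but no content on the server. This is a minimal sketch using `jsonlite::fromJSON()`. `missing_attachments()` is a hypothetical helper, and ruODK itself avoids this extra API call, as noted above.

```r
library(jsonlite)

# Hypothetical helper: given the JSON body of an attachments listing,
# return the names of attachments that ODK Central reports as not existing.
missing_attachments <- function(body) {
  attachments <- fromJSON(body)
  attachments$name[!attachments$exists]
}

# Response body copied from the curl output in this thread.
body <- '[{"name":"1597722025465.jpg","exists":false}]'
missing_attachments(body)
# [1] "1597722025465.jpg"
```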
Hi @florianm,
thanks for all your help with this (and sorry to be so slow to respond)! I downloaded the zip file from ODK Central and couldn't find any evidence of corrupted image files or files that weren't valid images. I have just reinstalled the latest version of ruODK from GitHub in R and things seem to be downloading now without any issue, so I think the fixes that you have applied have resolved my problem. This is a really wonderful package and I am very grateful for the support!
Regards,
Dan
Aw man, great to hear! Thanks again for the bug report, it reminded me to fix attachment downloads from nested sub-tables. I'll close this for now, feel free to re-open the issue if you find ruODK misses any attachments.
Problem
We have an ODK Central Server up and running. One of the ODK forms contains a field to upload a photograph of a study site. ruODK works perfectly for us to pull our ODK Central Data into R, with one exception: downloading the attached images. When using ruODK to pull all of the data for the form with the images, it successfully starts downloading images, but then eventually, after approximately a couple of hundred images downloaded, it seems to time out and the R console returns the error:
.....
✔ File saved to "../www/images/1608611394304.jpg".
Request failed [404]. Retrying in 1 seconds...
Error: Problem with `mutate()` input `trap_photo`.
x Not Found (HTTP 404). Failed to get desired response from server https://myserver.com as user "myusername".
Reproducible example
Unfortunately, I can't share my server address, username and password to reproduce the error, but hopefully the code below provides some insight into how the data is being extracted. If I set "download = FALSE" in odata_submission_get then all the forms minus the images download fine.
Session Info
R version 3.6.1 Patched (2019-08-07 r76935)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS 10.16

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

Random number generation:
 RNG:     Mersenne-Twister
 Normal:  Inversion
 Sample:  Rounding

locale:
[1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
 [1] SDraw_2.1.13 rgdal_1.4-4 sp_1.4-4 forcats_0.4.0 dplyr_1.0.2 purrr_0.3.4 readr_1.4.0 tidyr_1.1.2 tibble_3.0.4
[10] ggplot2_3.3.2 tidyverse_1.3.0 stringr_1.4.0 ruODK_0.9.6 MASS_7.3-51.5

loaded via a namespace (and not attached):
 [1] nlme_3.1-140 fs_1.5.0 sf_0.9-6 lubridate_1.7.9.2 RColorBrewer_1.1-2 httr_1.4.2 tools_3.6.1
 [8] backports_1.1.10 R6_2.5.0 AlgDesign_1.2.0 rpart_4.1-15 KernSmooth_2.23-15 rgeos_0.5-1 Hmisc_4.2-0
[15] DBI_1.1.0 colorspace_2.0-0 nnet_7.3-12 withr_2.3.0 tidyselect_1.1.0 gridExtra_2.3 curl_4.3
[22] compiler_3.6.1 cli_2.2.0 rvest_0.3.5 htmlTable_1.13.1 xml2_1.3.2 spsurvey_4.1.4 keras_2.2.5.0
[29] scales_1.1.1 checkmate_1.9.4 classInt_0.4-3 tfruns_1.4 crossdes_1.1-1 digest_0.6.27 foreign_0.8-71
[36] base64enc_0.1-3 pkgconfig_2.0.3 htmltools_0.5.0.9003 dbplyr_1.4.2 htmlwidgets_1.5.3 rlang_0.4.9 readxl_1.3.1
[43] rstudioapi_0.13 generics_0.1.0 jsonlite_1.7.2 gtools_3.8.1 tensorflow_2.0.0 acepack_1.4.1 magrittr_2.0.1
[50] Formula_1.2-3 Matrix_1.2-17 Rcpp_1.0.5 munsell_0.5.0 fansi_0.4.1 reticulate_1.13 lifecycle_0.2.0
[57] stringi_1.5.3 whisker_0.4 snakecase_0.11.0 grid_3.6.1 parallel_3.6.1 crayon_1.3.4 deldir_0.1-23
[64] lattice_0.20-38 haven_2.2.0 splines_3.6.1 hms_0.5.3 zeallot_0.1.0 knitr_1.30 pillar_1.4.7
[71] clisymbols_1.2.0 reprex_0.3.0 glue_1.4.2 latticeExtra_0.6-28 data.table_1.12.8 modelr_0.1.5 vctrs_0.3.5
[78] cellranger_1.1.0 gtable_0.3.0 assertthat_0.2.1 xfun_0.19 janitor_2.0.1 broom_0.5.4 e1071_1.7-4
[85] class_7.3-15 survival_2.44-1.1 units_0.6-7 cluster_2.1.0 ellipsis_0.3.1

```{r}
# utils::sessionInfo()
```

Thank you for providing this excellent package and thanks in advance for any insights into what might be causing this issue.