kobotoolbox / enketo-express

We've moved! Please use the new repository 🠊 https://github.com/enketo/enketo-express
Apache License 2.0

Submission is incomplete when inserting larger photos #776

Closed pf-informatik closed 6 years ago

pf-informatik commented 7 years ago

I've set up a form with the latest kobo-docker (updated all Docker images this week). The form has about 40 questions, some of which are image fields. Depending on other answers, the image fields are skipped. If they are displayed, they are mandatory. One question is a repeated group, where users can upload 0 or more "additional images".

The form can be filled out without seeing any mandatory image fields; usually we have 4 images and no "additional images".

When testing with desktop browsers and mobile devices connected via WiFi, everything was fine. Now we have rolled out the app to our users and are experiencing serious problems. When a submission contains more than one image and the images are rather large, the form is sent to the server and all (text) data is present. The image names are part of the record, but some of the images are missing (looking inside the attachments folder on the hard disk, only the first images are there; the last ones are missing). On a Galaxy Note 1 we have some submissions where 7MB of images worked, while in other submissions only 4MB arrived on the server. On an iPad 2 the limit seems to be 5MB. Here we can clearly reproduce that the submission only contains those images that "fit" into 5MB when using Safari. With Firefox on the same device we can send 8MB.

Our users are driving to different locations (in Germany) with their smartphones always on mobile data. So we have varying connection speeds between "offline" and 50 Mbit/s LTE.

I tried to investigate a little further: the transmission of a single image is aborted after exactly 5 minutes. I have read a lot of explanations of Enketo Express that say the upload is chunked. This does not seem to be the case in our setup. A phone with throttled bandwidth but a constant internet connection retries the upload, but never finishes. So I raised the "timeout" from 5 minutes to 50 minutes (added values in enketo_express/config.json). Now even the larger files are uploaded over a slow connection. I thought I was done with it, but the next day the users in the field still experienced the same problems!

Now we told the users to reduce their camera resolution. The images are now <1MB, the form data does not exceed 3MB. Everything works perfectly!

But some of the new users have really modern phones with terapixel cameras. They told me they cannot reduce the resolution that far.

As Enketo Express uses IndexedDB for offline storage, there should be no limitation on submitted or queued records. But it felt like the varying limits of localStorage are active here.

Do you have any clue what can be done to ensure all images are submitted? Do you have any workaround in mind? (At the moment some users take the photos directly with the camera app and resize them with another app. When filling out the form they select those images. Other users just send me all their images at the end of the day and I edit all of their submissions from my desktop afterwards.)

The system is a dedicated server (Ubuntu 16.04.2 LTS) with 2 CPU cores, 6GB RAM and more than 30GB of free SSD storage. The kobo-docker containers are the only active applications on this server. We use the docker-compose.server.yml setup with valid SSL certificates (Let's Encrypt).

Regards, Peter

MartijnR commented 7 years ago

Hi Peter,

Thanks for this report and for all the troubleshooting you've done.

I believe the limits for IndexedDB may depend on the available (or total?) storage on the device. Whatever the rules are, they are probably not consistently implemented across browsers (in the past, I have found that particularly exotic clones such as Samsung's 'Internet' don't play by the rules). We cannot rule out that browser storage may indeed be the remaining issue you are experiencing. If you are ever able to reproduce this yourself on a particular device, then using Enketo's 'export' functionality could be helpful. There should be metadata indicating which files Enketo failed to retrieve from the database.

Submissions are sort-of chunked, but that doesn't actually happen for individual image files. A record with multiple images may be split up into multiple submission batches, each containing a few image files, but a single image file is never split up. The server (KoBo) indicates the maximum size (per batch) that is allowed, and Enketo will not allow a user to add a file larger than this maximum. With respect to image size (if not IndexedDB), the data connection is probably the most likely issue (even with the large timeout). However, Enketo is designed to consider the whole record (with all its images) as having failed to submit if one of the batches did not return a 201 response. So Enketo would keep the record in the browser queue (but the server may record the incomplete record). Do you know if that is happening?
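The batching rule described above can be sketched roughly as follows. This is a minimal illustration, not Enketo's actual code: the function name `split_into_batches`, the greedy grouping strategy, and the 5MB limit in the usage lines are all assumptions, and the size of the XML record that accompanies every batch is ignored.

```python
from typing import List, Tuple

File = Tuple[str, int]  # (file name, size in bytes)


def split_into_batches(files: List[File], max_batch_size: int) -> List[List[File]]:
    """Group whole files into batches whose combined size stays within max_batch_size.

    Hypothetical helper: a file is never split, only assigned to a batch.
    """
    batches: List[List[File]] = []
    current: List[File] = []
    current_size = 0
    for name, size in files:
        if size > max_batch_size:
            # A single file is never split, so an oversized file cannot be sent at all.
            raise ValueError(f"{name} exceeds the maximum batch size")
        if current and current_size + size > max_batch_size:
            # The next file would not fit: close the current batch and start a new one.
            batches.append(current)
            current, current_size = [], 0
        current.append((name, size))
        current_size += size
    if current:
        batches.append(current)
    return batches


MB = 1_000_000
photos = [("a.jpg", 2 * MB), ("b.jpg", 2 * MB), ("c.jpg", 2 * MB), ("d.jpg", 2 * MB)]
batches = split_into_batches(photos, max_batch_size=5 * MB)  # 5MB is an assumed limit
batch_names = [[name for name, _ in batch] for batch in batches]
print(batch_names)
```

With these assumed numbers, four 2MB photos end up in two batches of two files each, so a connection failure between the batches would leave the server with only the first two images.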

A third option is that the server fails to handle the submission, but still returns a 201 response. This has happened in the past (with ODK Aggregate), so it may be worth investigating.

I am a little worried this issue may be too hard to reproduce to allow fixing it. It provides a good incentive to implement a new feature we have been debating in the XForms spec: https://github.com/opendatakit/xforms-spec/issues/79.

MartijnR commented 7 years ago

A scenario with the 2nd option:

  1. Submission of record with image a.jpg and image b.jpg is divided into 2 batches (each batch always contains the XML record).
  2. Batch 1 with a.jpg arrives successfully, KoBo returns 201 response.
  3. Batch 2 with b.jpg fails due to connection issues, there is no response or an error response. Enketo keeps entire record in the queue and KoBo stores record with a.jpg. (? just a guess, I don't know)
  4. Enketo makes another attempt to submit the whole record some time later (again in two batches)
  5. KoBo has already received a record with that instanceID so it provides success responses (201 or 202) for both batches.
  6. Enketo deletes the record from the browser queue as all seems good.
  7. Maybe KoBo didn't update the earlier record with b.jpg?
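The scenario above can be played through with a toy model. Everything here is hypothetical: `ToyServer` encodes the guessed server behaviour (a duplicate instanceID is acknowledged without storing new attachments), which is exactly the open question, and `submit` only mimics the rule that a record counts as submitted when every batch gets a success response.

```python
class ToyServer:
    """Hypothetical server: stores the first batch per instanceID, ignores later ones."""

    def __init__(self):
        self.records = {}  # instanceID -> set of stored attachment names

    def receive(self, instance_id, attachments, connection_ok=True):
        if not connection_ok:
            return None  # the batch is lost, no response reaches the client
        if instance_id in self.records:
            # ASSUMPTION: a known instanceID is acknowledged, but new files are not stored.
            return 202
        self.records[instance_id] = set(attachments)
        return 201


def submit(server, instance_id, batches, failing_batch=None):
    """Send every batch; report success only if all batches return 201 or 202."""
    statuses = [
        server.receive(instance_id, batch, connection_ok=(index != failing_batch))
        for index, batch in enumerate(batches)
    ]
    return all(status in (201, 202) for status in statuses)


server = ToyServer()
batches = [["a.jpg"], ["b.jpg"]]
first_try = submit(server, "uuid:1", batches, failing_batch=1)  # batch 2 is lost
second_try = submit(server, "uuid:1", batches)  # whole record is retried later
stored = sorted(server.records["uuid:1"])
print(first_try, second_try, stored)  # False True ['a.jpg']
```

Under these assumptions the client sees a successful second attempt and would remove the record from its queue, while the server still holds only a.jpg.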

@dorey, is this a possibility or would KoBo a) not store the incomplete record or b) update the record during the 2nd attempt?

pf-informatik commented 7 years ago

> So Enketo would keep the record in the browser queue (but the server may record the incomplete record). Do you know if that is happening?

I am sure this is not the case. The record is put in the queue, the submission is reported as sent, and on the server side I can see, edit and delete the record. The record is part of the statistics, and the raw record in the database is OK (logger_instance AND logger_attachment), so I see the names of the images. Everything seems fine. It's just that the file is physically missing.

Your 2nd scenario seems to hit the point. I'm not 100% sure, but as I am reading your explanation I remember my tests with aborted connections during submit.
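A check of this kind (names known to the record, files absent on disk) could be scripted along these lines. This is only a sketch: `find_missing_attachments` is a hypothetical helper, the list of expected names would have to come from the record or the logger_attachment table, and the flat folder layout is an assumption, not the actual KoBoCAT storage layout.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List


def find_missing_attachments(expected_names: Iterable[str], attachments_dir: Path) -> List[str]:
    """Return the expected file names that have no non-empty file in attachments_dir."""
    missing = []
    for name in expected_names:
        path = attachments_dir / name
        # A zero-byte file is treated as missing too, since it holds no image data.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing


# Demonstration with a temporary folder standing in for the attachments folder.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "a.jpg").write_bytes(b"fake image data")
    result = find_missing_attachments(["a.jpg", "b.jpg"], folder)
    print(result)
```

In the demonstration only a.jpg exists on disk, so b.jpg is reported as missing.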

  1. filled submission with 4 images 2MB each
  2. put the record in the queue
  3. let the upload do approx. 4.5Mbytes (monitoring with fiddler)
  4. interrupt the connection completely
  5. wait 1 minute, then re-establish the connection
  6. trigger upload in enketo-express
  7. the upload finishes almost immediately and is reported as successful

In this scenario just the first two images can be found on the server.

I blamed Fiddler for this behaviour (Fiddler is an HTTP sniffer that digs deep into the network traffic, even the SSL data is decrypted); I thought the SSL reconnect broke up the transmission.

But it is as you describe: the first batch/image is transmitted successfully and sets the 201. The second image fails somehow, but this does not mark the record "incomplete". Another point confirming this assumption: I cannot produce an incomplete record with only one file. Even if that one file is very large (15MB), the submission has either failed completely or succeeded with the large image. You need two or more images.

It seems that the successful transmission of one file makes the submission "complete", no matter if any subsequent image transmission fails.

MartijnR commented 7 years ago

Yes, the reason I think the scenario I described is possible is that there is actually no way for KoBo to know how many batches are coming (without some very complex mechanism which I'm sure is not happening). There isn't such a thing in the spec. So if a multi-batch submission is made, any failed batch after a successful batch would likely lead to this incomplete record state.

However, I see that the maximum submission size that kc.kobotoolbox.org is publishing as acceptable is 93,750,000 bytes. This should pretty much prevent splitting up of a submission into batches until the combined size of the images exceeds about 89 MB. Is it possible your users are exceeding that?
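For reference, the conversion behind that figure can be checked quickly. The snippet only restates the published byte limit in two common units; which unit was meant in the comment above is an assumption.

```python
max_bytes = 93_750_000  # limit published by kc.kobotoolbox.org, as quoted above

megabytes = round(max_bytes / 1_000_000, 2)  # decimal megabytes (10^6 bytes)
mebibytes = round(max_bytes / 2**20, 2)      # binary mebibytes (2^20 bytes)

print(megabytes)  # 93.75
print(mebibytes)  # 89.41
```

So the "89" figure corresponds to binary megabytes; in decimal units the same limit is 93.75 MB.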

MartijnR commented 6 years ago

From all the kobocat commit refs, I'm guessing this is resolved on the KoBoCAT side. Please re-open if Enketo changes are required.