kobotoolbox / kobocat

Our (backend) server for providing blank forms to Collect and Enketo and for receiving and storing submissions.
BSD 2-Clause "Simplified" License

Remove formatting restrictions on `<instanceID>` (submission UUID) strings #861

Closed jnm closed 3 months ago

jnm commented 1 year ago

The OpenRosa spec allows people to implement a "custom ID scheme" for <instanceID>, although it "must be a universally unique string identifying this specific submission".

We don't currently support this: https://github.com/kobotoolbox/kobocat/blob/a8fbca5b0f28491de1b4d732bc6ed2339cab0dd6/onadata/apps/logger/xform_instance_parser.py#L72-L93

Some people have uploaded submissions containing the likes of

<meta><instanceID>RYAPHKJBDOZJWQ2W5BDXQ0MLU</instanceID>…

This results in the logger_instance.uuid column diverging from the <instanceID> in the XML.
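
To make the mismatch concrete, here is a minimal sketch. It is not the parser linked above: `STRICT_UUID`, `strict_uuid` and `permissive_uuid` are hypothetical names, and it assumes a strict extractor that only accepts `uuid:` followed by a canonical UUID, with un-namespaced `<meta>` and `<instanceID>` elements. A custom ID like the one above yields nothing under the strict rule, so whatever ends up in the uuid column cannot match the XML.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical strict pattern: "uuid:" plus a canonical 8-4-4-4-12 hex UUID.
STRICT_UUID = re.compile(
    r"^uuid:([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})$",
    re.IGNORECASE,
)


def get_instance_id(xml):
    """Return the raw text of <meta><instanceID>, or None if absent or empty."""
    root = ET.fromstring(xml)
    node = root.find("./meta/instanceID")
    if node is None or node.text is None:
        return None
    return node.text.strip() or None


def strict_uuid(xml):
    """Accept only 'uuid:<canonical UUID>'; anything else is rejected."""
    raw = get_instance_id(xml)
    match = STRICT_UUID.match(raw) if raw else None
    return match.group(1) if match else None


def permissive_uuid(xml):
    """Accept any non-empty string, dropping an optional 'uuid:' prefix."""
    raw = get_instance_id(xml)
    if raw is None:
        return None
    return raw[len("uuid:"):] if raw.startswith("uuid:") else raw


custom = "<data><meta><instanceID>RYAPHKJBDOZJWQ2W5BDXQ0MLU</instanceID></meta></data>"
print(strict_uuid(custom))      # None
print(permissive_uuid(custom))  # RYAPHKJBDOZJWQ2W5BDXQ0MLU
```

With the permissive version, the stored value and the XML stay in sync, which is only safe once uniqueness is enforced some other way.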

We should:

  1. Enforce UUID uniqueness across the entire logger_instance table;
  2. After that, remove our formatting restrictions on the <instanceID> string.

jnm commented 3 months ago

We could enforce xml_hash uniqueness before removing the formatting restrictions, but (I was mistaken earlier) we cannot enforce UUID uniqueness while the formatting restrictions are in place. That's because we first need to identify cases where UUIDs collide but the XML content is not identical, and then rewrite those UUIDs by appending something like "DUPLICATE n", which would itself violate the existing formatting restrictions.
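
A rough sketch of that cleanup pass, working on plain tuples rather than the real table, might look like this. Everything here is an assumption for illustration: `plan_uuid_rewrites` and `xml_hash` are hypothetical helpers, SHA-256 stands in for however xml_hash is actually computed, and the exact suffix format is a guess at "something like" the above. Rows with the same UUID and identical XML are left alone, since how to handle those is a separate question.

```python
import hashlib
from collections import defaultdict


def xml_hash(xml):
    """Stand-in content hash; the real xml_hash computation may differ."""
    return hashlib.sha256(xml.encode("utf-8")).hexdigest()


def plan_uuid_rewrites(rows):
    """Map pk -> new UUID for rows that share a UUID but differ in XML.

    rows is an iterable of (pk, uuid, xml). The lowest pk in each group
    keeps its UUID; later rows with new content get a numbered suffix.
    """
    by_uuid = defaultdict(list)
    for pk, uuid, xml in sorted(rows, key=lambda row: row[0]):
        by_uuid[uuid].append((pk, xml_hash(xml)))

    rewrites = {}
    for uuid, group in by_uuid.items():
        seen_hashes = set()
        counter = 0
        for pk, digest in group:
            if not seen_hashes:
                seen_hashes.add(digest)  # first row keeps the original UUID
                continue
            if digest in seen_hashes:
                continue  # identical XML: a true duplicate, not rewritten here
            seen_hashes.add(digest)
            counter += 1
            rewrites[pk] = f"{uuid} DUPLICATE {counter}"
    return rewrites


rows = [
    (1, "abc", "<a>1</a>"),
    (2, "abc", "<a>2</a>"),  # same UUID, different content
    (3, "abc", "<a>1</a>"),  # same UUID, same content as pk 1
    (4, "xyz", "<a>1</a>"),
]
print(plan_uuid_rewrites(rows))  # {2: 'abc DUPLICATE 1'}
```

The rewritten values contain spaces and a suffix, so they would be rejected by any strict UUID format check, which is why the restrictions have to go first.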

Some internal discussion at https://chat.kobotoolbox.org/#narrow/stream/4-Kobo-Dev/topic/Duplicated.20submission.20UUIDs/near/192516

noliveleger commented 3 months ago

Closed because this is going to be handled by kobotoolbox/kpi#5047.