ouhft / COPE

Project Repository for Work Package 4 of the COPE Transplant Trial
https://cope.nds.ox.ac.uk

Investigate document labels #302

Open marshalc opened 6 years ago

marshalc commented 6 years ago

Investigate barcodes, QR codes, or other means of identifying sheets of paper to the system, ideally with some feedback that humans can read.
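As a starting point for discussion, here is a minimal sketch of what a label payload could look like if it has to be both machine-readable (as the content of a barcode or QR code) and legible to a human. The layout (form code, trial ID, page number, check digit), the function names and the example values are all assumptions for illustration; none of them come from the existing system.

```python
# Hypothetical label layout: FORM-TRIALID-Pnn-CHECK
# Assumes the form code and trial ID contain no hyphens.

def check_digit(payload: str) -> int:
    """Position-weighted sum of character codes, modulo 10.

    This catches many (not all) single-character transcription errors.
    """
    return sum((i + 1) * ord(ch) for i, ch in enumerate(payload)) % 10


def make_label(form_code: str, trial_id: str, page: int) -> str:
    """Build the text that would be printed and also encoded in the barcode."""
    payload = f"{form_code}-{trial_id}-P{page:02d}"
    return f"{payload}-{check_digit(payload)}"


def parse_label(label: str) -> dict:
    """Validate the check digit and split the label back into its fields."""
    payload, _, check = label.rpartition("-")
    if not check.isdigit() or int(check) != check_digit(payload):
        raise ValueError(f"check digit mismatch in label {label!r}")
    form_code, trial_id, page = payload.split("-")
    return {"form_code": form_code, "trial_id": trial_id, "page": int(page[1:])}
```

Because the same string is printed next to the code, a person can read the label off the sheet and type it in if a scan fails, and the check digit gives them immediate feedback on a mistyped character.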

marshalc commented 6 years ago

This topic (and the background for it) isn't quite as robust or trivial as I would like, so needs exploring. Here's a summary of things so far:

There are two domains for data management, Paper and Digital, and we want a way to robustly link the two so that it is easier to audit and track what has happened to the data.

Versioning

...is a thing that happens on both sides.

Digitally it is handled by keeping a copy of the data prior to it being overwritten by newer data for each record in the database tables. This means that each digital record has both a primary key, and a version ID.
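To make the digital side concrete, here is a rough sketch of the copy-before-overwrite idea. It is an in-memory illustration only, not the mechanism the system actually uses; the class and field names are made up for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


def _now() -> datetime:
    return datetime.now(timezone.utc)


@dataclass
class VersionedRecord:
    """Illustrative record carrying both a primary key and a version ID."""
    pk: int
    data: dict
    version_id: int = 1
    modified: datetime = field(default_factory=_now)
    # Each history entry is (version_id, modified, snapshot of the old data).
    history: list = field(default_factory=list)

    def update(self, **changes) -> None:
        # Keep a copy of the current state before it is overwritten.
        self.history.append((self.version_id, self.modified, dict(self.data)))
        self.data.update(changes)
        self.version_id += 1
        self.modified = _now()


record = VersionedRecord(pk=1, data={"age": 50})
record.update(age=51)
# record.history now holds version 1; record.data holds version 2.
```

The point to notice is that a record is identified by the pair (primary key, version ID), and that each record's version counter moves independently of every other record's.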

Paperwise, there's a mix of things happening. Original forms can have notes or alterations scribbled on them, and then there can be new forms filled out as part of the data cleanup process that introduce yet more versions of the same data/topic. There is no way to identify when a document was written, nor any agreed way of indexing or identifying documents. @AllyBradley has been attempting to group these documents as they're discovered, and to file them under Trial IDs.

Linkage

Ideally we want a paper "record" that matches a digital record, so that we can say that the paper record was correctly recorded (and then digitise the paper record itself so that we can easily reference it).

However, this isn't going to be as clean as we would like, especially given that it is being applied retrospectively to the setup. Identifying the historical digital record that matches a given paper record is going to be tricky and mostly informed guesswork. Linking new paperwork could be a bit cleaner, but it still relies on a human ensuring they have correctly transcribed the paper content into the digital equivalent, and that process is prone to error (though once the paper record has been digitised, it should be possible for a second human to check and validate the transcription).

Precedence

This is the proverbial chicken-and-egg problem of which came first (the paper or the digital) and which should take primacy. The answer to that should be digital, but given the project's switches back and forth between a paper-first process and a digital-first one, this is not so clear or consistent.

I don't think an arbitrary decision can be made on this now because of the project's changes, so the cataloguing of paper records against digital ones needs to allow for both options (i.e. a paper record before a digital one, and vice versa). The linkage is therefore only ever going to be approximate, and the process for digitising the paper records will need to be flexible. That flexibility will reduce the robustness of the process somewhat, because exceptions will need to be made.

Paper Forms <> Digital Objects

The digital structure of the trial data is based on the object analysis of the data space, and not based on the duplicated-data structure of the paper forms. One Procurement Form is digitally represented by (approximately):

Obviously the digital system only updates the records that are affected by a change in the online form that relates to their data, so it's possible to have a range of version numbers that relate to the "current" digital form view.

The question therefore becomes: do we try to link a single Donor record with a paper equivalent? The Donor record doesn't record the version of each linked record that corresponds to it, so determining the "set" that comprises an equivalent paper form is nigh on impossible without some educated guesswork based on the modified dates for each linked record. This shouldn't come as a surprise, given that the simple reversion solution used to record changes was intended primarily as an audit history for data changes, not as a referencing system for record collections.
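For what it's worth, the modified-date guesswork could be sketched roughly as below. This assumes we can put an estimated date on the paper form (which, as noted under Versioning, we often can't) and that each linked record's version history is available as (version ID, modified date) pairs. The function name and the record names other than Donor are hypothetical.

```python
from datetime import datetime


def guess_record_set(versions_by_record: dict, paper_date: datetime) -> dict:
    """For each linked record, pick the latest version modified on or before
    the estimated paper date. Returns {record_name: version_id or None}.

    This is a heuristic guess, not a verified link.
    """
    guess = {}
    for name, versions in versions_by_record.items():
        candidates = [(modified, version_id)
                      for version_id, modified in versions
                      if modified <= paper_date]
        # max() compares on the modified date first, so it picks the newest.
        guess[name] = max(candidates)[1] if candidates else None
    return guess


history = {
    "donor": [(1, datetime(2015, 1, 1)), (2, datetime(2015, 3, 1))],
    "organ": [(7, datetime(2015, 2, 1))],    # hypothetical linked record
    "person": [(3, datetime(2015, 6, 1))],   # hypothetical linked record
}
print(guess_record_set(history, datetime(2015, 2, 15)))
```

A `None` in the result means no digital version existed by that date, which may itself be a hint that the paper came first for that part of the form.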

Alternatively, we have to conceive of creating a "record set" object that can hold the full Foreign Key references and reVersion IDs... but how effective, or necessary, is this?
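To help judge whether it's worth it, here is a rough sketch of the shape such a "record set" object might take. Everything here is an assumption for discussion: the class names, the fields, and the idea of storing the paper/digital precedence alongside the references.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class Precedence(Enum):
    """Which side came first; has to allow for both, and for not knowing."""
    PAPER_FIRST = "paper_first"
    DIGITAL_FIRST = "digital_first"
    UNKNOWN = "unknown"


@dataclass(frozen=True)
class VersionRef:
    """A foreign key reference pinned to one specific version of a record."""
    table: str
    pk: int
    version_id: int


@dataclass(frozen=True)
class RecordSet:
    """The set of record versions believed to correspond to one paper form."""
    paper_label: str                    # identifier printed on the sheet
    members: Tuple[VersionRef, ...]
    precedence: Precedence = Precedence.UNKNOWN

    def version_of(self, table: str, pk: int) -> Optional[int]:
        """Return the pinned version ID for a record, or None if not in the set."""
        for ref in self.members:
            if ref.table == table and ref.pk == pk:
                return ref.version_id
        return None
```

The cost is one extra object per paper form plus the effort of populating it; the benefit is that the "set" is written down once rather than re-guessed from modified dates every time someone audits it.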

Paper forms have changed over time

The Paper forms have been modified by various parties over the course of the trial without any reference to the digital system, so we have paper records with missing information, answers to alternative questions, or extra information that the digital system doesn't expect to capture.

The Digital Forms have also changed over time, but to a much lesser degree. Since the two sets of forms were not managed and linked as processes within the trial, there's no easy matching between pieces of paper and "record sets".