mountetna / magma

Data server with friendly data loaders
GNU General Public License v2.0

Date-shift on insertion into date_time attributes #117

Open graft opened 4 years ago

graft commented 4 years ago

Date-Time attributes in Magma are potentially confounding because they may include protected health information (PHI). Really, any attribute might include PHI, especially free-form text fields, which pretty reliably accumulate PHI wherever they exist.

However, date-times are an attractive target because (1) they are nearly always PHI if they relate in any way to patient care, including the processing of downstream samples, and (2) there is a simple paradigm for handling them.

This paradigm is to shift all of the dates associated with a patient backwards by a random number of days between 1 and 365. In this manner, the dates are somewhat obscured but still maintain their positions relative to one another for that patient. The offset is computed from the patient identifier (roughly hash(identifier + salt) % 365), so the same patient always gets the same offset. There is an implementation of this in the redcap client in @mountetna/magma_ipi.
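To make the offset idea concrete, here is a minimal Python sketch. It is not the magma_ipi implementation; the function names, the choice of SHA-256, and the `+ 1` (which keeps the offset in 1..365 rather than 0..364) are all assumptions made for illustration.

```python
import hashlib
from datetime import date, timedelta


def patient_offset_days(identifier: str, salt: str) -> int:
    """Deterministic per-patient offset in the range 1..365 (hypothetical helper)."""
    digest = hashlib.sha256((identifier + salt).encode("utf-8")).digest()
    # Adding 1 keeps the result in 1..365 instead of 0..364.
    return int.from_bytes(digest, "big") % 365 + 1


def shift_date(value: date, identifier: str, salt: str) -> date:
    """Move a date backwards by the patient's offset."""
    return value - timedelta(days=patient_offset_days(identifier, salt))


# Two dates for the same (made-up) patient keep their spacing after shifting.
salt = "example-salt"  # assumption: a real salt would be kept secret
visit = shift_date(date(2020, 3, 1), "PATIENT-001", salt)
sample = shift_date(date(2020, 3, 15), "PATIENT-001", salt)
assert sample - visit == timedelta(days=14)
```

Because the offset depends only on the identifier and the salt, every date for a given patient moves by the same amount, which is what preserves the relative timeline.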

There are two ways we can make use of this offsetting: on input and on output.

If we date shift on output, then the input process requires us to submit the unshifted quantities, and Magma is storing PHI. There are facilities for supporting this (attribute-level restriction); currently this flag censors restricted fields for restricted users (that is, it never returns the data), while privileged users can see and update the data as usual. Instead we might allow the date_time attribute to return date-shifted dates to restricted users. This behavior is pretty clear, with no obvious pitfalls other than having to hold PHI.
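A rough Python sketch of the shift-on-output behavior, assuming a hypothetical `read_date` function and a boolean restriction check; none of these names come from Magma, and the hashing scheme is only illustrative.

```python
import hashlib
from datetime import date, timedelta

SALT = "example-salt"  # assumption: a real salt would be kept secret


def patient_offset_days(identifier: str) -> int:
    """Deterministic per-patient offset in 1..365 (hypothetical helper)."""
    digest = hashlib.sha256((identifier + SALT).encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % 365 + 1


def read_date(stored: date, identifier: str, user_is_restricted: bool) -> date:
    """Return the true date to privileged users and a shifted date to restricted ones."""
    if user_is_restricted:
        return stored - timedelta(days=patient_offset_days(identifier))
    return stored
```

The stored value is never modified here, so updates from privileged users work as usual; the cost is that the true date lives in the database.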

Meanwhile, if we date shift on input, Magma holds no PHI and data is returned as normal; it would simply be transformed before insertion. In this case the 'restricted' flag on the attribute is irrelevant; Magma has no restricted data to share. However, updating becomes complicated: Magma no longer knows the true values for this data, and since it has shifted on insertion, it cannot safely re-insert the same data. That is, if I insert the date 1776-07-04 and it gets shifted back to 1776-01-03, Magma has no idea that the original date was 1776-07-04. The user can retrieve records from Magma and they will say 1776-01-03. If the user attempts to re-insert this value into Magma, Magma might shift it yet again, so the date is offset twice (back to around 1775-07-04), and we would lose the consistency of the offset across the patient. It's not clear how to resolve this difficulty.
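The double-shift problem can be reproduced in a few lines of Python. The fixed 183-day offset is the gap between the two example dates (1776-07-04 and 1776-01-03); `insert_with_shift` is a hypothetical stand-in for an insertion path that always shifts.

```python
from datetime import date, timedelta

OFFSET = timedelta(days=183)  # gap between 1776-07-04 and 1776-01-03


def insert_with_shift(value: date) -> date:
    """Hypothetical insertion path that shifts every incoming date."""
    return value - OFFSET


original = date(1776, 7, 4)
stored = insert_with_shift(original)     # first insertion: shifted once
resubmitted = insert_with_shift(stored)  # round trip: shifted a second time

print(stored)       # 1776-01-03
print(resubmitted)  # 1775-07-04
```

Nothing in the stored value marks it as already shifted, so the second insertion cannot tell a retrieved date from a fresh one.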

One possibility is to specify date_shift as an option to /update, so the editor can decide whether they want to shift dates on insertion. The editor may use this flag on the initial insertion of PHI, but not on subsequent edits to the date_time value, allowing them to shift the date at first, and then modify the shifted date later.
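As a sketch of how such an option might behave, the Python below models an update call with a `date_shift` flag. The in-memory `records` dict, the `update` signature, and the hashing helper are all assumptions for illustration, not the actual /update endpoint.

```python
import hashlib
from datetime import date, timedelta

SALT = "example-salt"  # assumption: a real salt would be kept secret


def patient_offset_days(identifier: str) -> int:
    """Deterministic per-patient offset in 1..365 (hypothetical helper)."""
    digest = hashlib.sha256((identifier + SALT).encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % 365 + 1


def update(records: dict, identifier: str, attribute: str,
           value: date, date_shift: bool = False) -> None:
    """Store a date, shifting it first only when the caller asks for it."""
    if date_shift:
        value = value - timedelta(days=patient_offset_days(identifier))
    records.setdefault(identifier, {})[attribute] = value


records = {}
# Initial insertion of the true date: shift it on the way in.
update(records, "PATIENT-001", "visit_date", date(2020, 3, 1), date_shift=True)
# Later edit of the already-shifted value: no second shift is applied.
corrected = records["PATIENT-001"]["visit_date"] + timedelta(days=1)
update(records, "PATIENT-001", "visit_date", corrected)
```

This leaves the decision with the editor, who has to remember whether a given value is already shifted; the flag does not by itself prevent an accidental double shift.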