ouhft / COPE

Project Repository for Work Package 4 of the COPE Transplant Trial
https://cope.nds.ox.ac.uk
1 stars 0 forks source link

[DMC Updates] Create Import function for AB paper check spreadsheet #311

Closed marshalc closed 5 years ago

marshalc commented 6 years ago

To speed up the logging of paper records, write an import function that takes AB's excel files and maps them to the relevant FU forms

This will attempt to set #310 and #312 fields

marshalc commented 6 years ago

Data files for BE and NL have been received and cleaned (i.e. the varied column arrangements have been standardised and given consistent names, the redundant data has been scrubbed from the files). Copies of these cleaned files have been emailed to AB, SK, and LD.

marshalc commented 6 years ago

Can do the import direct into sqlite more quickly than any other means:

.mode csv temp_paper_record_import
.import /Users/carl/Documents/NDS/COPE/PaperAudit/NL_BE_cleaned_combined.csv temp_paper_record_import

/* From the SQL Command-line: Clean/normalise the data */
UPDATE temp_paper_record_import
SET t_form = LOWER(t_form),
  fu_1 = LOWER(fu_1),
  fu_2 = LOWER(fu_2),
  fu_3 = LOWER(fu_3),
  fu_4 = LOWER(fu_4),
  trial_id = UPPER(trial_id);

/* Need to have the organ_id for easy mapping with FU and TForms , so add column and add in reference */
ALTER TABLE temp_paper_record_import ADD organ_id INT NULL;

UPDATE temp_paper_record_import
SET organ_id = (
  SELECT compare_organ.id
  FROM compare_organ
  WHERE compare_organ.trial_id = temp_paper_record_import.trial_id
)
WHERE EXISTS(SELECT * FROM compare_organ WHERE compare_organ.trial_id = temp_paper_record_import.trial_id);
marshalc commented 6 years ago

The updating of the data can be done via some short python scripts that need only be run the once, but they'll have to wait on #310 before actioning. (edit: not 313, but 310, and not dependant on 312 either.)

marshalc commented 6 years ago

310 Done. Looking at scripts to do the data migration.

marshalc commented 6 years ago

Data clean up needed before this will all match. Problems with two pairs of organs identified with the same trial ID, and both pairs now appear to have been edited with differing data, and more worryingly, different randomisations! (WP451023). Need to dig out all the data on them to work out what the options for clean up are.

Organ IDs 273 & 274 + 459 + 460

marshalc commented 6 years ago

459 & 460 have some audit history - 459 the most. Latest edits (in Dec when this data was copied) were by @aukjebrat in April 2017. There's no audit history showing for 273 and 274.

From the import data, this could well be a moot point, as there is no paper records for this trial id. In which case, the best option is to take the first create pair (273 & 274) and rename those with a new unique trial ID. This would then match there being an audit history for the WP451023 trial id and there being a spreadsheet reference to database entry, whereas there's nothing else of note about the first pair.

marshalc commented 6 years ago

Donor 137 is a very odd record, with an apparent sequence number of 23 (in conflict with donor 230); whereas Donor 138 has a sequence number of 10, and the same retrieval team. Donor 132 has the sequence number of 9 for that team... so 137 should be like 9 3/4qtrs!?

Also, the randomisation allocated_on date for 137 is the same date as for donor 138... which seems unlikely that the same team did two retrievals in one day?

marshalc commented 6 years ago

Email sent to @aukjebrat, SK and LD to try an help unravel this duplicate ID problem and confirm I can fix this by applying the renaming solution of:

UPDATE compare_donor SET trial_id = 'WP451025', sequence_number = '25' WHERE id = 137;
UPDATE compare_organ SET trial_id = 'WP451025L' WHERE id = 273;
UPDATE compare_organ SET trial_id = 'WP451025R' WHERE id = 274;
marshalc commented 6 years ago

The field updates can be done with:

/*
######  Update T-Forms from paper record import  ######
x = paper form, or True;
database = database entry, or False;
austria = redundant data, so treat as database;
missing = semi redundant data, but we treat this as blank, and thus None.
 */
UPDATE compare_organ
SET paper_form_was_the_source = (
  Select CASE t_form
      WHEN 'x' THEN 1
      WHEN 'database' THEN 0
      WHEN 'austria' THEN 0
      ELSE NULL
    END
  FROM temp_paper_record_import
  WHERE temp_paper_record_import.organ_id = compare_organ.id
)
WHERE EXISTS(SELECT * FROM temp_paper_record_import WHERE temp_paper_record_import.organ_id = compare_organ.id);

/*
######  Update FU Forms from paper record import  ######
x = paper form, or True;
database = database entry, or False;
half = redundant data, so treat as database;
missing = semi redundant data, but we treat this as blank, and thus None.
n/a = semi redundant data, but we treat this as blank, and thus None.
death = semi redundant data, but we treat this as blank, and thus None.
patient died = semi redundant data, but we treat this as blank, and thus None.
not due = semi redundant data, but we treat this as blank, and thus None.
blank = no answer, so None
*/
UPDATE followups_followupinitial
SET paper_form_was_the_source = (
  Select CASE fu_1
      WHEN 'x' THEN 1
      WHEN 'database' THEN 0
      WHEN 'half' THEN 0
      ELSE NULL
    END
  FROM temp_paper_record_import
  WHERE temp_paper_record_import.organ_id = followups_followupinitial.organ_id
)
WHERE EXISTS(
    SELECT *
    FROM temp_paper_record_import
    WHERE temp_paper_record_import.organ_id = followups_followupinitial.organ_id
);

UPDATE followups_followup3m
SET paper_form_was_the_source = (
  Select CASE fu_2
      WHEN 'x' THEN 1
      WHEN 'database' THEN 0
      WHEN 'half' THEN 0
      ELSE NULL
    END
  FROM temp_paper_record_import
  WHERE temp_paper_record_import.organ_id = followups_followup3m.organ_id
)
WHERE EXISTS(
    SELECT *
    FROM temp_paper_record_import
    WHERE temp_paper_record_import.organ_id = followups_followup3m.organ_id
);

UPDATE followups_followup6m
SET paper_form_was_the_source = (
  Select CASE fu_3
      WHEN 'x' THEN 1
      WHEN 'database' THEN 0
      WHEN 'half' THEN 0
      ELSE NULL
    END
  FROM temp_paper_record_import
  WHERE temp_paper_record_import.organ_id = followups_followup6m.organ_id
)
WHERE EXISTS(
    SELECT *
    FROM temp_paper_record_import
    WHERE temp_paper_record_import.organ_id = followups_followup6m.organ_id
);

UPDATE followups_followup1y
SET paper_form_was_the_source = (
  Select CASE fu_4
      WHEN 'x' THEN 1
      WHEN 'database' THEN 0
      WHEN 'half' THEN 0
      ELSE NULL
    END
  FROM temp_paper_record_import
  WHERE temp_paper_record_import.organ_id = followups_followup1y.organ_id
)
WHERE EXISTS(
    SELECT *
    FROM temp_paper_record_import
    WHERE temp_paper_record_import.organ_id = followups_followup1y.organ_id
);
marshalc commented 6 years ago

So, once the data clean up of the duplicate WP451023 is confirmed, I can apply the import and then updates as above on the production data.

marshalc commented 6 years ago

Running the import scripts on the production server resulted in some strange behaviour, the workaround for which was to download the database file, apply the changes on the laptop, and then copy the file back again.

It was saying that the column trial_id could not be found when doing any query, whereas the schema and a select all was showing the data and the correct column name. All the other columns could be referenced just fine. Can't find a simple answer, so ignoring for now.

marshalc commented 6 years ago

The rest of the SQL commands appear to have applied just fine though on the live server after the django migration was performed.

marshalc commented 5 years ago

This looks to be ready for closing.