hivdb / covid-drdb

Refactor `subject_history` #11

Closed philiptzou closed 2 years ago

philiptzou commented 3 years ago

This issue is derived from #9.

In general, subject_history stores a patient's history records across infection, plasma isolation, virus isolation, and vaccination events. These events share a few attributes such as event_date, but they mostly consist of event-specific fields, such as vaccine_name for vaccination or iso_name for isolation. In addition, a patient's severity shouldn't be attached to any single event and should be split into its own table too. I propose dividing subject_history into five tables, plus the subject_treatments table created by pull request #9.

My goal is to redesign the tables here and then use a script to automatically migrate our current data.
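
As a rough illustration of what such a migration script could look like, here is a minimal Python sketch that routes subject_history rows to event-specific tables. It assumes a hypothetical `event` column with the labels shown in `EVENT_TO_TABLE`; those labels, the `split_subject_history` helper, and the in-memory dict representation are all assumptions made for illustration, not the actual layout of the current data.

```python
from collections import defaultdict

# Hypothetical mapping from an assumed `event` column in subject_history
# to the proposed target tables; the real event labels may differ.
EVENT_TO_TABLE = {
    "infection": "subject_infections",
    "plasma_isolation": "subject_plasma",
    "virus_isolation": "subject_isolates",
    "vaccination": "subject_vaccines",
}

# The shared event_date column is renamed per target table.
DATE_FIELD = {
    "subject_infections": "infection_date",
    "subject_plasma": "collection_date",
    "subject_isolates": "collection_date",
    "subject_vaccines": "vaccination_date",
}


def split_subject_history(rows):
    """Route each subject_history row (a dict) to its event-specific table."""
    tables = defaultdict(list)
    for row in rows:
        table = EVENT_TO_TABLE[row["event"]]
        new_row = {
            "ref_name": row["ref_name"],
            "subject_name": row["subject_name"],
            DATE_FIELD[table]: row["event_date"],
        }
        # Carry over only the event-specific field that belongs to this table.
        if table == "subject_vaccines":
            new_row["vaccine_name"] = row.get("vaccine_name")
        elif table == "subject_isolates":
            new_row["iso_name"] = row.get("iso_name")
        tables[table].append(new_row)
    return dict(tables)
```

A real script would also need to carry over the remaining fields (location, section, date-cmp values) and handle severity separately, since severity is not tied to a single event.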

subject_infections

Table for infection events. The new design allows recording re-infection events.

| Field | Type | Description |
|---|---|---|
| ref_name | string | Reference name, pkey |
| subject_name | string | Subject name, pkey |
| infection_date_cmp | date_cmp_enum | Date-cmp of infection |
| infection_date | date | Date of infection, pkey |
| infected_var_name | string | Variant name (not isolate name, since the virus is usually not sequenced at the time of infection) |
| location | string | Location where the infection happened |
| section | string | Source of this data in paper |

subject_plasma

Table for plasma isolation events. This table should also replace the main table rx_plasma.

| Field | Type | Description |
|---|---|---|
| ref_name | string | Reference name, pkey |
| subject_name | string | Subject name, from whom the collected_rx_name was collected, pkey |
| rx_name | string | Plasma name (a plasma is an rx because it can be used for treatment) |
| collection_date_cmp | date_cmp_enum | Date-cmp of collection |
| collection_date | date | Date of collection, pkey |
| location | string | Location where the plasma was collected |
| cumulative_group | string | Deprecated, kept for compatibility |
| section | string | Source of this data in paper |

subject_isolates

Table for virus isolation events.

| Field | Type | Description |
|---|---|---|
| ref_name | string | Reference name, pkey |
| subject_name | string | Subject name, pkey |
| collection_date_cmp | date_cmp_enum | Date-cmp of collection |
| collection_date | date | Date of collection, pkey |
| iso_name | string | Isolate collected |
| iso_source | string | Body source from which the isolate was obtained, should be constrained by a lookup table, pkey |
| iso_culture | boolean | Whether the isolate was cultured before sequencing, pkey |
| location | string | Location where the virus isolate was collected |
| section | string | Source of this data in paper |

subject_vaccines

Table for vaccination events. Allows injection of different vaccines on the same day.

| Field | Type | Description |
|---|---|---|
| ref_name | string | Reference name, pkey |
| subject_name | string | Subject name, pkey |
| vaccination_date_cmp | date_cmp_enum | Date-cmp of vaccination |
| vaccination_date | date | Date of vaccination, pkey |
| vaccine_name | string | Vaccine name, pkey |
| dosage | int | Vaccine dosage |
| location | string | Location where the person received vaccine |
| section | string | Source of this data in paper |
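
To show how the composite primary key above behaves, here is a minimal sketch using Python's built-in `sqlite3` module. SQLite, the `TEXT` column types, and the `add_vaccination` helper are assumptions chosen only to keep the example self-contained; the enum types from the proposal are simplified to plain text.

```python
import sqlite3

# In-memory SQLite database, used here only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE subject_vaccines (
        ref_name TEXT NOT NULL,
        subject_name TEXT NOT NULL,
        vaccination_date_cmp TEXT,
        vaccination_date TEXT NOT NULL,
        vaccine_name TEXT NOT NULL,
        dosage INTEGER,
        location TEXT,
        section TEXT,
        -- Composite key following the "pkey" markers in the proposal.
        PRIMARY KEY (ref_name, subject_name, vaccination_date, vaccine_name)
    )
    """
)


def add_vaccination(ref_name, subject_name, vaccination_date, vaccine_name):
    """Insert one vaccination event; return False if it violates the key."""
    try:
        conn.execute(
            "INSERT INTO subject_vaccines "
            "(ref_name, subject_name, vaccination_date, vaccine_name) "
            "VALUES (?, ?, ?, ?)",
            (ref_name, subject_name, vaccination_date, vaccine_name),
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

Because vaccine_name is part of the key, two different vaccines given to the same subject on the same date insert cleanly, while repeating the same vaccine on that date is rejected.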

subject_severity

Table for patient severity. Supports different severities across multiple time periods. A pre-import constraint should be applied to ensure that records do not overlap.

| Field | Type | Description |
|---|---|---|
| ref_name | string | Reference name, pkey |
| subject_name | string | Subject name, pkey |
| start_date_cmp | date_cmp_enum | Date-cmp of date range start |
| start_date | date | Date range start, pkey |
| end_date_cmp | date_cmp_enum | Date-cmp of date range end |
| end_date | date | Date range end, pkey |
| severity | severity_enum | Subject severity during this time period |
| section | string | Source of this data in paper |
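
The pre-import overlap constraint could be checked along these lines. This is a minimal sketch, not the project's actual check: the `find_overlaps` helper is hypothetical, records are assumed to be dicts with non-null `datetime.date` values, and date ranges are assumed to be inclusive (so two ranges sharing a single day count as overlapping). The date-cmp fields are ignored here.

```python
from collections import defaultdict
from datetime import date


def find_overlaps(records):
    """Return (earlier, later) pairs of overlapping severity records.

    Records are grouped by (ref_name, subject_name); within a group, each
    record is compared against the earlier record that ends latest, so any
    record overlapping an earlier one is reported once.
    """
    by_subject = defaultdict(list)
    for rec in records:
        by_subject[(rec["ref_name"], rec["subject_name"])].append(rec)

    overlaps = []
    for recs in by_subject.values():
        recs.sort(key=lambda r: (r["start_date"], r["end_date"]))
        latest = recs[0]  # earlier record with the latest end_date so far
        for cur in recs[1:]:
            # Inclusive ranges: starting on or before the latest end overlaps.
            if cur["start_date"] <= latest["end_date"]:
                overlaps.append((latest, cur))
            if cur["end_date"] > latest["end_date"]:
                latest = cur
    return overlaps
```

An import step could refuse to load the table whenever `find_overlaps` returns a non-empty list and report the offending pairs back to the curator.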

subject_treatments

Table for treatments received by the patient.

| Field | Type | Description |
|---|---|---|
| ref_name | string | Reference name, pkey |
| subject_name | string | Subject name, the recipient of the rx_name, pkey |
| rx_name | string | Treatment name, can be any CP, VP, MAb, or even an unclassified Rx, pkey |
| start_date_cmp | date_cmp_enum | Date-cmp of date range start |
| start_date | date | Date range start, pkey |
| end_date_cmp | date_cmp_enum | Date-cmp of date range end |
| end_date | date | Date range end, pkey |
| dosage | int | Dosage of this treatment |
| dosage_unit | dosage_unit_enum | Unit of the dosage |
| section | string | Source of this data in paper |

philiptzou commented 2 years ago

This issue was resolved by a series of commits ending at ec6dc3f247033c5cfed8af3125ef211854d1e4c9 and hivdb/covid-drdb-payload@da92133dda83115260d7c6cb6378fff52ec74ddf.