outcomesinsights / generalized_data_model

Outcomes Insights' Data Model for Clinical Research
MIT License

Add SES data from SEER Medicare #123

Closed: jenniferduryea closed this issue 5 years ago

jenniferduryea commented 5 years ago

Originally from MarksJigsawIssues.xlsx

We need a way to add socioeconomic status (SES) information from SEER Medicare. One option is to add a table, since some of this information has associated dates.

Tagging all for discussion.

markdanese commented 5 years ago

This is an important one to fix, so it should be a high priority. The main issue is that most data sources treat patient characteristics as fixed, while others, such as SEER, have multiple values per person over time. So the model should allow dates to be present or absent depending on the data source, and should allow multiple values per person. The tricky part will be deciding how to incorporate dates, or how to allow them to be missing.

marc-outins commented 5 years ago

We can just add start and end dates to patient_details and allow both to be missing. Alternatively, we can make the start date required, using the birth date when no date is available, and allow the end date to be null.
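A minimal sketch of the second option, assuming a simplified record with only a few of the table's columns (the `PatientDetail` class, the `make_detail` helper, and the concept IDs in the usage are hypothetical, not part of the GDM). It shows start_date falling back to the birth date while end_date stays nullable:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class PatientDetail:
    # Hypothetical, trimmed-down record; not the full GDM column list.
    patient_id: int
    patient_detail_concept_id: int
    value_as_string: Optional[str]
    start_date: date            # required under the second option
    end_date: Optional[date]    # may be null (open-ended)


def make_detail(patient_id: int, concept_id: int, value: Optional[str],
                birth_date: date, source_date: Optional[date] = None,
                end_date: Optional[date] = None) -> PatientDetail:
    """Build a record, using the birth date when the source supplies no date."""
    start = source_date if source_date is not None else birth_date
    # Reject ranges that end before they start.
    if end_date is not None and end_date < start:
        raise ValueError("end_date precedes start_date")
    return PatientDetail(patient_id, concept_id, value, start, end_date)
```

For example, `make_detail(1, 100, "F", birth_date=date(1950, 3, 1))` yields a record whose start_date is the birth date and whose end_date is `None`, which is how a fixed characteristic would look under this option.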

markdanese commented 5 years ago

We probably also need to figure out how the protocol builder will bring in the information we need, particularly how it interacts with the maximum lookback settings.
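One way the lookback question could play out, sketched under the assumption that the protocol builder treats a patient_details row as relevant when its date range overlaps the window `[index_date - max_lookback, index_date]` (the `in_lookback` function and its parameters are hypothetical, not the protocol builder's actual logic). The point it illustrates: filtering on start_date alone would drop records dated at birth, such as sex, even though they are still in effect.

```python
from datetime import date, timedelta
from typing import Optional


def in_lookback(start_date: date, end_date: Optional[date],
                index_date: date, max_lookback_days: int) -> bool:
    """True if [start_date, end_date] overlaps [index_date - lookback, index_date].

    A null end_date is treated as open-ended (still in effect).
    Hypothetical helper for illustration only.
    """
    window_start = index_date - timedelta(days=max_lookback_days)
    if start_date > index_date:
        return False  # recorded after the index date
    return end_date is None or end_date >= window_start
```

With a one-year lookback and an index date of 2015-06-01, a sex record starting at a 1950 birth date with no end date is kept, whereas a filter requiring start_date to fall inside the window would exclude it.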

marc-outins commented 5 years ago

The seer_2018_breast_xgeva_za dataset on titan now has the updated patient_details table, with the following columns:

- id
- patient_id
- start_date
- end_date
- value_as_number
- value_as_string
- value_as_concept_id
- patient_detail_concept_id
- patient_detail_source_value
- patient_detail_vocabulary_id
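As a rough illustration, the same columns expressed as an SQLite table. The column names come from the comment above; the SQL types, the NOT NULL constraints, and the `create_patient_details` helper are assumptions for this sketch, not the actual GDM specification. start_date and end_date are left nullable to match the "allow missing" option discussed earlier.

```python
import sqlite3

# Column names follow the list above; types and constraints are assumed.
PATIENT_DETAILS_DDL = """
CREATE TABLE patient_details (
    id                           INTEGER PRIMARY KEY,
    patient_id                   INTEGER NOT NULL,
    start_date                   DATE,
    end_date                     DATE,
    value_as_number              REAL,
    value_as_string              TEXT,
    value_as_concept_id          INTEGER,
    patient_detail_concept_id    INTEGER NOT NULL,
    patient_detail_source_value  TEXT,
    patient_detail_vocabulary_id TEXT
)
"""


def create_patient_details(conn: sqlite3.Connection) -> list[str]:
    """Create the table and return its column names in declaration order."""
    conn.execute(PATIENT_DETAILS_DDL)
    # PRAGMA table_info yields one row per column: (cid, name, type, notnull, dflt_value, pk)
    return [row[1] for row in conn.execute("PRAGMA table_info(patient_details)")]


conn = sqlite3.connect(":memory:")
columns = create_patient_details(conn)
```

The three `value_as_*` columns let a single table hold numeric, free-text, and coded values side by side.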

jenniferduryea commented 5 years ago

@marc-outins is this done? It looks like the specs were updated under https://github.com/outcomesinsights/generalized_data_model. If so, please close the ticket.

marc-outins commented 5 years ago

@jenniferduryea this is ready to test in the seer_2018_breast_xgeva_za dataset on titan. I still need to update gdm_250_extended to include patient_detail records.

jenniferduryea commented 5 years ago

Confirmed that the GDM data uses the patient_details table, with the start_date and end_date columns present. All values have a date assigned to them. The current standard is to assign the patient's birth date to patient_details.start_date for "timeless" variables (such as sex and race). All other variables are assigned a start_date equal to the date of the data cut that the value refers to. In particular, the GDM_CENSUS_TRACT_MEASURE vocabulary will have multiple values for each year (where applicable to the patient). Looks good.
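Under this convention, a consumer of the table might pick the census-tract value in effect on a given date by taking the row with the latest start_date on or before that date. A minimal sketch, assuming a simplified row type; `DetailRow`, `value_as_of`, the concept IDs `MEDIAN_INCOME` and `PCT_POVERTY`, and the sample values are all made up for illustration:

```python
from datetime import date
from typing import Iterable, NamedTuple, Optional


class DetailRow(NamedTuple):
    # Hypothetical subset of patient_details columns.
    patient_id: int
    patient_detail_concept_id: int
    start_date: date
    value_as_number: Optional[float]


def value_as_of(rows: Iterable[DetailRow], patient_id: int,
                concept_id: int, as_of: date) -> Optional[float]:
    """Return the value of `concept_id` from the latest data cut on or before `as_of`."""
    candidates = [r for r in rows
                  if r.patient_id == patient_id
                  and r.patient_detail_concept_id == concept_id
                  and r.start_date <= as_of]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r.start_date).value_as_number


# Hypothetical concept IDs for two census-tract measures, one row per measure per year.
MEDIAN_INCOME, PCT_POVERTY = 9001, 9002
rows = [
    DetailRow(1, MEDIAN_INCOME, date(2010, 1, 1), 41000.0),
    DetailRow(1, PCT_POVERTY, date(2010, 1, 1), 12.5),
    DetailRow(1, MEDIAN_INCOME, date(2011, 1, 1), 42500.0),
    DetailRow(1, PCT_POVERTY, date(2011, 1, 1), 11.8),
]
```

Here `value_as_of(rows, 1, MEDIAN_INCOME, date(2011, 6, 30))` returns `42500.0`, and a date before the first data cut returns `None`.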