jacob-umich / stats-dash

Class project for STATS 507. We create a web application to view data.
MIT License

Picking and cleaning datasets #2

Closed jacob-umich closed 6 months ago

jacob-umich commented 7 months ago

Background

For this project, we need to use the chronic disease dataset and augment it with another dataset. The two datasets must be somewhat related for them to be used together. This task involves choosing a dataset to augment the chronic disease dataset and writing scripts to clean both datasets. This work will be completed in a branch called "eda".

Tasks

jacob-umich commented 7 months ago

There are 34 separate fields in the chronic disease dataset:

Many of them have vague meanings, but it seems like each entry is a question asked of a specific individual, like "Diabetes among adults". The features whose meaning we would need to identify are the ones related to stratification or DataValue.

jacob-umich commented 7 months ago

When displaying only the entries that do not have NaN in the "Response" column, an empty data frame is returned, so Response is NaN for every entry. There are 109 different questions asked. The most frequent question is "Binge drinking frequency among adults who binge drink", with 5720 instances. Many of the questions seem like they should have true/false answers, such as "Food insecure in the past 12 months among households" or "No broadband internet subscription among households", or the value could be the number of yes answers from an aggregated survey. Because of the YearStart and YearEnd features, the answer might be the number of incidents in that time period.
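A minimal pandas sketch of the checks above, assuming the CSV has already been read into a DataFrame named `cdi`. The three toy rows stand in for the real file, and the values and variable names are illustrative only:

```python
import numpy as np
import pandas as pd

# Toy stand-in for the chronic disease indicators (CDI) table; only the
# columns used below are mocked, and the values are illustrative.
cdi = pd.DataFrame({
    "Question": [
        "Binge drinking frequency among adults who binge drink",
        "Binge drinking frequency among adults who binge drink",
        "Diabetes among adults",
    ],
    "Response": [np.nan, np.nan, np.nan],
    "DataValue": [5.1, 4.8, 11.2],
})

# Rows where Response is present; an empty result means the column is all NaN.
with_response = cdi[cdi["Response"].notna()]
print(with_response.empty)  # True

# Drop every column that is entirely NaN (here, Response) during cleaning.
cleaned = cdi.dropna(axis="columns", how="all")

# Number of distinct questions and how often each appears, most frequent first.
n_questions = cdi["Question"].nunique()
question_counts = cdi["Question"].value_counts()
print(n_questions)              # 2
print(question_counts.iloc[0])  # 2
```

On the full dataset, the same calls are one way to reproduce the counts reported above (109 questions, 5720 instances for the most frequent one).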

jacob-umich commented 7 months ago

This table shows only the entries for the question "Food insecure in the past 12 months among households". It looks like all the entries are aggregated answers to questions asked (or data collected) over a period of time within a certain geographic area (states, plus DC, the territories, and the US as a whole). Based on this, we should choose a dataset that relates to at least one of the questions asked.

YearStart YearEnd LocationAbbr DataValueUnit DataValueType DataValue
25076 2019 2021 AK % Crude Prevalence 9.5
18425 2019 2021 AL % Crude Prevalence 13.1
30716 2019 2021 AR % Crude Prevalence 15
18702 2019 2021 AZ % Crude Prevalence 10.1
26942 2019 2021 CA % Crude Prevalence 9.6
30776 2019 2021 CO % Crude Prevalence 10.5
28062 2019 2021 CT % Crude Prevalence 9.6
29477 2019 2021 DC % Crude Prevalence 9
38811 2019 2021 DE % Crude Prevalence 11.2
31599 2019 2021 FL % Crude Prevalence 9.9
32639 2019 2021 GA % Crude Prevalence 9.9
31072 2019 2021 GU % Crude Prevalence nan
51521 2019 2021 HI % Crude Prevalence 9.1
45921 2019 2021 IA % Crude Prevalence 7
44209 2019 2021 ID % Crude Prevalence 9.8
43543 2019 2021 IL % Crude Prevalence 9.4
50978 2019 2021 IN % Crude Prevalence 9.7
46560 2019 2021 KS % Crude Prevalence 10.2
40045 2019 2021 KY % Crude Prevalence 12.3
42348 2019 2021 LA % Crude Prevalence 14.5
65876 2019 2021 MA % Crude Prevalence 8.4
60423 2019 2021 MD % Crude Prevalence 8.7
59576 2019 2021 ME % Crude Prevalence 9.5
65847 2019 2021 MI % Crude Prevalence 11.4
58773 2019 2021 MN % Crude Prevalence 7.4
57794 2019 2021 MO % Crude Prevalence 12
65500 2019 2021 MS % Crude Prevalence 15.3
53444 2019 2021 MT % Crude Prevalence 10.4
66918 2019 2021 NC % Crude Prevalence 10.9
66152 2019 2021 ND % Crude Prevalence 7.7
77580 2019 2021 NE % Crude Prevalence 10.6
77195 2019 2021 NH % Crude Prevalence 5.4
67001 2019 2021 NJ % Crude Prevalence 8.3
70966 2019 2021 NM % Crude Prevalence 11.5
75368 2019 2021 NV % Crude Prevalence 10.2
70292 2019 2021 NY % Crude Prevalence 10.3
82520 2019 2021 OH % Crude Prevalence 10.8
85652 2019 2021 OK % Crude Prevalence 13.8
87941 2019 2021 OR % Crude Prevalence 10.3
83502 2019 2021 PA % Crude Prevalence 9.2
89665 2019 2021 PR % Crude Prevalence nan
84235 2019 2021 RI % Crude Prevalence 8.4
91723 2019 2021 SC % Crude Prevalence 12.6
88092 2019 2021 SD % Crude Prevalence 8.7
102851 2019 2021 TN % Crude Prevalence 11.2
92585 2019 2021 TX % Crude Prevalence 13.7
99930 2019 2021 US % Crude Prevalence 10.4
104635 2019 2021 UT % Crude Prevalence 11.2
94758 2019 2021 VA % Crude Prevalence 7.8
104227 2019 2021 VI % Crude Prevalence nan
93287 2019 2021 VT % Crude Prevalence 7.9
98754 2019 2021 WA % Crude Prevalence 7.9
105791 2019 2021 WI % Crude Prevalence 9.9
117168 2019 2021 WV % Crude Prevalence 14
109942 2019 2021 WY % Crude Prevalence 11.2
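One way to produce a table like this with pandas, assuming the same `cdi` DataFrame. The toy rows are made up except for the AK and AL food-insecurity values, which are copied from the table:

```python
import pandas as pd

# Hypothetical miniature of the CDI table; column names match the real dataset.
cdi = pd.DataFrame({
    "Question": [
        "Food insecure in the past 12 months among households",
        "Diabetes among adults",
        "Food insecure in the past 12 months among households",
    ],
    "YearStart": [2019, 2021, 2019],
    "YearEnd": [2021, 2021, 2021],
    "LocationAbbr": ["AL", "AL", "AK"],
    "DataValueUnit": ["%", "%", "%"],
    "DataValueType": ["Crude Prevalence"] * 3,
    "DataValue": [13.1, 14.0, 9.5],
})

question = "Food insecure in the past 12 months among households"
columns = ["YearStart", "YearEnd", "LocationAbbr",
           "DataValueUnit", "DataValueType", "DataValue"]

# The boolean mask keeps one question; .loc also selects the display columns.
food_insecurity = (
    cdi.loc[cdi["Question"] == question, columns]
    .sort_values("LocationAbbr")
)
print(food_insecurity)
```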
jacob-umich commented 7 months ago

Here are some of the other features for the same question:

LowConfidenceLimit Stratification1 StratificationCategory1 Geolocation LocationID DataValue
18425 9.4 Overall Overall POINT (-86.63186076199969 32.84057112200048) 1 13.1
18702 7.1 Overall Overall POINT (-111.76381127699972 34.865970280000454) 4 10.1
25076 6.3 Overall Overall POINT (-147.72205903599973 64.84507995700051) 2 9.5
26942 8.6 Overall Overall POINT (-120.99999953799971 37.63864012300047) 6 9.6
28062 6.2 Overall Overall POINT (-72.64984095199964 41.56266102000046) 9 9.6
29477 6.8 Overall Overall POINT (-77.036871 38.907192) 11 9
30716 11.6 Overall Overall POINT (-92.27449074299966 34.74865012400045) 5 15
30776 7.2 Overall Overall POINT (-106.13361092099967 38.843840757000464) 8 10.5
31072 nan Overall Overall POINT (144.793731 13.444304) 66 nan
31599 8.2 Overall Overall POINT (-81.92896053899966 28.932040377000476) 12 9.9
32639 7.5 Overall Overall POINT (-83.62758034599966 32.83968109300048) 13 9.9
38811 7.5 Overall Overall POINT (-75.57774116799965 39.008830667000495) 10 11.2
40045 9.2 Overall Overall POINT (-84.77497104799966 37.645970271000465) 21 12.3
42348 11.7 Overall Overall POINT (-92.44568007099969 31.31266064400046) 22 14.5
43543 7.4 Overall Overall POINT (-88.99771017799969 40.48501028300046) 17 9.4
44209 7.4 Overall Overall POINT (-114.3637300419997 43.682630005000476) 16 9.8
45921 4.2 Overall Overall POINT (-93.81649055599968 42.46940091300047) 19 7
46560 7.7 Overall Overall POINT (-98.20078122699965 38.34774030000045) 20 10.2
50978 7.4 Overall Overall POINT (-86.14996019399968 39.766910452000445) 18 9.7
51521 6.4 Overall Overall POINT (-157.85774940299973 21.304850435000446) 15 9.1
53444 7.8 Overall Overall POINT (-109.42442064499971 47.06652897200047) 30 10.4
57794 8.7 Overall Overall POINT (-92.56630005299968 38.635790776000476) 29 12
58773 4.8 Overall Overall POINT (-94.79420050299967 46.35564873600049) 27 7.4
59576 6.6 Overall Overall POINT (-68.98503133599962 45.254228894000505) 23 9.5
60423 6.1 Overall Overall POINT (-76.60926011099963 39.29058096400047) 24 8.7
65500 11.7 Overall Overall POINT (-89.53803082499968 32.745510099000455) 28 15.3
65847 8.7 Overall Overall POINT (-84.71439026999968 44.6613195430005) 26 11.4
65876 6.3 Overall Overall POINT (-72.08269067499964 42.27687047000046) 25 8.4
66152 5.5 Overall Overall POINT (-100.11842104899966 47.47531977900047) 38 7.7
66918 8.3 Overall Overall POINT (-79.15925046299964 35.466220975000454) 37 10.9
67001 6.1 Overall Overall POINT (-74.27369128799967 40.13057004800049) 34 8.3
70292 8.5 Overall Overall POINT (-75.54397042699964 42.82700103200045) 36 10.3
70966 7.1 Overall Overall POINT (-106.24058098499967 34.52088095200048) 35 11.5
75368 7.7 Overall Overall POINT (-117.07184056399967 39.493240390000494) 32 10.2
77195 3.5 Overall Overall POINT (-71.50036091999965 43.65595011300047) 33 5.4
77580 7.8 Overall Overall POINT (-99.36572062299967 41.6410409880005) 31 10.6
82520 8.6 Overall Overall POINT (-82.40426005599966 40.06021014100048) 39 10.8
83502 7.2 Overall Overall POINT (-77.86070029399963 40.79373015200048) 42 9.2
84235 5.4 Overall Overall POINT (-71.52247031399963 41.70828019300046) 44 8.4
85652 10.2 Overall Overall POINT (-97.52107021399968 35.47203135600046) 40 13.8
87941 7.7 Overall Overall POINT (-120.15503132599969 44.56744942400047) 41 10.3
88092 6.2 Overall Overall POINT (-100.3735306369997 44.353130053000484) 46 8.7
89665 nan Overall Overall POINT (-66.590149 18.220833) 72 nan
91723 9.9 Overall Overall POINT (-81.04537120699968 33.998821303000454) 45 12.6
92585 11.9 Overall Overall POINT (-99.42677020599967 31.827240407000488) 48 13.7
93287 5.8 Overall Overall POINT (-72.51764079099962 43.62538123900049) 50 7.9
94758 5.7 Overall Overall POINT (-78.45789046299967 37.54268067400045) 51 7.8
98754 6.1 Overall Overall POINT (-120.47001078999972 47.52227862900048) 53 7.9
99930 10.1 Overall Overall nan 59 10.4
102851 9 Overall Overall POINT (-85.77449091399967 35.68094058000048) 47 11.2
104227 nan Overall Overall POINT (-64.896335 18.335765) 78 nan
104635 8.5 Overall Overall POINT (-111.58713063499971 39.360700171000474) 49 11.2
105791 6.9 Overall Overall POINT (-89.81637074199966 44.39319117400049) 55 9.9
109942 8.6 Overall Overall POINT (-108.10983035299967 43.23554134300048) 56 11.2
117168 8.9 Overall Overall POINT (-80.71264013499967 38.66551020200046) 54 14
jacob-umich commented 7 months ago

It looks like the stratifications are ways to group the data, such as by age or sex.
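A small sketch of how the stratification columns could be inspected and used. It assumes `StratificationCategory1` holds the grouping (for example "Overall" or "Sex") and `Stratification1` the group within it; the category names and values in the toy frame are assumptions, not taken from the real file:

```python
import pandas as pd

# Toy rows; category names ("Sex") and values are illustrative assumptions.
cdi = pd.DataFrame({
    "LocationAbbr": ["MI", "MI", "MI", "OH"],
    "StratificationCategory1": ["Overall", "Sex", "Sex", "Overall"],
    "Stratification1": ["Overall", "Male", "Female", "Overall"],
    "DataValue": [11.4, 12.0, 10.9, 10.8],
})

# Which strata exist within each grouping category.
print(cdi.groupby("StratificationCategory1")["Stratification1"].unique())

# State-level totals only: drop the per-group breakdowns.
overall = cdi[cdi["StratificationCategory1"] == "Overall"]

# Wide view: one row per state, one column per stratum (NaN where absent).
by_group = cdi.pivot_table(index="LocationAbbr", columns="Stratification1",
                           values="DataValue")
```

Filtering to "Overall" first keeps state-to-state comparisons from mixing totals with subgroup rows.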

jacob-umich commented 7 months ago
Question Instances
0 Binge drinking frequency among adults who binge drink 5720
1 Binge drinking intensity among adults who binge drink 5680
2 Diabetic ketoacidosis mortality among all people, underlying or contributing cause 5616
3 Diseases of the heart mortality among all people, underlying cause 5616
4 Coronary heart disease mortality among all people, underlying cause 5616
5 Cerebrovascular disease (stroke) mortality among all people, underlying cause 5616
6 Diabetes mortality among all people, underlying or contributing cause 5616
7 Chronic liver disease mortality among all people, underlying cause 5616
8 Asthma mortality among all people, underlying cause 5616
9 Chronic obstructive pulmonary disease mortality among adults aged 45 years and older, underlying or contributing cause 5304
10 Chronic obstructive pulmonary disease mortality among adults aged 45 years and older, underlying cause 5304
11 Diabetes among adults 5060
12 Chronic obstructive pulmonary disease among adults 5060
13 Routine checkup within the past year among adults 5060
14 Depression among adults 5060
15 Recent activity limitation among adults 5060
16 Current smoking among adults with chronic obstructive pulmonary disease 5060
17 Current cigarette smoking among adults 5060
18 2 or more chronic conditions among adults 5060
19 Fair or poor self-rated health status among adults 5060
20 Frequent mental distress among adults 5060
21 Frequent physical distress among adults 5060
22 Binge drinking prevalence among adults 5060
23 Average recent physically unhealthy days among adults 5060
24 Average mentally unhealthy days among adults 5060
25 Obesity among adults 5060
26 No leisure-time physical activity among adults 5060
27 Influenza vaccination among adults 5060
28 Adults with any disability 5060
29 Quit attempts in the past year among adult current smokers 5060
30 Current asthma among adults 4895
31 Influenza vaccination among adults 18-64 who are at increased risk 4840
32 Lack of health insurance among adults aged 18-64 years 4840
33 Pneumococcal vaccination among adults aged 18-64 years who are at increased risk 4840
34 Pneumococcal vaccination among adults aged 65 years and older 4400
35 Arthritis among adults 3795
36 Physical inactivity among adults with arthritis 3795
37 Hospitalization for heart failure as principal diagnosis, Medicare-beneficiaries aged 65 years and older 3744
38 Hospitalization for chronic obstructive pulmonary disease as principal diagnosis, Medicare-beneficiaries aged 65 years and older 3744
39 Hospitalization for chronic obstructive pulmonary disease as any diagnosis, Medicare-beneficiaries aged 65 years and older 3744
40 Invasive cancer (all sites combined), incidence 2544
41 Consumed vegetables less than one time daily among adults 2530
42 Visited dentist or dental clinic in the past year among adults 2530
43 Taking medicine to control high blood pressure among adults with high blood pressure 2530
44 High cholesterol among adults who have been screened 2530
45 Have taken an educational class to learn how to manage arthritis symptoms among adults with arthritis 2530
46 Taking medicine for high cholesterol among adults 2530
47 Short sleep duration among adults 2530
48 Consumed fruit less than one time daily among adults 2530
49 Received health care provider counseling for physical activity among adults with arthritis 2530
50 Provided care for a friend or family member in the past month among adults 2530
51 Provided care for someone with dementia or other cognitive impairment in the past month among adults 2530
52 High blood pressure among adults 2527
53 Breast cancer mortality among all females, underlying cause 2496
54 Cervical cancer mortality among all females, underlying cause 2496
55 Colon and rectum (colorectal) cancer mortality among all people, underlying cause 2496
56 Lung and bronchial cancer mortality among all people, underlying cause 2496
57 Prostate cancer mortality among all males, underlying cause 2496
58 Invasive cancer (all sites combined) mortality among all people, underlying cause 2496
59 Subjective cognitive decline among adults aged 45 years and older 2422
60 Discussed symptoms of subjective cognitive decline with a health care professional among adults aged 45 years and older with subjective cognitive decline 2422
61 No teeth lost among adults aged 18-64 years 2420
62 Severe joint pain among adults with arthritis 2420
63 Work limitation due to arthritis among adults aged 18-64 years with arthritis 2420
64 Activity limitation due to arthritis among adults with arthritis 2420
65 All teeth lost among adults aged 65 years and older 2200
66 Colorectal cancer screening among adults aged 45-75 years 2200
67 Six or more teeth lost among adults aged 65 years and older 2200
68 Mammography use among women aged 50-74 years 1758
69 Binge drinking prevalence among high school students 1540
70 Current electronic vapor product use among high school students 1540
71 Consumed regular soda at least one time daily among high school students 1540
72 Current smokeless tobacco use among high school students 1540
73 Consumed fruit less than one time daily among high school students 1540
74 Current tobacco use of any tobacco product among high school students 1540
75 Short sleep duration among high school students 1540
76 Obesity among high school students 1540
77 Consumed vegetables less than one time daily among high school students 1540
78 Met aerobic physical activity guideline among high school students 1540
79 Alcohol use among high school students 1540
80 Receipt of evidence-based preventive dental services in the past 12 months among children and adolescents aged 1-17 years 1430
81 Visited dentist or other oral health care provider in the past 12 months among children and adolescents aged 1-17 years 1430
82 Unable to pay mortgage, rent, or utility bills in the past 12 months among adults 1265
83 Met aerobic physical activity guideline for substantial health benefits, adults 1265
84 Lack of social and emotional support needed among adults 1265
85 Lack of reliable transportation in the past 12 months among adults 1265
86 Short sleep duration among children aged 4 months to 14 years 1248
87 Children and adolescents aged 6-13 years meeting aerobic physical activity guideline 1248
88 Unemployment rate among people 16 years and older in the labor force 1040
89 Living below 150% of the poverty threshold among all people 1040
90 High school completion among adults aged 18-24 1040
91 Cigarette smoking during pregnancy among women with a recent live birth 1026
92 Preventive dental care in the 12 months before pregnancy among women with a recent live birth 1026
93 Postpartum depressive symptoms among women with a recent live birth 1026
94 Postpartum checkup among women with a recent live birth 1026
95 Gestational diabetes among women with a recent live birth 1026
96 Health insurance coverage after pregnancy among women with a recent live birth 1026
97 Health insurance coverage in the month before pregnancy among women with a recent live birth 1026
98 Cervical cancer screening among women aged 21-65 years 880
99 Current poor mental health among high school students 770
100 Obesity among WIC children aged 2 to 4 years 432
101 Life expectancy at birth 312
102 Proportion of the population protected by a comprehensive smoke-free policy prohibiting smoking in all indoor areas of workplaces and public places, including restaurants and bars 165
103 Per capita alcohol consumption among people aged 14 years and older 165
104 Infants who were breastfed at 12 months 122
105 Infants who were exclusively breastfed through 6 months 122
106 No broadband internet subscription among households 104
107 Incidence of treated end-stage kidney disease 104
108 Food insecure in the past 12 months among households 55
jacob-umich commented 7 months ago

I think there are a lot of candidate datasets for augmentation.

I think I will incorporate the datasets marked with asterisks, because they seem the most closely related to (if not overlapping with) the main dataset.

jacob-umich commented 7 months ago

I also added some data from 538 (FiveThirtyEight) because they publish data at the state level. These include:

jacob-umich commented 6 months ago

Here is a table of all the features from each incorporated dataset. To connect the datasets we will need to identify keys/features that can be related. Below is a list of how each dataset can be connected with the main one. Also, there is a list of how some features will be changed.

| Dataset | Features |
|---|---|
| main | YearStart, YearEnd, LocationAbbr, LocationDesc, DataSource, Topic, Question, Response, DataValueUnit, DataValueType, DataValue, DataValueAlt, DataValueFootnoteSymbol, DataValueFootnote, LowConfidenceLimit, HighConfidenceLimit, StratificationCategory1, Stratification1, StratificationCategory2, Stratification2, StratificationCategory3, Stratification3, Geolocation, LocationID, TopicID, QuestionID, ResponseID, DataValueTypeID, StratificationCategoryID1, StratificationID1, StratificationCategoryID2, StratificationID2, StratificationCategoryID3, StratificationID3 |
| urbanization_dist | stcd, state, cd, pvi_22, urbanindex, rural, exurban, suburban, urban, grouping |
| food_prices | Classification Name, Classification Code, Country Name, Country Code, Series Name, Series Code, 2017 [YR2017], 2018 [YR2018], 2019 [YR2019], 2020 [YR2020], 2021 [YR2021] |
| nutrition | Country Name, Country Code, Series Name, Series Code, one column per year from 1960 [YR1960] through 2022 [YR2022] |
| human_capital | Country Name, Country Code, Series Name, Series Code, one column per year from 2010 [YR2010] through 2020 [YR2020] |
| metro_grade | metro_area, holc_grade, white_pop, black_pop, hisp_pop, asian_pop, other_pop, total_pop, pct_white, pct_black, pct_hisp, pct_asian, pct_other, lq_white, lq_black, lq_hisp, lq_asian, lq_other, surr_area_white_pop, surr_area_black_pop, surr_area_hisp_pop, surr_area_asian_pop, surr_area_other_pop, surr_area_pct_white, surr_area_pct_black, surr_area_pct_hisp, surr_area_pct_asian, surr_area_pct_other |
| sots_index | state, governor, party, filename, url |
| sots_words | phrase, category, d_speeches, r_speeches, total, percent_of_d_speeches, percent_of_r_speeches, chi2, pval |
| urbanization_state | state, urbanindex |

the following changes will be made to the features:

Aerlenbeck commented 6 months ago

I added a file from Colab that looks at the states scoring the "worst" in the SDOH (social determinants of health) categories vs. life expectancy (sorry, I still don't know how to make files like yours).

Also, what do you mean by features? Is that just the column you are merging on?

For each SDOH category, I take the five worst states, count the number of times each state appears among the worst, and then compare that with the 10 states with the lowest life expectancy (my hypothesis: worse SDOH = lower life expectancy). That is kind of true, but it's not the most appealing graph I've ever seen.
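The counting approach described above can be sketched in pandas as follows. The `LifeExpectancy` column name and both helper functions are hypothetical, and the sketch assumes a higher `DataValue` is worse for every SDOH question passed in, which would need checking question by question (it holds for food insecurity, but not for something like insurance coverage):

```python
import pandas as pd

def worst_state_counts(sdoh: pd.DataFrame, n: int = 5) -> pd.Series:
    """Count how often each state is among the n worst states per question.

    Assumes a higher DataValue means a worse outcome for every question.
    """
    worst = (
        sdoh.sort_values("DataValue", ascending=False)
        .groupby("Question")
        .head(n)  # first n rows of each question, i.e. the n highest values
    )
    return worst["LocationAbbr"].value_counts()

def overlap_with_lowest_life_expectancy(counts: pd.Series, life: pd.DataFrame,
                                        k: int = 10) -> pd.Series:
    """Keep only the states that are also among the k lowest life expectancies."""
    lowest = set(life.nsmallest(k, "LifeExpectancy")["LocationAbbr"])
    return counts[counts.index.isin(lowest)]

# Toy data: two questions, four states, made-up values.
sdoh = pd.DataFrame({
    "Question": ["A"] * 4 + ["B"] * 4,
    "LocationAbbr": ["MS", "LA", "AL", "NH"] * 2,
    "DataValue": [15.3, 14.5, 13.1, 5.4, 20.0, 10.0, 18.0, 3.0],
})
life = pd.DataFrame({"LocationAbbr": ["MS", "LA", "AL", "NH"],
                     "LifeExpectancy": [70.0, 72.0, 71.0, 79.0]})

counts = worst_state_counts(sdoh, n=2)  # MS twice; LA and AL once each
overlap = overlap_with_lowest_life_expectancy(counts, life, k=2)  # MS and AL
```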

I think the SDOH avenue has a lot of other potential correlations we can make, though, like average household income, maternal mortality rates, and education performance. There are also a few questions on insurance coverage before/after birth, and we could look at those vs. maternal mortality rates. Maybe political leanings too, if we can find a good survey. Let me know what you think! I added life expectancy to the dataset.

jacob-umich commented 6 months ago

@Aerlenbeck

> Also, what do you mean by features? Is that just the column you are merging on?

Yea, I think for the most part a feature is the same as a column.

Yea, I think there are a lot of good analyses that can come from that. Where did you get the life expectancy data from? Did you get a chance to look at any of the other datasets?

Aerlenbeck commented 6 months ago

@jacob-umich

The life expectancy data is from the CDC. It's not one of her data sources, but I think it should be fine since the CDC is reliable.

I see the datasets and how you plan to link them, but I don't see the correlations you're trying to make. Are they in one of the txt files you uploaded here?

Is urbanization just a numeric value per state, where higher means more urbanized? I also think we could relate that to SDOH (more urbanization, better SDOH?) and/or average state income (more urbanization, higher average income?). Maybe we could find some traffic safety data too (more urbanization, more car deaths?).
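If urbanindex is a single number per state, one simple check of these ideas is a correlation after joining on state. The sketch below assumes both tables use full state names in a column called `state`; the `urbanindex` values are made up, while the `DataValue` values are the food-insecurity figures from the table earlier in this thread:

```python
import pandas as pd

# Made-up urbanization scores keyed by full state name (assumed format).
urban = pd.DataFrame({
    "state": ["Alaska", "California", "Mississippi", "New Jersey"],
    "urbanindex": [9.5, 12.6, 9.0, 12.4],
})
# Food-insecurity prevalence (%) per state; Guam has no urbanindex row.
sdoh = pd.DataFrame({
    "state": ["Alaska", "California", "Mississippi", "New Jersey", "Guam"],
    "DataValue": [9.5, 9.6, 15.3, 8.3, None],
})

# Inner join keeps only states present in both tables; validate raises
# if either side has duplicate state keys.
merged = urban.merge(sdoh, on="state", how="inner", validate="one_to_one")

# Pearson correlation (the pandas default) between urbanization and the measure.
r = merged["urbanindex"].corr(merged["DataValue"])
# Spearman-style variant: Pearson on ranks, without needing SciPy.
rho = merged["urbanindex"].rank().corr(merged["DataValue"].rank())
```

With real data every state would be included; the rank-based version is a cheap check that a result isn't driven by one or two outlier states.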

Food prices and nutrition are hard because they're both per year, but we could check whether food prices are trending up. When food prices go up, does nutrition go down?
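The food price and nutrition tables store one column per year (for example `2017 [YR2017]`), so reshaping them to one row per year makes trend questions easier. This is a sketch on a made-up frame; the series names are placeholders, and treating ".." as missing assumes the usual World Bank export convention:

```python
import pandas as pd

# Made-up frame shaped like the food_prices export: one row per series,
# one column per year. ".." marks a missing value (assumed export convention).
prices = pd.DataFrame({
    "Country Name": ["United States", "United States"],
    "Series Name": ["Series A", "Series B"],
    "2017 [YR2017]": ["100", "50"],
    "2018 [YR2018]": ["102", ".."],
    "2019 [YR2019]": ["105", "53"],
})

# Wide -> long: one row per (series, year).
year_cols = [c for c in prices.columns if c.endswith("]")]
long = prices.melt(id_vars=["Country Name", "Series Name"],
                   value_vars=year_cols, var_name="year", value_name="value")
long["year"] = long["year"].str[:4].astype(int)                # "2017 [YR2017]" -> 2017
long["value"] = pd.to_numeric(long["value"], errors="coerce")  # ".." -> NaN

# Year-over-year percent change within each series; a gap stays NaN
# instead of being bridged across a missing year.
long = long.sort_values(["Series Name", "year"]).reset_index(drop=True)
previous = long.groupby("Series Name")["value"].shift(1)
long["pct_change"] = (long["value"] / previous - 1) * 100
```

Mostly positive `pct_change` values would suggest prices are trending up; the nutrition table could be reshaped the same way and then joined on `Country Name` and `year`.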

I don't understand the data for metro_grade and redlining, what do these columns mean?

jacob-umich commented 6 months ago

@Aerlenbeck I mainly explained the links in the comments above. I guess it would be simpler to just look at life expectancy. Let's just go with that. If we feel like we are running out of correlations to make, we can bring in the other datasets.

> I don't understand the data for metro_grade and redlining, what do these columns mean?

metro_grade measures the degree of segregation resulting from redlining practices: https://projects.fivethirtyeight.com/redlining/

Aerlenbeck commented 6 months ago

@jacob-umich I see your correlations now. I think those would all work well too. We need 15 correlations so I think we'll have enough space for everything.

tsivitse commented 6 months ago

Has all the data been cleaned now? Just wondering what the next step is and where I can start contributing.

jacob-umich commented 6 months ago

@tsivitse Yea, I finished cleaning the main dataset. I think we decided to just use the augmenting dataset that @Aerlenbeck found. That still needs to be cleaned, I think. You can follow what I did for cleaning the main dataset. Then we just need to make some plots.

Aerlenbeck commented 6 months ago

@tsivitse @jacob-umich Yea, I added a life expectancy dataset to the data folder on Google Drive, and I also cleaned that data in the .ipynb file I added to the EDA section (it should be towards the bottom). It's mostly just swapping state codes for their full names so the table can be merged with the CDI dataset.
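A minimal sketch of that cleaning step. It assumes the life expectancy file has a two-letter `State` column and a `LifeExpectancy` column (both names hypothetical) and that the CDI table is keyed by full names in `LocationDesc`. The lookup covers only a few states, and all values are made up:

```python
import pandas as pd

# Partial, hypothetical lookup; the real one would cover every code in the file.
STATE_NAMES = {"AL": "Alabama", "AK": "Alaska", "MI": "Michigan"}

life = pd.DataFrame({
    "State": ["AL", "AK", "MI", "PR"],
    "LifeExpectancy": [73.0, 76.0, 76.5, 79.0],  # made-up values
})
cdi = pd.DataFrame({
    "LocationDesc": ["Alabama", "Alabama", "Michigan", "Guam"],
    "DataValue": [13.1, 14.0, 11.4, None],  # made-up values
})

# Swap codes for full names; anything missing from the lookup becomes NaN.
life["LocationDesc"] = life["State"].map(STATE_NAMES)
unmatched = life.loc[life["LocationDesc"].isna(), "State"].tolist()
life = life.dropna(subset=["LocationDesc"])

# Left join keeps every CDI row; validate ensures one life expectancy per state.
merged = cdi.merge(life[["LocationDesc", "LifeExpectancy"]],
                   on="LocationDesc", how="left", validate="many_to_one")
```

Printing `unmatched` is a quick way to catch codes the lookup misses. Because the CDI table also has a `LocationAbbr` column, merging on the two-letter code directly would be an alternative that skips the lookup.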

Aerlenbeck commented 6 months ago

@tsivitse We need more diverse plot types, if you're looking for something to think about. I am going to make 12 bar charts, selectable from a drop-down menu (one for each SDOH question from the CDI dataset), showing the best 5/worst 5 states for each question, but those will all be one plot type. I'm also planning to do a choropleth plot for life expectancy.

With the food prices/nutrition data we could probably do line/scatter plots, but I think we need 5 plot types, so at least 1, maybe 2, more if you see anything that stands out to you.

tsivitse commented 6 months ago

Just confirming: should the EDA I'm working on go in the SDOH file with your cleaned data, @Aerlenbeck?

Aerlenbeck commented 6 months ago

Yea, you can add to that file, if I'm understanding your question correctly, @tsivitse.

jacob-umich commented 6 months ago

I have compiled all the clean data into one database. The server can use this database instead of pulling from two separate ones. This task is done
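The thread doesn't say which database engine was used. As one possibility, here is a sketch that stores the cleaned tables in SQLite using pandas and the standard-library `sqlite3` module; the table names `cdi` and `life_expectancy` and the helper `build_database` are hypothetical, and the data is made up:

```python
import sqlite3

import pandas as pd

def build_database(conn: sqlite3.Connection, tables: dict) -> None:
    """Write each cleaned DataFrame to its own table, replacing old copies."""
    for name, frame in tables.items():
        frame.to_sql(name, conn, if_exists="replace", index=False)
    conn.commit()

# Made-up cleaned tables standing in for the real ones.
cdi = pd.DataFrame({"LocationDesc": ["Alabama", "Michigan"],
                    "DataValue": [13.1, 11.4]})
life = pd.DataFrame({"LocationDesc": ["Alabama", "Michigan"],
                     "LifeExpectancy": [73.0, 76.5]})

# ":memory:" keeps the example self-contained; a file path would persist it.
conn = sqlite3.connect(":memory:")
build_database(conn, {"cdi": cdi, "life_expectancy": life})

# The server can then join both sources with one query against one database.
joined = pd.read_sql_query(
    """
    SELECT c.LocationDesc, c.DataValue, l.LifeExpectancy
    FROM cdi AS c
    JOIN life_expectancy AS l ON c.LocationDesc = l.LocationDesc
    ORDER BY c.LocationDesc
    """,
    conn,
)
```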