isabelle926 / ai4all_nhanes

merge all the data #11

Closed albemlee closed 3 years ago

albemlee commented 3 years ago

a single dataframe for each year

AlbertLeeUCSF commented 3 years ago

@helenliu583 did 2011-2012 (all_2011_df), @sjaya09 will do 2013-2014 (all_2013_df), and @isabelle926 will do 2015-2016 (all_2015_df).

helenliu583 commented 3 years ago

@sjaya09 This is all the 2013-2014 code. I just tested it, but it'd probably be good to check it over, since I only used find and replace in Docs to adapt it.

Read in data

# 2013-2014: read in data (one CSV per questionnaire table)
import pandas as pd

acculturation_2013_df = pd.read_csv('../data/raw/2013-2014_acculturation.csv')
alcohol_use_2013_df = pd.read_csv('../data/raw/2013-2014_alcohol_use.csv')
bp_cholesterol_2013_df = pd.read_csv('../data/raw/2013-2014_bp_cholesterol.csv')
cardiovascular_2013_df = pd.read_csv('../data/raw/2013-2014_cardiovascular.csv')
consumer_behavior_2013_df = pd.read_csv('../data/raw/2013-2014_consumer_behavior.csv')
demographic_2013_df = pd.read_csv('../data/raw/2013-2014_demographic.csv')
dermatology_2013_df = pd.read_csv('../data/raw/2013-2014_dermatology.csv')
diabetes_2013_df = pd.read_csv('../data/raw/2013-2014_diabetes.csv')
diet_nutrition_2013_df = pd.read_csv('../data/raw/2013-2014_diet_nutrition.csv')
drug_use_2013_df = pd.read_csv('../data/raw/2013-2014_drug_use.csv')
early_childhood_2013_df = pd.read_csv('../data/raw/2013-2014_early_childhood.csv')
food_security_2013_df = pd.read_csv('../data/raw/2013-2014_food_security.csv')
health_insurance_2013_df = pd.read_csv('../data/raw/2013-2014_health_insurance.csv')
health_status_2013_df = pd.read_csv('../data/raw/2013-2014_health_status.csv')
hospital_access_to_care_2013_df = pd.read_csv('../data/raw/2013-2014_hospital_access_to_care.csv')
housing_2013_df = pd.read_csv('../data/raw/2013-2014_housing.csv')
immunization_2013_df = pd.read_csv('../data/raw/2013-2014_immunization.csv')
income_2013_df = pd.read_csv('../data/raw/2013-2014_income.csv')
medical_conditions_2013_df = pd.read_csv('../data/raw/2013-2014_medical_conditions.csv')
mental_health_2013_df = pd.read_csv('../data/raw/2013-2014_mental_health.csv')
occupation_2013_df = pd.read_csv('../data/raw/2013-2014_occupation.csv')
oral_health_2013_df = pd.read_csv('../data/raw/2013-2014_oral_health.csv')
pesticide_use_2013_df = pd.read_csv('../data/raw/2013-2014_pesticide_use.csv')
physical_activity_2013_df = pd.read_csv('../data/raw/2013-2014_physical_activity.csv')
physical_functioning_2013_df = pd.read_csv('../data/raw/2013-2014_physical_functioning.csv')
preventative_aspirin_use_2013_df = pd.read_csv('../data/raw/2013-2014_preventative_aspirin_use.csv')
reproductive_2013_df = pd.read_csv('../data/raw/2013-2014_reproductive.csv')
sexual_behavior_2013_df = pd.read_csv('../data/raw/2013-2014_sexual_behavior.csv')
sleep_disorder_2013_df = pd.read_csv('../data/raw/2013-2014_sleep_disorder.csv')
smoking_cigarette_2013_df = pd.read_csv('../data/raw/2013-2014_smoking_cigarette.csv')
smoking_recent_tobacco_2013_df = pd.read_csv('../data/raw/2013-2014_smoking_recent_tobacco.csv')
urology_2013_df = pd.read_csv('../data/raw/2013-2014_urology.csv')
weight_history_2013_df = pd.read_csv('../data/raw/2013-2014_weight_history.csv')
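Since every file follows the same `<years>_<table>.csv` pattern, the read step could also be written as a loop, which avoids find and replace when moving to another year. This is only a sketch: `read_year_tables` is a hypothetical helper, not something in the repo, and it returns a dict keyed by table name instead of one variable per table.

```python
from pathlib import Path

import pandas as pd


def read_year_tables(raw_dir, year_label, table_names):
    """Read one CSV per table name into a dict keyed by table name.

    Hypothetical helper: assumes files are named '<year_label>_<table>.csv'
    inside raw_dir, as in the read_csv calls above.
    """
    raw_dir = Path(raw_dir)
    tables = {}
    for name in table_names:
        tables[name] = pd.read_csv(raw_dir / f"{year_label}_{name}.csv")
    return tables
```

For example, `read_year_tables('../data/raw', '2013-2014', ['acculturation', 'alcohol_use'])` would give a dict with two dataframes; the full list of table names would need to be passed in the same way.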

Merge data

# 2013-2014: merge all tables into one dataframe
# Start from the first table; every other table is merged in the loop below,
# so alcohol_use is merged exactly once.
all_2013_df = acculturation_2013_df
df_list = [
    alcohol_use_2013_df,
    bp_cholesterol_2013_df,
    cardiovascular_2013_df,
    consumer_behavior_2013_df,
    demographic_2013_df,
    dermatology_2013_df,
    diabetes_2013_df,
    diet_nutrition_2013_df,
    drug_use_2013_df,
    early_childhood_2013_df,
    food_security_2013_df,
    health_insurance_2013_df,
    health_status_2013_df,
    hospital_access_to_care_2013_df,
    housing_2013_df,
    immunization_2013_df,
    income_2013_df,
    medical_conditions_2013_df,
    mental_health_2013_df,
    occupation_2013_df,
    oral_health_2013_df,
    pesticide_use_2013_df,
    physical_activity_2013_df,
    physical_functioning_2013_df,
    preventative_aspirin_use_2013_df,
    reproductive_2013_df,
    sexual_behavior_2013_df,
    sleep_disorder_2013_df,
    smoking_cigarette_2013_df,
    smoking_recent_tobacco_2013_df,
    urology_2013_df,
    weight_history_2013_df,
]
# With no `on=` argument, merge joins on every column the two frames share.
for df in df_list:
    all_2013_df = all_2013_df.merge(df, how='outer')

all_2013_df
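The merge above does not pass `on=`, so pandas joins on whatever columns the two frames happen to share. If the tables are meant to line up on a single respondent ID, naming that column explicitly makes the join easier to check. The sketch below assumes the key column is called `SEQN` (the usual NHANES respondent sequence number); the actual column name in these CSVs is not shown in this thread, so treat it as a placeholder. `merge_tables` is a hypothetical helper.

```python
from functools import reduce

import pandas as pd


def merge_tables(frames, key="SEQN"):
    """Outer-merge a list of dataframes on one shared key column.

    `key` defaults to 'SEQN', which is an assumption about these files.
    """
    # Fail early with a clear message if any table lacks the key column.
    for i, frame in enumerate(frames):
        if key not in frame.columns:
            raise KeyError(f"table {i} has no '{key}' column")
    # Fold the list left to right, one outer merge per table.
    return reduce(
        lambda left, right: left.merge(right, on=key, how="outer"),
        frames,
    )
```

One thing to watch for: if two tables share a non-key column name, pandas will add `_x` / `_y` suffixes instead of joining on it, so a quick look at the merged column names is worthwhile.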

Convert to csv file

all_2013_df.to_csv('all_2013_df.csv', index=False)
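As an optional check after writing, the file can be read back and compared against the dataframe in memory. This is a minimal sketch; `write_and_verify` is a hypothetical helper and the output path is whatever you pass in.

```python
import pandas as pd


def write_and_verify(df, path):
    """Write df to CSV without the index, then read it back and compare shapes."""
    df.to_csv(path, index=False)
    reloaded = pd.read_csv(path)
    if reloaded.shape != df.shape:
        raise ValueError(
            f"shape changed on round trip: {df.shape} -> {reloaded.shape}"
        )
    return reloaded
```

Calling `write_and_verify(all_2013_df, 'all_2013_df.csv')` would write the same file as the line above and raise if the row or column count differs on reload.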

All the code should be in the repo. I cleaned up the comments for this a little, but the original code should be available if you'd like to see it.