jschulberg / Dog-Returns

A data science analysis to classify whether or not an adopted dog will be returned.

Fill NAs in Feature Columns #35

Closed jschulberg closed 2 years ago

jschulberg commented 2 years ago

So there are a few columns that we're trying to use for predictive purposes that have NAs in them. Maybe we can use a K-Nearest Neighbor algorithm to predict what their value should be?

Total Records = 10489

Number of NAs by column:

multi_color = 568
num_colors = 568
contains_black = 568
contains_white = 568
contains_yellow = 568
MIX_BOOL = 131
WEIGHT2 = 538
Age at Adoption (days) = 4687
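For reference, counts like these can be reproduced with a few lines of pandas. This is only a sketch on a toy frame: the column names come from the list above, but the values are made up and the real data has 10489 rows.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the adoption data (values are invented for illustration).
df = pd.DataFrame({
    "WEIGHT2": [12.0, np.nan, 30.5, 45.0],
    "MIX_BOOL": [1.0, 0.0, np.nan, 1.0],
    "num_colors": [2.0, np.nan, np.nan, 1.0],
})

# Count missing values per column and keep only columns that have any.
na_counts = df.isna().sum()
na_counts = na_counts[na_counts > 0].sort_values(ascending=False)

# Share of rows missing, useful for judging whether imputation is sensible.
na_share = (na_counts / len(df)).round(3)
print(pd.DataFrame({"n_missing": na_counts, "share": na_share}))
```

The share matters as much as the raw count: Age at Adoption (days) is missing in 4687 of 10489 records, roughly 45%, so any imputation for that column fills in a large part of the data.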

jschulberg commented 2 years ago

KNN to impute missing values:

https://machinelearningmastery.com/knn-imputation-for-missing-values-in-machine-learning/

rkelley05 commented 2 years ago

Imputed the missing values with KNN. We still need to decide whether to scale the data prior to imputation.
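On the scaling question, here is a minimal sketch of one option: standardize, impute with scikit-learn's `KNNImputer`, then map back to the original units. KNN relies on distances, so an unscaled column with a large range (age in days) would likely dominate a column with a small one (weight). The toy values and `n_neighbors=2` are assumptions for illustration, not what was run on the real data.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer
from sklearn.preprocessing import StandardScaler

# Toy data: weight and age in days live on very different scales.
df = pd.DataFrame({
    "WEIGHT2": [10.0, 12.0, np.nan, 60.0, 65.0],
    "Age at Adoption (days)": [90.0, np.nan, 100.0, 1500.0, 1400.0],
    "num_behav_issues": [0.0, 0.0, 0.0, 2.0, 2.0],
})

# StandardScaler ignores NaNs when fitting and leaves them in place when
# transforming, so it can run before the imputer.
scaler = StandardScaler()
scaled = scaler.fit_transform(df)

# n_neighbors=2 is an arbitrary choice for this tiny example.
imputer = KNNImputer(n_neighbors=2)
imputed_scaled = imputer.fit_transform(scaled)

# Undo the scaling so imputed values are back in the original units.
imputed = pd.DataFrame(
    scaler.inverse_transform(imputed_scaled), columns=df.columns, index=df.index
)
print(imputed)
```

Comparing this against the same imputer run on unscaled data, for example by masking some known values and measuring the error, would be one way to settle the question.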

jschulberg commented 2 years ago

@rkelley05 Here's the convo with Julie about where to impute NAs

SEX_Male -- 6th value (M for 'male' or F for 'female')
SEX_Female -- 6th value (M for 'male' or F for 'female')
multi_color -- Not a standard way of denoting the colors (kids just come up with their own values)
num_colors -- Not a standard way of denoting the colors (kids just come up with their own values)
MIX_BOOL -- Don't use this because every dog is a mix. Sometimes they put 'Mix' just to get them adopted faster.
contains_black -- Not a standard way of denoting the colors (kids just come up with their own values)
contains_white -- Not a standard way of denoting the colors (kids just come up with their own values)
contains_yellow -- Not a standard way of denoting the colors (kids just come up with their own values)
WEIGHT2 -- Impute with KNN
Age at Adoption (days) -- Impute with KNN
is_retriever
is_shepherd
is_other_breed
num_behav_issues
puppy_screen -- If it doesn't say puppy screen, check the age. If it's less than 6 months old, it's a puppy.
new_this_week -- Delete this
needs_play
no_apartments -- Use imputation for this
energetic -- 0 if not specified in BEHAVIORAL NOTES
shyness -- 0 if not specified in BEHAVIORAL NOTES
needs_training -- Use imputation for this
BULLY_SCREEN -- 0 if not specified
BULLY_WARNING -- 0 if not specified
OTHER_WARNING -- 0 if not specified
CATS_LIVED_WITH -- 1 if not specified, but could try imputation
CATS_TEST -- 1 if not specified (good with cats), but could try imputation
KIDS_FIXED -- Impute for missing values. Also unsure about how caution should be treated, so consider imputing those values as well
DOGS_IN_HOME -- 0 if not specified (if they don't know, they assume they're good with dogs)
DOGS_REQ -- 0 if not specified (if they don't know, they assume they're good with dogs)
has_med_issues
diarrhea -- REMOVE THIS (They all get diarrhea)
ehrlichia
uri -- REMOVE THIS
ear_infection
tapeworm -- REMOVE THIS
general_infection -- REMOVE THIS
demodex (skin condition)
car_sick -- 0 if not specified
dog_park -- REMOVE THIS (not consistent)
leg_issues
anaplasmosis
treated_vaccinated -- REMOVE THIS
HW_FIXED
FT_FIXED -- REMOVE THIS
spay_neutered -- REMOVE THIS (all dogs are spayed/neutered)
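A rough sketch of how the non-KNN rules above might translate to pandas, run on a made-up toy frame. A few things here are assumptions rather than anything agreed in the conversation: "Don't use this", "Delete this", and "REMOVE THIS" are all treated as dropping the column, "6 months" is taken as about 183 days, and the constant names (`DROP_COLS`, `ZERO_FILL`, `ONE_FILL`, `PUPPY_MAX_AGE_DAYS`) are hypothetical. Columns marked for KNN or other imputation are left alone.

```python
import numpy as np
import pandas as pd

# Toy frame with a handful of the columns discussed above (values invented).
df = pd.DataFrame({
    "Age at Adoption (days)": [60.0, 400.0, np.nan],
    "puppy_screen": [np.nan, np.nan, 1.0],
    "energetic": [1.0, np.nan, np.nan],
    "BULLY_SCREEN": [np.nan, 1.0, np.nan],
    "CATS_TEST": [np.nan, 0.0, 1.0],
    "MIX_BOOL": [1.0, np.nan, 1.0],
    "spay_neutered": [1.0, 1.0, 1.0],
})

# Assumption: every "remove"/"delete"/"don't use" note means drop the column.
DROP_COLS = ["MIX_BOOL", "new_this_week", "diarrhea", "uri", "tapeworm",
             "general_infection", "dog_park", "treated_vaccinated",
             "FT_FIXED", "spay_neutered"]
ZERO_FILL = ["energetic", "shyness", "BULLY_SCREEN", "BULLY_WARNING",
             "OTHER_WARNING", "DOGS_IN_HOME", "DOGS_REQ", "car_sick"]
ONE_FILL = ["CATS_LIVED_WITH", "CATS_TEST"]
PUPPY_MAX_AGE_DAYS = 183  # assumption: "6 months" taken as about 183 days

# errors="ignore" skips listed columns that the toy frame does not contain.
df = df.drop(columns=DROP_COLS, errors="ignore")

# Puppy rule: only where puppy_screen is missing and the age is known.
age = df["Age at Adoption (days)"]
needs_rule = df["puppy_screen"].isna() & age.notna()
df.loc[needs_rule, "puppy_screen"] = (
    age[needs_rule] < PUPPY_MAX_AGE_DAYS
).astype(float)

# Constant fills, restricted to columns that are actually present.
fill_values = {**{c: 0 for c in ZERO_FILL}, **{c: 1 for c in ONE_FILL}}
fill_values = {c: v for c, v in fill_values.items() if c in df.columns}
df = df.fillna(fill_values)
print(df)
```

Running the puppy rule before KNN means it only uses observed ages. Running it after would also apply it to imputed ages, which is a separate judgment call.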