Closed micpesce closed 5 years ago
You have some non-numeric predictors:
> str(pred)
'data.frame': 700 obs. of 20 variables:
$ checking_account : Factor w/ 4 levels "A11","A12","A13",..: 1 2 4 1 4 2 4 2 1 4 ...
$ credit_duration : num [1:700, 1] -1.236 2.247 -0.738 1.75 0.257 ...
$ Credit_history : Factor w/ 5 levels "A30","A31","A32",..: 5 3 5 3 3 3 3 5 3 5 ...
$ purpose : Factor w/ 10 levels "A40","A41","A410",..: 5 5 8 4 4 2 5 1 5 5 ...
$ credit_amount : num [1:700, 1] -0.745 0.949 -0.416 1.633 -0.155 ...
$ savings_account : Factor w/ 5 levels "A61","A62","A63",..: 5 1 1 1 3 1 4 1 2 5 ...
$ employment_since : Factor w/ 5 levels "A71","A72","A73",..: 5 3 4 4 5 3 4 1 3 5 ...
$ percentage_income: num [1:700, 1] 0.918 -0.8697 -0.8697 -0.8697 0.0241 ...
$ personal_status : Factor w/ 4 levels "A91","A92","A93",..: 3 2 3 3 3 3 1 4 2 3 ...
$ other_guarantors : Factor w/ 3 levels "A101","A102",..: 1 1 1 3 1 1 1 1 1 1 ...
$ residence : num [1:700, 1] 1.046 -0.766 0.14 1.046 1.046 ...
$ property : Factor w/ 4 levels "A121","A122",..: 1 1 1 2 2 3 1 3 3 2 ...
$ age : num [1:700, 1] 2.765 -1.191 1.183 0.831 1.534 ...
$ other_plans : Factor w/ 3 levels "A141","A142",..: 3 3 3 3 3 3 3 3 3 3 ...
$ housing : Factor w/ 3 levels "A151","A152",..: 2 2 2 3 2 1 2 2 2 2 ...
$ existing_credits : num [1:700, 1] 1.027 -0.705 -0.705 -0.705 -0.705 ...
$ job : Factor w/ 4 levels "A171","A172",..: 3 3 2 3 3 4 2 4 2 3 ...
$ house_manteinant : num [1:700, 1] -0.428 -0.428 2.334 2.334 -0.428 ...
$ telephone : Factor w/ 2 levels "A191","A192": 2 1 1 1 1 2 1 1 1 1 ...
$ foreign_worker : Factor w/ 2 levels "A201","A202": 1 1 1 1 1 1 1 1 1 1 ...
The way to get around this is to use the formula method of train(), so that these are converted to dummy variables. Also, I suggest centering and scaling the data so that the distance calculations are not skewed by the units of the predictors.
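As a rough sketch of that suggestion (not the poster's actual code): the data frame `toy`, its values, and the tuning settings are made up for illustration, with column names borrowed from the str() output above. The formula interface expands the factor into dummy variables, and `preProcess` asks caret to center and scale the predictors inside each resample.

```r
library(caret)

set.seed(1)
n <- 200

# Hypothetical toy data: one factor predictor and two numeric predictors.
toy <- data.frame(
  checking_account = factor(sample(c("A11", "A12", "A13", "A14"), n, replace = TRUE)),
  credit_amount    = rnorm(n, mean = 3000, sd = 1500),
  age              = rnorm(n, mean = 35, sd = 10)
)
# Hypothetical two-class outcome loosely related to age.
toy$outcome <- factor(ifelse(toy$age + rnorm(n, sd = 5) > 35, "good", "bad"))

# Formula method: factors are converted to dummy variables before knn sees them.
# preProcess centers and scales the resulting numeric columns.
fit <- train(
  outcome ~ .,
  data       = toy,
  method     = "knn",
  preProcess = c("center", "scale"),
  tuneGrid   = data.frame(k = c(3, 5, 7)),          # arbitrary values of k
  trControl  = trainControl(method = "cv", number = 5)
)

print(fit$results[, c("k", "Accuracy", "Kappa")])
```

If the Accuracy and Kappa columns are no longer NA, the factors are being handled before the distance calculation.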
EDIT: I didn't see the attached data at first
Thanks Max. Prior to training, all variables except the categorical outcome had been preprocessed as numeric and centered/scaled. It finally works in the form train(pred, outcome, ...) instead of train(outcome ~ ., data = training); anyway, that is OK. I think I was rather careless, because KNN fits best on datasets with truly numeric variables rather than non-hierarchical factors. If needed, categorical variables could be converted into k-1 binary values, with k = the number of factor levels, before applying KNN.
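The k-1 conversion mentioned above can be sketched in base R with model.matrix(), which under the default treatment contrasts turns a factor with k levels into k-1 indicator columns. The data frame `toy` and its values are hypothetical; only the column names come from the str() output in this thread.

```r
# Hypothetical toy data with two factors and one numeric column.
toy <- data.frame(
  checking_account = factor(c("A11", "A12", "A14", "A11", "A13", "A12")),
  housing          = factor(c("A151", "A152", "A152", "A153", "A152", "A151")),
  age              = c(67, 22, 49, 45, 53, 35)
)

# Expand factors into indicator columns: 4 levels -> 3 columns, 3 levels -> 2 columns.
# The first column of the model matrix is the intercept, which is dropped here.
x <- model.matrix(~ ., data = toy)[, -1]

# Center and scale every column so that no predictor dominates the distances.
x_scaled <- scale(x)

print(colnames(x))
```

A matrix built this way is all numeric, so it could be passed to the non-formula interface, train(x, y, ...), together with the outcome factor.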
This is the error I get when running the caret train function with knn:
"Something is wrong; all the Accuracy metric values are missing:
    Accuracy       Kappa
 Min.   : NA   Min.   : NA
 1st Qu.: NA   1st Qu.: NA
 Median : NA   Median : NA
 Mean   :NaN   Mean   :NaN
 3rd Qu.: NA   3rd Qu.: NA
 Max.   : NA   Max.   : NA
 NA's   :3     NA's   :3
Error: Stopping In addition: There were 50 or more warnings (use warnings() to see the first 50)" 1: predictions failed for Resample01: k=5 Error in knn3Train(train = structure(c("A11", "A12", "A14", "A12", "A12", : unused argument (data = list(checking_account = c(1, 2, 4, 1, 4, 2, 4, 2, 1, 4, 1, 4, 4, 1, 1, 2, 4, 1, 4, 2, 1, 2, 2, 4, 3, 2, 2, 4, 2, 2, 1, 1, 4, 4, 1, 4, 2, 4, 4, 2, 4, 2, 3, 2, 2, 4, 4, 2, 4, 4, 4, 2, 1, 1, 1, 2, 4, 2, 4, 4, 4, 4, 2, 1, 4, 1, 4, 2, 4, 4, 2, 4, 2, 2, 4, 2, 1, 2, 2, 3, 2, 4, 4, 1, 1, 2, 4, 3, 2, 1, 2, 1, 4, 4, 2, 2, 3, 2, 1, 1, 2, 1, 1, 4, 4, 3, 1, 1, 1, 2, 4, 4, 4, 2, 4, 1, 2, 2, 4, 1, 4, 1, 4, 2, 4, 2, 4, 2, 2, 2, 2, 2, 4, 2, 2, 4, 2, 2, 2, 1, 4, 4, 1, 4, 2, 1, 4, 4, 4, 2, 1, 3, 1, 4, 2, 1, 4, 4, 4, 2, 4, 1, 3, 4, 4, 2, 1, 2, 2, 4, 1, 4, 4, 4, 4, 4, 4, 4, 2, 4, 1, 4, 1, 1, 4, 2, 4, 4, 1, 4, 4, 2, 2, 1, 4, 1, 4, 4, 4, 3, 2, 1, 2, 4, 2, 1, 4, 2, 4, 4, 4, 2, 1, 4, 4, 2, 3, 2, 3, 1, 4, 1, 2, 1, 1, 4, 4, 4, 3, 2, 1, 4, 1, 2, 1, 2, 1, 2, 2, 3, 2, 2, 4, 1, 4, 4, 4, 1, 3, 3, 4, 1, 4, 1, 2, 4, 4, 4, 1, 4, 4, 1, 2, 4, 2, 4, 2, 1, 1, 4, 1, 2, 4, 2, 4, 4, 2, [... truncated]
Session Info: