Closed — Lynx-jr closed this issue 3 years ago
glm uses an iterative algorithm to maximize the likelihood (i.e. to "converge"). When your label (Y) is almost perfectly separated (see the Wikipedia article on separation), it is very hard for the algorithm to reach the maximum within the given number of iterations. In this case, it seems that some of your resampled data happen to give you a very imbalanced sample of labels. You can check the problematic CV set with:

```r
holdout <- assessment(cv_train$splits[[8]])
table(holdout$democrat)
```

where, say, the 8th CV split is the one giving you the warning. Normally, simply re-running the vfold_cv function solves the problem.
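If it helps to see the warning in isolation, here is a minimal sketch with made-up toy data (not from the original notebook) where a perfectly separated outcome triggers exactly this behavior in `glm()`:

```r
# Toy data: y is perfectly separated by x (y = 1 exactly when x > 5),
# so the MLE does not exist and the coefficients diverge.
x <- 1:10
y <- as.integer(x > 5)

# This call emits warnings such as
# "glm.fit: fitted probabilities numerically 0 or 1 occurred"
# (and, depending on the data, "algorithm did not converge").
fit <- glm(y ~ x, family = binomial())

# The fitted probabilities are pushed toward 0 and 1:
fitted(fit)
```

The same thing happens inside a resample whose assessment set happens to contain (almost) only one class, which is why re-drawing the folds usually makes the warning go away.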
Hi - Thanks @bjcliang-uchi for the answer.
Also, I followed up with Ruben, who had this problem, and it ended up being a simple data-management issue. It was solved by removing the `pid3` feature, which he had forgotten to drop from the data after recoding. There was perfect collinearity between `pid3` and the new `democrat` feature, given that `democrat` was created from `pid3`. Once it was removed, the problem was fixed.
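A quick sketch of the fix described above, using hypothetical column names (only `pid3` and `democrat` come from the thread; the rest is made up): when the label is recoded from another column, drop the source column before modeling so it cannot leak into the formula.

```r
# Hypothetical data frame mimicking the situation: democrat is
# recoded from pid3, and pid3 was accidentally left in the data.
df <- data.frame(pid3 = c(1, 2, 3, 1, 2, 3),
                 age  = c(34, 51, 29, 44, 60, 38))
df$democrat <- as.integer(df$pid3 == 1)

# Drop the source column before fitting; otherwise pid3 perfectly
# predicts democrat and the fit degenerates.
train <- subset(df, select = -pid3)
names(train)
```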
I literally copied the logit model code from above, but somehow the error message was like this: `train/test split: preprocessor 1/1, model 1/1: glm.fit: algorithm did not converge, glm.fit: fitted proba...`
Someone in my group asked this question in class, but I hadn't reached that point until now, when I found this issue. Also, the confusion matrix output did not include type I and type II errors; how do I handle this?
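On the last question: the type I and type II error counts are not separate rows in the output, but you can read them directly off the 2x2 confusion matrix cells. A base-R sketch with made-up labels (the same cell logic applies to a `conf_mat()` table from yardstick):

```r
# Made-up truth/prediction vectors; level order "1", "0" puts the
# positive class first in both table dimensions.
truth     <- factor(c(1, 1, 0, 0, 1, 0), levels = c(1, 0))
predicted <- factor(c(1, 0, 0, 1, 1, 0), levels = c(1, 0))

cm <- table(predicted, truth)
cm

false_positive <- cm["1", "0"]  # type I error:  predicted 1, truth 0
false_negative <- cm["0", "1"]  # type II error: predicted 0, truth 1
```

So the off-diagonal cells are the two error types; everything on the diagonal is a correct classification.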