macss-modeling / General-Questions

A repo to post questions about code, data, etc.

While running yesterday's in-class code exercise for the logit model, an error like this showed up #7

Closed Lynx-jr closed 3 years ago

Lynx-jr commented 3 years ago

I literally copied the logit model code from above, but the error message was something like this: train/test split: preprocessor 1/1, model 1/1: glm.fit: algorithm did not converge, glm.fit: fitted proba...

Someone in my group asked this question in class, but I hadn't reached that point until now, which is when I ran into it. Also, the confusion matrix output did not have type I and type II errors. How do I handle this?

bjcliang-uchi commented 3 years ago

`glm` uses an iterative algorithm to maximize the likelihood (that is what "convergence" refers to). When your label (Y) is almost perfectly separated (see the Wikipedia article on separation in logistic regression), it is very hard for the algorithm to reach a maximum within the allowed number of iterations. In this case, it seems that some of your resampled data happen to give you a very imbalanced sample of labels. You can inspect a problematic CV set with `holdout <- assessment(cv_train$splits[[8]]); table(holdout$democrat)`, where, say, the 8th CV set is the one giving you the warning. Normally, simply re-running the `vfold_cv` function solves the problem.
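The check above can be sketched end-to-end as follows. This is a self-contained toy version: the `democrat` column and the split index 8 come from the thread, while the data frame, seed values, and number of folds are illustrative assumptions.

```r
# Minimal sketch of inspecting a resample for label imbalance, using a toy
# data frame in place of the class data. Requires the rsample package.
library(rsample)

set.seed(42)
toy <- data.frame(
  democrat = factor(rbinom(100, 1, 0.5)),  # binary label, as in the thread
  x        = rnorm(100)                    # a generic predictor
)

# Build 10 cross-validation folds:
cv_train <- vfold_cv(toy, v = 10)

# Inspect the assessment (holdout) set of one resample, e.g. the 8th split:
holdout <- assessment(cv_train$splits[[8]])
table(holdout$democrat)  # a heavily one-sided table signals (near-)separation

# If a split is pathological, redrawing the folds with a new seed gives a
# different random partition, which usually avoids the bad draw:
set.seed(123)
cv_train <- vfold_cv(toy, v = 10)
```

The fix works because `vfold_cv` partitions rows at random, so an unlucky, imbalanced fold is a property of one particular draw rather than of the data itself.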

pdwaggoner commented 3 years ago

Hi - Thanks @bjcliang-uchi for the answer.

Also, I followed up with Ruben, who had this problem, and it turned out to be a simple data-management issue: it was solved by removing the pid3 feature, which he had forgotten to drop from the data after recoding. Because democrat was created from pid3, the two features were perfectly collinear. Once pid3 was removed, the problem was fixed.
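The failure mode described above can be reproduced on toy data. This is an illustrative sketch, not the class dataset: the recoding rule (democrat = pid3 == "D") is an assumption about how the feature was derived.

```r
# Toy demonstration: if the outcome `democrat` is recoded directly from
# `pid3`, then leaving pid3 in as a predictor makes the outcome perfectly
# predictable, and glm() emits warnings like the ones in the error above.
set.seed(1)
toy <- data.frame(pid3 = sample(c("D", "R", "I"), 200, replace = TRUE))
toy$democrat <- as.integer(toy$pid3 == "D")  # outcome created from pid3

# pid3 perfectly separates democrat, so the coefficient estimates diverge
# and glm warns (e.g. "fitted probabilities numerically 0 or 1 occurred"):
fit_bad <- glm(democrat ~ pid3, data = toy, family = binomial)

# Dropping pid3 and fitting on genuine predictors avoids the issue entirely.
```

In general, any predictor that deterministically reproduces the outcome (a recoded copy, an ID-like column, a leaked label) will cause this kind of separation warning, so it is worth auditing the feature set whenever `glm.fit` fails to converge.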