shankarpandala / lazypredict

Lazy Predict helps build a lot of basic models without much code and helps you understand which models work better without any parameter tuning
MIT License
2.78k stars 321 forks

Different results when running the models manually #366

Closed krb19-econ closed 2 years ago

krb19-econ commented 2 years ago

Describe the bug
I have tried using lazypredict for classification algorithms. However, when I run a certain model manually, it reports different performance metrics.

I have attached images using the example given in the documentation.

Screenshots

  1. Models table from lazypredict [screenshot]

  2. Running logistic regression manually [screenshot]

lahdjirayhan commented 2 years ago

Have you tried supplying the same random_state parameters to both LazyClassifier and the sklearn manually-fit classifiers?

krb19-econ commented 2 years ago

Yes. As mentioned in the first image in the original post, the random state was 123. Using that, I have used the same train and test datasets for both LazyClassifier and manual sklearn.

lahdjirayhan commented 2 years ago

Thanks for pointing that out, but LogisticRegression in scikit-learn also takes a random_state parameter. See here: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression

LazyClassifier also takes a random_state parameter. See here: https://github.com/shankarpandala/lazypredict/blob/652f5de3d1a21a826cac32967eb36549c8cd3b57/lazypredict/Supervised.py#L210-L218

Neither of these random_state parameters was specified in your code yet. Try setting both to the same value and see if the results still differ.
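A minimal sklearn-only sketch of the idea (the dataset and seed values here are illustrative, not from the screenshots): pinning random_state in both the train/test split and the estimator is what makes a manual run reproducible from one execution to the next.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)

# Pin random_state in BOTH the split and the estimator.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=123
)

clf = LogisticRegression(random_state=42, max_iter=5000)
clf.fit(X_train, y_train)
acc = accuracy_score(y_test, clf.predict(X_test))

# A second run with identical seeds reproduces the same score.
clf2 = LogisticRegression(random_state=42, max_iter=5000)
clf2.fit(X_train, y_train)
assert accuracy_score(y_test, clf2.predict(X_test)) == acc
```

With both seeds fixed, repeated manual runs agree with each other; any remaining gap versus LazyClassifier must then come from something other than randomness.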

krb19-econ commented 2 years ago

Thanks for mentioning that. The random_state under LazyClassifier is given as 42, image attached.

[screenshot]

I used the same value for Logistic Regression when fitting it manually. However, the results still come out different.

[screenshot]

shankarpandala commented 2 years ago

It will never be the same: lazypredict preprocesses the data internally before fitting any model.
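For anyone wanting to close the gap manually: lazypredict's Supervised.py builds an internal sklearn preprocessing pipeline before fitting each estimator. The sketch below is an approximation of that idea (imputing and scaling numeric columns, imputing and one-hot encoding categoricals); the column names and toy data are illustrative, and the exact transformers and strategies should be checked against the lazypredict source.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Approximation of lazypredict-style preprocessing:
# impute + scale numeric features, impute + one-hot encode categoricals.
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),
    ("scale", StandardScaler()),
])
categorical = Pipeline([
    ("impute", SimpleImputer(strategy="constant", fill_value="missing")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])

# Toy frame with one numeric and one categorical column, each with a gap.
df = pd.DataFrame({
    "age": [25.0, 32.0, np.nan, 47.0],
    "city": ["a", "b", "a", np.nan],
})
pre = ColumnTransformer([
    ("num", numeric, ["age"]),
    ("cat", categorical, ["city"]),
])

# Wrapping the manual estimator in the same pipeline is what would make
# its metrics comparable to LazyClassifier's.
model = Pipeline([("pre", pre), ("clf", LogisticRegression(random_state=42))])
model.fit(df, [0, 1, 0, 1])
```

Fitting a bare LogisticRegression on the raw features, as in the screenshots, skips all of this, which is why the scores differ even with identical seeds.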

nikolamilovic-rg commented 5 months ago

I feel this is not completely resolved by the last comment. How should we interpret the results of LazyClassifier then? Is it even useful given this? Does it apply the same preprocessing steps to every model? Can it at least serve as a relative indicator for the top models? For example, if SVC gives the best accuracy under LazyClassifier, can we say it will also beat any other model run in isolation, even though we cannot reproduce that exact accuracy with an isolated SVC?