ruqianq closed this issue 3 years ago
If I remember correctly, sklearn's OneHotEncoder did not support string columns in the past, but it does now. So if you are primarily using sklearn for your machine learning work, I would just stick with sklearn and use OneHotEncoder.
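As a rough illustration of the suggestion above, here is a minimal sketch of OneHotEncoder applied directly to a string column. The DataFrame, the column name `color`, and its values are hypothetical and not taken from the repo. The sketch assumes a reasonably recent scikit-learn release that accepts string input.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data; the column name and values are made up for illustration.
train = pd.DataFrame({"color": ["red", "green", "blue", "green"]})
new = pd.DataFrame({"color": ["green", "purple"]})  # "purple" never appears in train

# handle_unknown="ignore" encodes categories unseen during fit as an all-zero row
# instead of raising an error.
encoder = OneHotEncoder(handle_unknown="ignore")

# fit_transform returns a sparse matrix by default; toarray() makes it dense.
train_encoded = encoder.fit_transform(train[["color"]]).toarray()
new_encoded = encoder.transform(new[["color"]]).toarray()

# The categories learned during fit are stored in sorted order.
learned_categories = list(encoder.categories_[0])
```

Because the fitted encoder remembers the training categories, the same object can be reused at scoring time and always produces the same number of columns.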
Thank you so much! I have seen a lot of discussion about how to properly encode variables. Most people, as you said, agree that OneHotEncoder is the better choice. In my experience, though, it is not as easy to use as get_dummies. Anyway, thank you for your comment.
Hi, thanks for sharing this repo. It is very helpful to me for integrating open source with SAS Model Studio.
I just have a general question about the model in sf_onehotvars_sklearn_randomforest.py. You use pd.get_dummies to encode your categorical data. What is the advantage of using get_dummies instead of OneHotEncoder from sklearn? I guess this is more of a general machine learning question, but I would love to hear your perspective. Thanks!
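For context on the question, one practical difference is that pd.get_dummies builds its columns from whatever data it is handed, so training and scoring data can end up with different columns. The sketch below uses made-up data (not from sf_onehotvars_sklearn_randomforest.py), and the reindex step is just one possible workaround, not necessarily what the repo does.

```python
import pandas as pd

# Hypothetical toy data; the column name and values are made up for illustration.
train = pd.DataFrame({"color": ["red", "green", "blue"]})
score = pd.DataFrame({"color": ["green", "purple"]})  # "purple" is not in train

# get_dummies is stateless: each call derives columns from its own input.
train_dummies = pd.get_dummies(train, columns=["color"])
score_dummies = pd.get_dummies(score, columns=["color"])

train_cols = list(train_dummies.columns)
score_cols = list(score_dummies.columns)  # differs from train_cols

# One possible workaround: force the scoring frame onto the training columns,
# filling missing dummies with 0 and dropping dummies unseen in training.
score_aligned = score_dummies.reindex(columns=train_dummies.columns, fill_value=0)
```

get_dummies is convenient for quick exploration, but the alignment step has to be handled by hand whenever a model trained on one frame is applied to another.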