Closed wesm closed 10 years ago
Not sure what people would want but in the absence of a strong reason to do otherwise, I would prefer to not transpose the axes.
I only transposed there to make it output to the console (lot of long-ish columns)
got it.
i mean, you see the example above, right? You have multiple columns and you want to produce dummy columns for each combination of a set of factors
i think this machinery might already be in patsy
...might be possible to lift it from there
looks pretty covered by get_dummies
@jreback any opinion on reopening this so get_dummies can handle DataFrames?
',PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked\n0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S\n1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38.0,1,0,PC 17599,71.2833,C85,C\n'
We could replace this
features = pd.concat([data.get(['Fare', 'Age']),
                      pd.get_dummies(data.Sex, prefix='Sex'),
                      pd.get_dummies(data.Pclass, prefix='Pclass'),
                      pd.get_dummies(data.Embarked, prefix='Embarked')],
                     axis=1)
with this
features = pd.get_dummies(data, include=['Sex', 'Pclass', 'Embarked'], exclude=['Fare', 'Age'])
Or we can check the dtypes on the DataFrame to see that [Fare, Age] are numeric and not create dummies for them automatically, so you can leave off the exclude parameter. The current way seems a bit verbose, especially when you have a mixture of categorical columns that need dummies and numerical columns that don't.
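For what it's worth, the dtype check can be sketched with calls that already exist in pandas. This is only a rough illustration, not a proposed implementation: `dummies_by_dtype` and its `extra` argument are hypothetical names, and the two rows are taken from the CSV sample quoted above. It treats every non-numeric column as categorical, and integer-coded factors such as Pclass still have to be named explicitly because their dtype is numeric.

```python
import pandas as pd

# Two rows taken from the Titanic sample quoted earlier in the thread.
data = pd.DataFrame({
    "Fare": [7.25, 71.2833],
    "Age": [22.0, 38.0],
    "Sex": ["male", "female"],
    "Pclass": [3, 1],
    "Embarked": ["S", "C"],
})


def dummies_by_dtype(df, extra=()):
    """Hypothetical helper: encode non-numeric columns plus any named in `extra`."""
    # Columns whose dtype is not numeric are assumed to be categorical.
    non_numeric = df.select_dtypes(exclude="number").columns.tolist()
    # Numeric-coded factors (e.g. Pclass) cannot be detected by dtype alone.
    to_encode = non_numeric + [c for c in extra if c not in non_numeric]
    return pd.get_dummies(df, columns=to_encode)


features = dummies_by_dtype(data, extra=["Pclass"])
print(features.columns.tolist())
```

Fare and Age pass through untouched, so no exclude list is needed; the only thing the caller has to spell out is which numeric columns are really factors.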
+1
@TomAugspurger nice idea. pls open a new issue for this though.
Here is another technique for creating dummies automatically: http://python-apuntes.blogspot.com.ar/2017/04/creacion-de-variables-de-grupo.html
there are already a few things floating around but having something more structured / more options + in the pandas namespace would be nice
from an e-mail on the statsmodels mailing list