Open telferm57 opened 4 years ago
This would be a very useful feature and would make the setting of reference categories in e.g. regression models much easier and more transparent.
This would be very useful and makes life much easier for managing the data processing. Often wondered why it wasn't a feature, but hopefully this will be added as an enhancement very soon!
This looks pretty useful and neat. A while ago I had to achieve something similar, and ended up with some horrible looking code. Definitely an enhancement worth adding. 👍
I suggest further:
If ref_vals is not a list, or the length of ref_vals != the length of 'columns', raise an exception.
If drop_first=True and ref_vals is supplied, raise an exception (could just warn?).
If the object is a Series, allow ref_vals='string' as well as ref_vals=['string'].
I have started coding this, should have it complete soon
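The three checks suggested above could be sketched roughly as follows. This is only an illustration: `validate_ref_vals` is a hypothetical helper name, not part of pandas, and the choice to raise (rather than warn) when both `drop_first=True` and `ref_vals` are given is an assumption taken from the first option in the comment.

```python
import pandas as pd


def validate_ref_vals(data, columns, ref_vals, drop_first=False):
    """Hypothetical pre-check; returns ref_vals normalised to a list (or None)."""
    if ref_vals is None:
        return None
    # Assumption: combining drop_first=True with ref_vals is treated as an error.
    if drop_first:
        raise ValueError("pass either drop_first=True or ref_vals, not both")
    is_series = isinstance(data, pd.Series)
    # For a Series, accept a bare string as shorthand for a one-element list.
    if is_series and isinstance(ref_vals, str):
        return [ref_vals]
    if not isinstance(ref_vals, list):
        raise TypeError("ref_vals must be a list")
    # A Series has exactly one column to encode; otherwise match 'columns'.
    expected = 1 if is_series else len(columns)
    if len(ref_vals) != expected:
        raise ValueError(f"expected {expected} reference values, got {len(ref_vals)}")
    return ref_vals
```

For a Series, `validate_ref_vals(pd.Series(["a", "b"]), None, "a")` would return `["a"]`, so the rest of the code only ever has to deal with a list.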
A very useful feature! I would love it if it got implemented
When one-hot encoding a pandas categorical column with drop_first=True, there is no control over which value is dropped. So if I need to specify the reference value to drop, I can't use drop_first; I have to manually drop the columns that have been unnecessarily created.
I would like to enhance the get_dummies method to be able to specify for each column in 'columns' a reference value to be used as the dropped column. For example:
For each categorical column specified in the new ref_values parameter:
if the value exists, use it as the reference value;
if the value does not exist, proceed with the normal behaviour, i.e. drop the (lexically) first value (or ignore with a warning?).
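One way to picture the proposed behaviour is a small wrapper around the existing `pd.get_dummies`. This is a sketch under assumptions, not the eventual pandas implementation: `get_dummies_with_ref` is a hypothetical name, and of the two fallbacks floated above it picks "drop the first level and warn" when the requested reference value is absent.

```python
import warnings

import pandas as pd


def get_dummies_with_ref(df, columns, ref_values):
    """Hypothetical helper: encode `columns`, dropping ref_values[i] for columns[i]."""
    out = pd.get_dummies(df, columns=columns)
    for col, ref in zip(columns, ref_values):
        # Levels in the order get_dummies uses (sorted, or the categorical's own order).
        levels = pd.Categorical(df[col]).categories
        if ref in levels:
            drop_level = ref
        else:
            # Assumed fallback: behave like drop_first=True for this column, with a warning.
            warnings.warn(f"{ref!r} is not a value of {col!r}; dropping the first level")
            drop_level = levels[0]
        # Dummy columns are named "<column>_<level>" with the default prefix and separator.
        out = out.drop(columns=f"{col}_{drop_level}")
    return out


df = pd.DataFrame({"colour": ["red", "green", "blue", "green"]})
encoded = get_dummies_with_ref(df, ["colour"], ["green"])
print(list(encoded.columns))  # ['colour_blue', 'colour_red']
```

Here "green" becomes the reference category, whereas `drop_first=True` would have dropped "blue".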
I don't think there are any API breaking implications?
To achieve this now, without the enhancement, I have to do something like:
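The original snippet is not reproduced here. A minimal sketch of the kind of manual workaround described, using only the existing `pd.get_dummies` and `DataFrame.drop`, might look like this; the data frame and the chosen reference levels are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "colour": ["red", "green", "blue", "green"],
    "size": ["S", "M", "L", "M"],
})

# Encode every level (no drop_first), then remove the chosen reference columns by hand.
dummies = pd.get_dummies(df, columns=["colour", "size"])
reference = {"colour": "green", "size": "M"}  # illustrative reference levels
to_drop = [f"{col}_{val}" for col, val in reference.items()]
dummies = dummies.drop(columns=to_drop)

print(list(dummies.columns))  # ['colour_blue', 'colour_red', 'size_L', 'size_S']
```

This works, but the caller has to know the "<column>_<level>" naming scheme and rebuild the column names by hand, which is the boilerplate the proposed parameter would remove.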
I am willing to have a go at this if it is accepted as an enhancement