ralversity opened this issue 1 year ago
What's the error that you get when creating the model? I believe Python implements `bool` as a subclass of `int`, so if you, for example, pass your `insurance_one_hot` through a Normalization layer, you will get [0,1] as output.
This example shows that `bool` is an integer subclass:
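The screenshot itself isn't reproduced in the thread. As a rough stand-in, here is a minimal plain-Python sketch of the same point: `True` and `False` behave like `1` and `0` in arithmetic.

```python
# Python's bool is a subclass of int, so True and False behave like 1 and 0.
values = [True, False, True]

print(issubclass(bool, int))       # True
print(sum(values))                 # 2 (True counts as 1, False as 0)
print([int(v) for v in values])    # [1, 0, 1]
```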
Applying normalization will then simply treat the bools as numbers and give you [0,1] back as float32.
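To illustrate the cast, here is a small NumPy sketch. It assumes you cast directly with `astype` rather than going through a Keras `Normalization` layer, so it only shows that boolean one-hot values become `1.0` and `0.0` as float32. It does not reproduce what the layer does beyond that.

```python
import numpy as np

# A boolean one-hot matrix, like the output of pd.get_dummies() on pandas >= 2.0
bool_one_hot = np.array([[True, False], [False, True]])

# Casting to float32 maps True -> 1.0 and False -> 0.0
float_one_hot = bool_one_hot.astype(np.float32)
print(float_one_hot.dtype)  # float32
print(float_one_hot)
```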
Facing the same issue.
Hi @ralversity, @cwestergren and @uKnowKlaus,

In pandas 2.0, `pd.get_dummies()` was updated to return `bool` dtypes by default (rather than an integer type such as `uint8`).

You can get the behaviour of the first screenshot by setting `pd.get_dummies(dtype=int)`.
For example:
```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'b', 'a'],
                   'B': ['b', 'a', 'c'],
                   'C': [1, 2, 3]})
df_one_hot = pd.get_dummies(df, dtype=bool)  # bool is the default
df_one_hot
```
Output:
|   | C | A_a | A_b | B_a | B_b | B_c |
|---|---|---|---|---|---|---|
| 0 | 1 | True | False | False | True | False |
| 1 | 2 | False | True | True | False | False |
| 2 | 3 | True | False | False | False | True |
Change to `dtype=int`:
```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'b', 'a'],
                   'B': ['b', 'a', 'c'],
                   'C': [1, 2, 3]})
df_one_hot = pd.get_dummies(df, dtype=int)
df_one_hot
```
Output:
|   | C | A_a | A_b | B_a | B_b | B_c |
|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 0 | 0 | 1 | 0 |
| 1 | 2 | 0 | 1 | 1 | 0 | 0 |
| 2 | 3 | 1 | 0 | 0 | 0 | 1 |
See the docs here: https://pandas.pydata.org/docs/reference/api/pandas.get_dummies.html
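The default dtype depends on the installed pandas version, so it can help to check which one you actually got by inspecting `.dtypes`. Here is a minimal sketch reusing the example DataFrame from this comment. The variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'b', 'a'],
                   'B': ['b', 'a', 'c'],
                   'C': [1, 2, 3]})

default_dummies = pd.get_dummies(df)         # bool columns on pandas >= 2.0, uint8 before
int_dummies = pd.get_dummies(df, dtype=int)  # integer dummy columns

# Inspect the dtypes to confirm which one you actually got
print(default_dummies.dtypes)
print(int_dummies.dtypes)
```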
Hey @mrdbourke, thanks for your reply. I already tried changing `dtype` to `int` and `float`, but it still returned bool values. I also tried restarting the kernel, with no effect whatsoever.
Do you get an error when applying normalisation though?
It's still a subclass of `int`, as documented at https://docs.python.org/3/c-api/bool.html
See my previous reply.
@cwestergren I did use normalization as well, but it didn't work. I don't know what the issue with `get_dummies` is, so I went with label encoding instead.
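The comment doesn't say which label encoder was used. scikit-learn's `LabelEncoder` is a common choice. As a pandas-only stand-in, and not scikit-learn's API, category codes give the same kind of integer labels, with categories assigned in sorted order:

```python
import pandas as pd

df = pd.DataFrame({'B': ['b', 'a', 'c']})

# Map each category to an integer code; categories are sorted, so 'a' -> 0, 'b' -> 1, 'c' -> 2
df['B_encoded'] = df['B'].astype('category').cat.codes
print(df)
```

Note that label encoding implies an order between categories, which one-hot encoding avoids. That may or may not matter for your model.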
Understood. If you want to share your code here please do, but label encoding would work too.
Thanks. I'm trying to find the point where the error occurs. It will still be a bool type, but internally it's an integer.
Can you share the error you get?
Sorry, I didn't save the errors; I moved on with label encoding.
All good, happy coding :)
Hey @uKnowKlaus, I had the same issue, but then I tried `'int64'` instead of `'int'` and it worked!
Thanks, everyone. I had this issue too.
@samuelperezh Hi, would you mind sharing the code you used with 'int64' ?
Using `np.int64` and then running all cells ("Run all") worked for me.
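Here is a minimal sketch of the `int64` variant described above, reusing the thread's example DataFrame. Passing either the NumPy type or its string name should produce the same 64-bit integer dummy columns:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['a', 'b', 'a'],
                   'B': ['b', 'a', 'c'],
                   'C': [1, 2, 3]})

# The NumPy type and its string alias both request 64-bit integer dummies
dummies_np = pd.get_dummies(df, dtype=np.int64)
dummies_str = pd.get_dummies(df, dtype='int64')

print(dummies_np.dtypes)
print(dummies_np.equals(dummies_str))  # True
```

Re-running all cells matters in a notebook: a cell further down may still be using a `df` created by an earlier run with the old default.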
I had the same issue even after adding `dtype=int`; however, after adding `df = df.astype(int)` it worked perfectly well:

```python
df = pd.get_dummies(df, sparse=False, dtype=int)
df = df.astype(int)
```
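`df.astype(int)` converts every column. That truncates float columns and raises an error if any string column is left unencoded. A narrower sketch converts only the boolean dummy columns, selected with `select_dtypes`. The example DataFrame here is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'b', 'a'],
                   'C': [1.5, 2.5, 3.5]})
df = pd.get_dummies(df)  # 'A_a' and 'A_b' are bool on pandas >= 2.0

# Convert only the boolean dummy columns, leaving other columns (e.g. floats) untouched
bool_cols = df.select_dtypes(include='bool').columns
df[bool_cols] = df[bool_cols].astype(int)
print(df.dtypes)
```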
Just use the built-in `dtype` parameter of `pd.get_dummies()`, like:

```python
df = pd.get_dummies(df, columns=['X', 'Y', 'Z'], dtype='int')
```

It works perfectly fine.
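The `columns=` argument in that snippet limits which columns get encoded. Here is a small sketch using the thread's example DataFrame instead of the hypothetical `X`, `Y`, `Z`. Only `A` is encoded, so `B` stays as strings and would still need handling before it reaches a model.

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'b', 'a'],
                   'B': ['b', 'a', 'c'],
                   'C': [1, 2, 3]})

# Only encode column 'A'; 'B' is left untouched as strings
encoded = pd.get_dummies(df, columns=['A'], dtype='int')
print(encoded)
```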
Not sure whether I just did something wrong here or whether something has changed, but when going through this I was having trouble creating the model. I discovered that the reason was this part:
It resulted in this:
I ended up changing the function to this, and it fixed it for me, although I'm not sure whether this was the right thing to do: