Closed buhtz closed 8 months ago
Just do
pandas.set_option("future.no_silent_downcasting", True)
as suggested on the stack overflow question
The series will retain object dtype in pandas 3.0 instead of casting to int64
pandas.set_option("future.no_silent_downcasting", True)
But doesn't this just deactivate the message without modifying the behavior?
To my understanding the behavior is the problem and needs to be solved. Or not? My intention is to extinguish the fire, not just turn off the fire alarm and let the house burn down.
I'm having this problem as well. I have the feeling it's related to .replace changing the types of the values (as one Stack Overflow commenter implied). Altering the original example slightly:
from pandas import Series

s = Series(['foo', 'bar'])
replace_dict = {'foo': '1', 'bar': '2'}  # replacements maintain original types
s = s.replace(replace_dict)
makes the warning go away.
I agree with @buhtz in that setting the "future" option isn't really getting at the root of understanding how to make this right. I think the hard part for most of us who have relied on .replace is that we never thought of it as doing any casting -- it was replacing. Now the semantics seem to have changed. It'd be great to reopen this issue to clarify the thinking, intention, and direction so that we can come up with appropriate work-arounds.
is that we never thought of it as doing any casting
This is exactly the thing we are trying to solve. replace was previously casting your dtypes and will stop doing so in pandas 3
This is exactly the thing we are trying to solve. replace was previously casting your dtypes and will stop doing so in pandas 3
But it is unclear how to replace and cast. E.g., when I have [0, 1] integers that stand for male and female.
df.gender = df.gender.astype(str)
df.gender = df.gender.replace({'0': 'male', '1': 'female'})
Is that the solution you have in mind? From a user's perspective it is a smelly workaround.
The other way around is nearly not possible because I can not cast a str word to an integer.
print(df.gender) # ['male', 'male', 'female']
df.gender = df.gender.astype(int) # <-- ERROR
df.gender = df.gender.replace({'male': 0, 'female': 1})
What is wrong with casting in replace() ?
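For the string-to-integer direction, one option that sidesteps the casting question in .replace is Series.map followed by an explicit astype. This is only a minimal sketch, not necessarily what the pandas maintainers have in mind. The gender_codes name is made up for illustration, and it assumes every value in the column appears in the mapping (unmapped values become NaN, and the astype call would then fail).

```python
import pandas as pd

df = pd.DataFrame({'gender': ['male', 'male', 'female']})

# Hypothetical mapping for illustration; every value in the column must be a key.
gender_codes = {'male': 0, 'female': 1}

# map() builds a new Series from the lookup; astype() makes the target dtype explicit.
df['gender'] = df['gender'].map(gender_codes).astype('int64')

print(df['gender'].tolist())
```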
The other way around is nearly not possible because I can not cast a str word to an integer.
One alternative (although I realise a non .replace supported "alternative" may not be what was actually desired) is to use categoricals with .assign:
import pandas as pd
df = pd.DataFrame(['male', 'male', 'female'], columns=['gender']) # from the original example
genders = pd.Categorical(df['gender'])
df = df.assign(gender=genders.codes)
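Note that the codes follow the category order, which by default is the sorted order of the values, so in the snippet above 'female' gets 0 and 'male' gets 1. If a specific numbering matters, the categories can be passed explicitly. A small sketch, where the chosen order is just an assumption for illustration:

```python
import pandas as pd

df = pd.DataFrame(['male', 'male', 'female'], columns=['gender'])

# Explicit category order: 'male' -> 0, 'female' -> 1 (order chosen for illustration).
genders = pd.Categorical(df['gender'], categories=['male', 'female'])
df = df.assign(gender=genders.codes)

print(df['gender'].tolist())
```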
If semantically similar data is spread across multiple columns, it gets a little more involved:
import random
import numpy as np
import pandas as pd

def create_data(columns):
    genders = ['male', 'male', 'female']
    for i in columns:
        yield (i, genders.copy())
        random.shuffle(genders)

# Create the dataframe
columns = [f'gender_{x}' for x in range(3)]
df = pd.DataFrame(dict(create_data(columns)))

# Incorporate all relevant data into the categorical
view = (df
        .filter(items=columns)
        .unstack())
categories = pd.Categorical(view)
values = np.hsplit(categories.codes, len(columns))
to_replace = dict(zip(columns, values))
df = df.assign(**to_replace)
which I think is what the Categorical documentation is trying to imply.
I got here trying to understand what pd.set_option('future.no_silent_downcasting', True) does.
The message I get is from .fillna(), which is the same message for .ffill() and .bfill(). So I'm posting this here in case someone is looking for the same answer using the mentioned functions. This is the warning message I get:
FutureWarning: Downcasting object dtype arrays on .fillna, .ffill, .bfill is deprecated and will change in a future version.
Call result.infer_objects(copy=False) instead.
To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`
Maybe the confusion arises from the way the message is phrased. I find it confusing; it creates more questions than answers:
Call result.infer_objects(copy=False) instead. Is it telling me to call it before the function I'm trying to use, or after? Is it telling me not to use the function? (I guess not, since infer_objects should do something different than replace or one of the fill functions.)
pd.set_option('future.no_silent_downcasting', True): am I removing the downcasting or am I making the downcasting not silent? Maybe both?
From what I understand, pd.set_option('future.no_silent_downcasting', True) removes the downcasting the functions do, and if it needs to do some downcasting an error would be raised, but I would need to be corrected here if I'm wrong.
So... I did some digging and I think I have a better grasp of what's going on with this FutureWarning. So I wrote an article on Medium to explain what's happening. If you want to give it a read, here it is:
Deciphering the cryptic FutureWarning for .fillna in Pandas 2
Long story short, do:
with pd.option_context('future.no_silent_downcasting', True):
    # Do your thing with fillna, ffill, bfill, replace... and possibly use infer_objects if needed
    ...
I feel like this thread is starting to become a resource. In that spirit:
I just experienced another case where .replace would have been amazing, but I now need an alternative: a column of strings that are meant to be floats, where the only "offending" values are empty strings (meant to be NaN's). Consider:
records = [
    {'a': ''},
    {'a': 12.3},
]
df = pd.DataFrame.from_records(records)
I would have first reached for .replace. Now I consider .fillna, but that doesn't work either. Using .assign with pd.to_numeric does the trick:
In [1]: df.dtypes
Out[1]:
a object
dtype: object
In [2]: x = df.assign(a=lambda x: pd.to_numeric(x['a']))
In [3]: x
Out[3]:
a
0 NaN
1 12.3
In [4]: x.dtypes
Out[4]:
a float64
dtype: object
From your code:
x = df.assign(a=lambda x: pd.to_numeric(x['a']))
I would do it like this, it feels a little cleaner and easier to read:
df['a'] = pd.to_numeric(df['a'])
You said you wanted to use replace. If you want to use it, you can do this:
with pd.option_context('future.no_silent_downcasting', True):
    df2 = (df
           .replace('', float('nan'))  # Replace empty strings with NaN
           .infer_objects()            # Allow pandas to try to "infer better dtypes"
    )
df2.dtypes
# a    float64
# dtype: object
A note about
Now I consider .fillna, but that doesn't work either.
That would not work because .fillna fills NA values, but '' (empty string) is not NA (see Filling missing data).
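A quick way to see this, as a sketch with a made-up Series:

```python
import pandas as pd

s = pd.Series(['', 12.3], dtype=object)

# The empty string is an ordinary value, not a missing one.
print(s.isna().tolist())  # [False, False]

# So fillna() has nothing to fill and the empty string survives.
filled = s.fillna(0.0)
print(filled.tolist())
```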
Explicitly do the conversion in two steps and the future warning will go away.
In the first step, do the replace with the numbers as strings to match the original dtype: replace_dict = {'foo': '2', 'bar': '4'}
In the second step, convert the dtype to int: s = s.replace(replace_dict).astype(int)
This will run without the warning even when you have not suppressed warnings.
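Put together, the two steps described here might look like the following. This is a sketch, with int64 spelled out as the target dtype rather than the plain int used in the text.

```python
import pandas as pd

s = pd.Series(['foo', 'bar'])

# Step 1: replace strings with strings, so the dtype does not need to change.
replace_dict = {'foo': '2', 'bar': '4'}
s = s.replace(replace_dict)

# Step 2: convert to integers explicitly.
s = s.astype('int64')

print(s.tolist())
```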
I got this because I was trying to filter a dataframe using the output from Series.str.isnumeric().
My dataframe contained NA values, so the resulting mask contained NA values.
Normally I use fillna(False) to get rid of these.
What I would normally do:
df = pd.DataFrame({'A': ['1', '2', 'test', pd.NA]})
mask = df['A'].str.isnumeric().fillna(False)
What I need to do now:
df = pd.DataFrame({'A': ['1', '2', 'test', pd.NA]})
with pd.option_context('future.no_silent_downcasting', True):
    mask = df['A'].str.isnumeric().fillna(False)
The mask still seems to work without casting it to boolean.
See the official deprecation notice in the release notes.
Note that if you don't mind either way, the original code still works, and will silently downcast the dtype (with a warning) until Pandas 3.0, then will switch to preserve the dtype after Pandas 3.0.
It would be great if we could stop breaking changes.
Hello Folks, Thanks for your helpful previous answers. Especially
Explicitly do the conversion in two steps and the future warning will go away.
In the first step, do the replace with the numbers as strings to match the original dtype: replace_dict = {'foo': '2', 'bar': '4'}
In the second step, convert the dtype to int: s = s.replace(replace_dict).astype(int)
This will run without the warning even when you have not suppressed warnings.
This works well when going from string to int, but I struggle to go from string to bool:
df=pd.DataFrame({'a':['','X','','X','X']})
I want to replace '' with False & 'X' with True
Trying to go from string to bool directly
d_string = {'': 'False', 'X': 'True'}
s = df['a'].replace(d_string)
print(s)
0    False
1     True
2    False
3     True
4     True
Name: a, dtype: object
print(s.astype('bool')) # this doesn't work
0 True
1 True
2 True
3 True
4 True
Name: a, dtype: bool
Going from string to int to bool works, but isn't there a better solution? I must be missing something obvious, right?
d_int = {'':'0','X':'1'}
s = df['a'].replace(d_int)
print (s)
0 0
1 1
2 0
3 1
4 1
Name: a, dtype: object
print(s.astype('int'))
0 0
1 1
2 0
3 1
4 1
Name: a, dtype: int64
print(s.astype('int').astype('bool'))  # --> this is the only solution I found using replace without triggering the downcasting warning
0 False
1 True
2 False
3 True
4 True
Name: a, dtype: bool
Arnaudno,
You're doing more work than you need to. The boolean value of all strings except for empty strings is true (empty strings have a boolean value of false). So in your case, you don't need the replace at all. All you need is to convert the string to boolean and you will get the result you want.
df = pd.DataFrame({'a': ['', 'X', '', 'X', 'X']})
s = df['a'].astype('bool')
print(s)
and you will get
0    False
1     True
2    False
3     True
4     True
Name: a, dtype: bool
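Another way to express the same thing, which does not rely on the truthiness of strings, is to compare against the marker value directly. A small sketch, assuming 'X' is the only value that should count as True:

```python
import pandas as pd

df = pd.DataFrame({'a': ['', 'X', '', 'X', 'X']})

# Element-wise comparison yields a boolean Series: True where the value is 'X'.
s = df['a'] == 'X'

print(s.tolist())
```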
Thank you very much @Data-Salad .
Research
[X] I have searched the [pandas] tag on StackOverflow for similar questions.
[X] I have asked my usage related question on StackOverflow.
Link to question on StackOverflow
https://stackoverflow.com/q/77995105/4865723
Question about pandas
Hello,
please accept my apologies for asking this way. My Stack Overflow question [1] was closed for, IMHO, no good reason. The linked duplicates do not help me [2]. I also asked on the pydata mailing list [3] without getting a response.
The example code below gives me this error using Pandas 2.2.0
I found several postings about this future warning. But my problem is I don't understand why it happens and I also don't know how to solve it.
I am aware of other questions and answers [2] but I don't know how to apply them to my own code. The reason might be that I do not understand the cause of the error.
The linked answers use astype() before replacement. But again: I don't know how this could solve my problem.
Thanks in advance
Christian
[1] -- https://stackoverflow.com/q/77995105/4865723
[2] -- https://stackoverflow.com/q/77900971/4865723
[3] -- https://groups.google.com/g/pydata/c/yWbl4zKEqSE