pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

DOC: Series.update throws a `FutureWarning` about `df[col] = df[col].method` but `.update` returns `None` and works `inplace` #59788

Open spawn-guy opened 2 months ago

spawn-guy commented 2 months ago

Pandas version checks

Location of the documentation

https://pandas.pydata.org/docs/dev/reference/api/pandas.Series.update.html#pandas.Series.update

Documentation problem

df.update resembles how Python's dict.update works (it modifies the object in place and returns None), but df.update doesn't support CoW.
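For comparison, here is a minimal sketch of the in-place convention the two methods share, assuming pandas 2.x. Both mutate the receiver and return None. The Series is created on its own rather than selected from a DataFrame, so no chained assignment is involved and no warning is expected.

```python
import numpy as np
import pandas as pd

# dict.update mutates the dict in place and returns None
d = {"a": 1, "b": None}
dict_result = d.update({"b": 2})
print(dict_result)  # None
print(d)            # {'a': 1, 'b': 2}

# Series.update follows the same convention: it overwrites values in place
# with the non-NA values of `other` (aligned on the index) and returns None
s = pd.Series([np.nan, 2.0, np.nan])
other = pd.Series([10.0, np.nan, 30.0])
series_result = s.update(other)
print(series_result)  # None
print(s.tolist())     # [10.0, 2.0, 30.0]
```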

Suggested fix for documentation

Remove the FutureWarning for df.update,

or create a new method (for example, df.coalesce) that actually returns something. This shouldn't break existing code.
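Here is a sketch of what such a returning method could look like. `coalesce` is a hypothetical helper written for illustration, not an existing pandas API. It assumes both Series share the same index and uses Series.mask to take the non-NA values of the second Series, leaving everything else from the first.

```python
import numpy as np
import pandas as pd


def coalesce(left: pd.Series, right: pd.Series) -> pd.Series:
    """Hypothetical helper, not part of pandas.

    Returns a new Series holding the non-NA values of `right` and the values
    of `left` everywhere else, i.e. what `left.update(right)` would leave in
    `left`, without mutating either input. Assumes a shared index.
    """
    # mask() replaces entries where the condition is True with `right`
    return left.mask(right.notna(), right)


df = pd.DataFrame({"B": [1.0, 1.0, 1.0], "C": [np.nan, 5.0, 6.0]})
df["E"] = coalesce(df["B"], df["C"])  # plain assignment, no inplace call
print(df["E"].tolist())  # [1.0, 5.0, 6.0]
```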

rhshadrach commented 2 months ago

df.update doesn't support CoW

Thanks for the report. Can you provide a reproducible example showing how CoW is not supported?

spawn-guy commented 2 months ago

@rhshadrach, here is some code and the log:

# select best source: heading
# HeadingTrue > HeadingMagnetic > HeadingAndDeclination (this is also magnetic) > TrackMadeGood
measurements_df["heading"] = measurements_df["gps_course_over_ground"]
# replace if other value is not nan
measurements_df["heading"].update(measurements_df["gps_heading"])

FutureWarning

_task.py:427: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.

  measurements_df["heading"].update(measurements_df["gps_heading"])
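A minimal sketch of the first form the warning suggests, applied to update: call the method on the DataFrame itself instead of on measurements_df["heading"]. The sample values are invented for illustration (only the column names come from the snippet above). This assumes pandas 2.x, where DataFrame.update modifies the frame in place using the non-NA values from the matching columns of the other frame.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: only the column names come from the issue
measurements_df = pd.DataFrame(
    {
        "gps_course_over_ground": [10.0, 20.0, 30.0],
        "gps_heading": [np.nan, 25.0, np.nan],
    }
)

measurements_df["heading"] = measurements_df["gps_course_over_ground"]
# Update through the DataFrame (not through measurements_df["heading"]),
# passing a one-column frame whose column is renamed to the target column
measurements_df.update(
    measurements_df[["gps_heading"]].rename(columns={"gps_heading": "heading"})
)
print(measurements_df["heading"].tolist())  # [10.0, 25.0, 30.0]
```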

Inconsistency with the warning:

rhshadrach commented 2 months ago

Thanks @spawn-guy, however your example is not reproducible because you did not provide measurements_df. Can you provide a reproducible example?

spawn-guy commented 2 days ago

@rhshadrach, it took me some time to pick this up, but here is a small test. At first I thought it might be related to the mask that I use, but the FutureWarning is thrown without it as well.

import numpy as np
import pandas as pd

# test pandas warnings
df = pd.DataFrame(
    {
        "A": [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
        "B": [1, 1, 1, 1, 1, 1],
        "C": [np.nan, 5, 6, np.nan, np.nan, np.nan],
        "D": [0, 0, 2, 2, 0, 0],
    }
)

# with mask
# df = df[df["D"] > 0]

df["E"] = df["A"]
df["E"].update(df["B"])
# df["E"].update(df["C"])
print(df)

This results in:

cli_python_311_upgrade_test.py:209: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.

  df["E"].update(df["B"])

    A  B    C  D    E
0 NaN  1  NaN  0  1.0
1 NaN  1  5.0  0  1.0
2 NaN  1  6.0  2  1.0
3 NaN  1  NaN  2  1.0
4 NaN  1  NaN  0  1.0
5 NaN  1  NaN  0  1.0

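The FutureWarning is thrown by df["E"].update(df["B"]).

So, with the current implementation, I don't see a way to fix this FutureWarning, for the reasons mentioned above: update works in place and returns None, so it cannot be used in the df[col] = df[col].method(value) form that the warning recommends.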
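One possible workaround, sketched under the assumption of pandas 2.x: hold the intermediate Series in a named variable, call update on that object directly, then write it back with a plain column assignment. Because update is no longer called on the temporary object returned by df["E"], this is not chained assignment and should not trigger the warning.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "A": [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
        "B": [1, 1, 1, 1, 1, 1],
        "C": [np.nan, 5, 6, np.nan, np.nan, np.nan],
        "D": [0, 0, 2, 2, 0, 0],
    }
)

# Explicit copy in a named variable: update() mutates `e`, not a temporary
e = df["A"].copy()
e.update(df["B"])  # in place, returns None
e.update(df["C"])  # non-NA values of C win where present
df["E"] = e        # write the result back with a plain column assignment
print(df["E"].tolist())  # [1.0, 5.0, 6.0, 1.0, 1.0, 1.0]
```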

spawn-guy commented 2 days ago

And if I do what the warning suggests, the result is wrong:

import numpy as np
import pandas as pd

# test pandas warnings
df = pd.DataFrame(
    {
        "A": [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
        "B": [1, 1, 1, 1, 1, 1],
        "C": [np.nan, 5, 6, np.nan, np.nan, np.nan],
        "D": [0, 0, 2, 2, 0, 0],
    }
)

# with mask
# df = df[df["D"] > 0]

df["E"] = df["A"]
df["E"].update(df["B"])
# df["E"].update(df["C"])
print(df)

df["E"] = df["A"]
df["E"] = df["E"].update(df["C"])
print(df)

Output:

cli_python_311_upgrade_test.py:209: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.

  df["E"].update(df["B"])
    A  B    C  D    E
0 NaN  1  NaN  0  1.0
1 NaN  1  5.0  0  1.0
2 NaN  1  6.0  2  1.0
3 NaN  1  NaN  2  1.0
4 NaN  1  NaN  0  1.0
5 NaN  1  NaN  0  1.0

    A  B    C  D     E
0 NaN  1  NaN  0  None
1 NaN  1  5.0  0  None
2 NaN  1  6.0  2  None
3 NaN  1  NaN  2  None
4 NaN  1  NaN  0  None
5 NaN  1  NaN  0  None
cli_python_311_upgrade_test.py:214: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.

  df["E"] = df["E"].update(df["C"])

Notice that column E is now all None.
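The None comes from update's return value: Series.update works in place and returns None, so the assignment stores None in every row. Below is a sketch of the df[col] = df[col].method(value) pattern using a method that does return a Series. It assumes pandas 2.x and that all columns share an index. Series.fillna with a Series argument fills the NA positions of the caller from the argument, so calling it on the new data reproduces update's precedence (the new non-NA values win).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "A": [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
        "B": [1, 1, 1, 1, 1, 1],
        "C": [np.nan, 5, 6, np.nan, np.nan, np.nan],
    }
)
df["E"] = df["A"]

# Equivalent of df["E"].update(df["B"]): keep B where it is not NA,
# otherwise fall back to the current E; fillna returns a new Series
df["E"] = df["B"].fillna(df["E"])
# Equivalent of df["E"].update(df["C"])
df["E"] = df["C"].fillna(df["E"])
print(df["E"].tolist())  # [1.0, 5.0, 6.0, 1.0, 1.0, 1.0]
```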