pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

API/DEPR: Deprecate inplace parameter #16529

Open gfyoung opened 7 years ago

gfyoung commented 7 years ago

The inplace parameter (default inplace=False) should be deprecated across the board in preparation for pandas 2, which will not support it (we will always return a copy). That would give people time to stop using it.

Thoughts?
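For reference, here are the two calling styles at stake. This is a minimal sketch against current pandas, where reset_index still accepts inplace; the frame is made up for illustration.

```python
import pandas as pd

# In-place style: the method mutates df and returns None.
df = pd.DataFrame({"a": [1, 2]}, index=["x", "y"])
result = df.reset_index(inplace=True)

# Return-and-reassign style, the only one left once inplace is gone.
df2 = pd.DataFrame({"a": [1, 2]}, index=["x", "y"])
df2 = df2.reset_index()
```

Both end in the same state. The in-place call only saves the reassignment; it does not avoid building the new result.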

Methods using inplace:

Deprecation non-controversial (a copy will be made anyway, and inplace=True does not add value):

Not sure:

Should be able to avoid copying memory (what to do here is under discussion):

Special cases:

tommedema commented 2 years ago

inplace does not generally do anything inplace but makes a copy and reassigns the pointer

Should the solution then not be to make sure that inplace does not make a copy when possible, rather than deprecating it?
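The claim above can be checked with a small experiment. This is a sketch that assumes current pandas internals, which may change, and the frame is illustrative. The object identity is kept, so every name bound to df sees the sorted result. However, the column data ends up in a new buffer rather than being rearranged where it was.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [3, 1, 2]})
alias = df                       # a second name for the same object
before = df["a"].to_numpy()      # the column's data before sorting

df.sort_values("a", inplace=True)

after = df["a"].to_numpy()
print(alias is df)                        # same object, so alias sees the change
print(alias["a"].tolist())                # sorted values
print(before.tolist())                    # the old buffer was not rearranged
print(np.shares_memory(before, after))    # whether the sorted data reuses the old memory
```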

MarcoGorelli commented 1 year ago

Hi @jorisvandenbossche

It's been 1.5 years since you wrote

but based on my initial experiments with Copy-on-Write (https://github.com/pandas-dev/pandas/issues/36195, no PR yet), I am no longer fully convinced that completely deprecating the inplace keyword is the way to go.

Just wondering whether your stance has changed at all.

jorisvandenbossche commented 1 year ago

A formal proposal to deprecate most occurrences of the inplace keyword (and the copy keyword as well) has been opened at https://github.com/pandas-dev/pandas/pull/51466

jondo commented 1 year ago

Here's the direct link to the proposal.

jorisvandenbossche commented 10 months ago

FWIW, the proposal to remove the inplace keyword in certain methods is being finalized (https://github.com/pandas-dev/pandas/pull/51466)

Rinfore commented 10 months ago

Is there any way to keep the illusion of a mutating object with a pandas extension? https://pandas.pydata.org/docs/dev/development/extending.html

jorisvandenbossche commented 10 months ago

@Rinfore can you clarify your question a bit? (pandas objects are still mutable, removing the inplace keyword in methods that didn't actually work inplace won't change that)

If you want some syntactic sugar to avoid having to re-assign to the same variable (like df = df.reset_index() instead of df.reset_index(inplace=True)), you can't achieve that with an extension type, but I think you could in theory do it with an accessor (it would look something like df.inplace.reset_index(), if the accessor API allows modifying the calling object). I am not sure I would recommend doing that, though ;)
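Below is a minimal sketch of that accessor idea. It uses a hypothetical accessor named inplace and handles only single-level indexes. It relies on two things: pandas passes the calling DataFrame itself to the accessor, and DataFrame.insert and assignment to .index mutate that object rather than rebinding a name.

```python
import pandas as pd


@pd.api.extensions.register_dataframe_accessor("inplace")
class InplaceAccessor:
    """Hypothetical accessor offering df.inplace.<method>() sugar."""

    def __init__(self, df: pd.DataFrame) -> None:
        # pandas hands over the calling DataFrame itself, not a copy.
        self._obj = df

    def reset_index(self) -> None:
        """Simplified reset_index for a single-level index, mutating the caller."""
        obj = self._obj
        if isinstance(obj.index, pd.MultiIndex):
            raise NotImplementedError("this sketch handles single-level indexes only")
        name = obj.index.name if obj.index.name is not None else "index"
        # DataFrame.insert adds the column to this very object.
        obj.insert(0, name, obj.index.to_numpy())
        # Assigning to .index also mutates the object in place.
        obj.index = pd.RangeIndex(len(obj))


df = pd.DataFrame({"a": [10, 20]}, index=["x", "y"])
result = df.inplace.reset_index()
```

This covers only a small part of what the real reset_index does (no drop, level, or name-collision handling).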

Rinfore commented 10 months ago

Thanks so much for the insight @jorisvandenbossche!

I've been working on custom accessors in a domain-specific library that add abstractions with embedded business logic to DataFrames. For example (fictitious): based on the presence of the columns [EmployeeId, Time, Event], I can classify a DataFrame as an EmployeeRecords DataFrame and expose methods on it through an accessor (df.employeerecords.<method>). This lets users treat such frames almost like instances of an EmployeeRecords class while keeping the flexibility to dive into the pandas API.

In my custom accessors, I may do things such as filter out problematic rows via higher-level functions (e.g. df.employeerecords.drop_inconsistent_records()). Previously, by dropping rows in place, I could spare my users from writing df = df.employeerecords.drop_inconsistent_records(), so I was wondering whether it would be possible to keep avoiding this (exactly the syntactic sugar you mention).
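Here is a sketch of the return-a-new-frame pattern this implies. The column names come from the example above. The rule for what counts as "inconsistent" (a missing EmployeeId or Time) is a hypothetical placeholder. Raising AttributeError in __init__ for frames that lack the required columns follows the validation pattern used in the pandas accessor documentation.

```python
import pandas as pd

# Column set from the example above; the validation itself is a sketch.
REQUIRED_COLUMNS = ["EmployeeId", "Time", "Event"]


@pd.api.extensions.register_dataframe_accessor("employeerecords")
class EmployeeRecordsAccessor:
    def __init__(self, df: pd.DataFrame) -> None:
        missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
        if missing:
            # AttributeError makes df.employeerecords behave like a missing attribute.
            raise AttributeError(f"not an EmployeeRecords frame, missing {missing}")
        self._obj = df

    def drop_inconsistent_records(self) -> pd.DataFrame:
        # Hypothetical rule: a record is inconsistent if EmployeeId or Time is missing.
        bad = self._obj[["EmployeeId", "Time"]].isna().any(axis=1)
        # Returns a new DataFrame; the caller's frame is left unchanged.
        return self._obj.loc[~bad]


df = pd.DataFrame(
    {"EmployeeId": [1, None, 3], "Time": [1.0, 2.0, None], "Event": ["in", "out", "in"]}
)
cleaned = df.employeerecords.drop_inconsistent_records()
```

Users would then write df = df.employeerecords.drop_inconsistent_records(), which is exactly the reassignment the in-place version avoided.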

I am also wondering whether mutating a DataFrame to add columns (e.g. df['inconsistent'] = ...) or similar operations could become impossible in the future. That would be a significant concern for me, because my custom accessors rely heavily on mutating the DataFrame to add columns or update other state.
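As a point of reference, column assignment through an accessor does mutate the caller's DataFrame in current pandas. This also holds under Copy-on-Write, which governs data shared between objects, not setting a column on the object itself. In the sketch below, the accessor name records and the flagging rule are hypothetical.

```python
import pandas as pd


@pd.api.extensions.register_dataframe_accessor("records")
class RecordsAccessor:
    def __init__(self, df: pd.DataFrame) -> None:
        self._obj = df  # the caller's DataFrame itself

    def flag_inconsistent(self) -> None:
        # Hypothetical rule: flag rows missing an EmployeeId or a Time.
        # Setting a column mutates the shared object, so the caller sees it.
        self._obj["inconsistent"] = self._obj[["EmployeeId", "Time"]].isna().any(axis=1)


df = pd.DataFrame({"EmployeeId": [1, None], "Time": [1.0, 2.0], "Event": ["in", "out"]})
df.records.flag_inconsistent()
```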

In summary:

  1. I was wondering whether I could use custom accessors to achieve the syntactic sugar you mentioned; your reply above seemed to suggest "maybe". I can't see how, though: assigning a new DataFrame to self._df inside the accessor does not replace the caller's reference to the original object.

e.g.

import pandas as pd

@pd.api.extensions.register_dataframe_accessor('test')
class TestAccessor:

    def __init__(self, df: pd.DataFrame) -> None:
        # pandas hands the accessor the calling DataFrame itself
        self._df = df

    def test_assign(self):
        copied_df = self._df.copy()
        copied_df.columns = ['new_df']
        # rebinds the accessor's own attribute; the caller's object is untouched
        self._df = copied_df

>>> test_df = pd.DataFrame({'old_df': [0]})
>>> test_df.test.test_assign()
>>> test_df.columns
Index(['old_df'], dtype='object')  # not Index(['new_df'], dtype='object')

It appears that the assignment inside the custom accessor does not change the reference held by the caller. Thus, I would likely need to return the new object and have users reassign their variables.
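The difference is between rebinding an attribute and mutating the object it points to. Below is a sketch of the same accessor that mutates the shared object instead; the accessor name test_mutate is hypothetical. The change reaches the caller because pandas hands the accessor the calling DataFrame itself.

```python
import pandas as pd


@pd.api.extensions.register_dataframe_accessor("test_mutate")
class TestMutateAccessor:
    def __init__(self, df: pd.DataFrame) -> None:
        self._df = df

    def test_assign(self) -> None:
        # Mutate the object the caller also holds, instead of rebinding self._df.
        self._df.columns = ["new_df"]


test_df = pd.DataFrame({"old_df": [0]})
test_df.test_mutate.test_assign()
print(test_df.columns.tolist())
```

As far as I know, this only works for changes pandas can make on an existing object (labels, column values, new columns). Shrinking the frame, as when dropping rows, still needs either an inplace=True call or a returned frame.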

  2. Is it likely that mutating pandas objects via operations such as [], .index =, etc. will be disallowed in the future? (I'm hoping not...)

I apologise if this is not the correct forum to pose these queries, or if I have made some other error or slight in my post.

kapoor1992 commented 7 months ago

Seven years after this was opened, is this still something we want to do?

phofl commented 6 months ago

This is actually covered by PDEP-8; I guess we can close this.