Open gfyoung opened 7 years ago
inplace does not generally do anything inplace but makes a copy and reassigns the pointer
Should the solution then not be to make sure that inplace does not make a copy when possible, rather than deprecating it?
Hi @jorisvandenbossche
It's been 1.5 years since you wrote
but based on my initial experiments with Copy-on-Write (https://github.com/pandas-dev/pandas/issues/36195, no PR yet), I am no longer fully convinced that completely deprecating the inplace keyword is the way to go.
Just wondering if your stance has changed at all
A formal proposal to deprecate most occurrences of the `inplace` keyword (and the `copy` keyword as well) has been opened at https://github.com/pandas-dev/pandas/pull/51466
Here's the direct link to the proposal.
FWIW, the proposal to remove the `inplace` keyword in certain methods is being finalized (https://github.com/pandas-dev/pandas/pull/51466)
Is there any way to keep the illusion of a mutating object with a pandas extension? https://pandas.pydata.org/docs/dev/development/extending.html
@Rinfore can you clarify your question a bit? (pandas objects are still mutable, removing the inplace keyword in methods that didn't actually work inplace won't change that)
If you want some syntactic sugar to avoid having to re-assign to the same variable (like `df = df.reset_index()` instead of `df.reset_index(inplace=True)`), you can't achieve that with an extension type, but I think you could in theory do that with an accessor (it would look something like `df.inplace.reset_index()`, if the accessor API allows modifying the calling object).
I am not sure if I would recommend doing that, though ;)
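For reference, the two spellings mentioned above can be compared directly. This is only a minimal sketch: the frame and its column names (`key`, `value`, `amount`) are made up for illustration, and it assumes a pandas version in which `reset_index` still accepts `inplace`.

```python
import pandas as pd

# Hypothetical example data; the column names are illustrative only.
df = pd.DataFrame({"key": ["a", "b"], "value": [1, 2]}).set_index("key")

# Re-assignment: the method returns a new object that is bound to a name.
reassigned = df.reset_index()

# inplace=True: the equivalent result ends up in the original variable.
mutated = df.copy()
mutated.reset_index(inplace=True)

# Only the non-inplace form returns a DataFrame, so only it can be chained.
chained = df.reset_index().rename(columns={"value": "amount"})
```

Both spellings end with the same data; the difference is only in which variable holds it.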
Thanks so much for the insight @jorisvandenbossche!
I've been working on creating some custom accessors in a domain-specific library that allow adding abstractions with embedded business logic to data frames (e.g. a fictitious example: based on the presence of the columns [EmployeeId, Time, Event], I can classify the data frame as an EmployeeRecords data frame) and use extension types on it (`df.employeerecords.<method>`). This allows users to treat them almost like instances of EmployeeRecords classes, but retain the flexibility to dive into the pandas API.
In my custom accessors, I may do things such as filter out problematic rows via higher-level functions (e.g. `df.employeerecords.drop_inconsistent_records()`). Previously, by dropping rows in-place, I could avoid having my users do something like `df = df.employeerecords.drop_inconsistent_records()`, so I was wondering if it would be possible to continue avoiding this (exactly as you mention, syntactic sugar).
I am also wondering if mutating a data frame to add columns (e.g. `df['inconsistent'] = ...`), or other similar operations, would become impossible in the future, which would be of significant concern to me, as my custom accessors rely heavily on things such as mutating the data frame to add columns or mutate other state.
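To make the column-adding pattern concrete, here is a rough sketch of an accessor that assigns a column on the object it wraps. The accessor name `records_demo`, the method `flag_inconsistent`, and the rule "negative `Time` is inconsistent" are all hypothetical and exist only for this illustration; it shows how column assignment behaves today and says nothing about future pandas versions.

```python
import pandas as pd


# "records_demo" is a hypothetical accessor name used only for this sketch.
@pd.api.extensions.register_dataframe_accessor("records_demo")
class RecordsDemoAccessor:
    def __init__(self, pandas_obj: pd.DataFrame) -> None:
        # The accessor keeps a reference to the DataFrame it was called on.
        self._obj = pandas_obj

    def flag_inconsistent(self) -> None:
        # Column assignment modifies the wrapped DataFrame itself,
        # so the caller sees the new column without re-assigning.
        # The "Time < 0" rule is made up for the example.
        self._obj["inconsistent"] = self._obj["Time"] < 0


df = pd.DataFrame({"EmployeeId": [1, 2], "Time": [5, -1]})
df.records_demo.flag_inconsistent()
```

After the call, `df` carries the extra `inconsistent` column even though the caller never re-assigned `df`.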
In summary: reassigning `self._obj` in the accessor does not replace the reference to the original object, e.g.
    import pandas as pd

    @pd.api.extensions.register_dataframe_accessor('test')
    class TestAccessor:
        def __init__(self, df: pd.DataFrame) -> None:
            self._df = df

        def test_assign(self):
            copied_df = self._df.copy()
            copied_df.columns = ['new_df']
            # Rebinds only the accessor's attribute, not the caller's variable.
            self._df = copied_df

    >>> test_df = pd.DataFrame({'old_df': [0]})
    >>> test_df.test.test_assign()
    >>> test_df.columns
    Index(['old_df'], dtype='object') # not Index(['new_df'], dtype='object')
It appears as if the internal assignment in the custom accessor does nothing to modify the reference in the enclosing environment. Thus, I would likely need to return the new reference and have users update their variable references.
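The "return the new reference" option could look roughly like the following sketch. The accessor name `rename_demo` and the method `renamed` are hypothetical, chosen only to mirror the example above.

```python
import pandas as pd


# "rename_demo" is a hypothetical accessor name used only for this sketch.
@pd.api.extensions.register_dataframe_accessor("rename_demo")
class RenameDemoAccessor:
    def __init__(self, df: pd.DataFrame) -> None:
        self._df = df

    def renamed(self) -> pd.DataFrame:
        # Build the modified frame and hand it back to the caller
        # instead of rebinding an attribute on the accessor.
        new_df = self._df.copy()
        new_df.columns = ["new_df"]
        return new_df


test_df = pd.DataFrame({"old_df": [0]})
result = test_df.rename_demo.renamed()
```

Here `result` has the new column name while `test_df` is left as it was, so users who want the change write `test_df = test_df.rename_demo.renamed()`.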
I apologise if this is not the correct forum to pose these queries, or if I have made some other error or slight in my post.
7 years since this was opened, is this still something we want to do?
This is actually covered by PDEP8, I guess we can close here
The parameter `inplace=False` should be deprecated across the board in preparation for pandas 2, which will not support that input (we will always return a copy). That would give people time to stop using it. Thoughts?
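As a small illustration of the two behaviours being contrasted, the sketch below uses `dropna` on made-up data. It assumes a pandas version in which `dropna` still accepts `inplace`; the variable names are illustrative.

```python
import pandas as pd

# Made-up data with one missing value.
df = pd.DataFrame({"x": [1.0, None, 3.0]})

# Default behaviour: a new object is returned and df is left untouched.
cleaned = df.dropna()
rows_after_default = len(df)

# inplace=True: df itself ends up holding the result.
df.dropna(inplace=True)
rows_after_inplace = len(df)
```

With the keyword removed, only the first form would remain, and callers would write `df = df.dropna()` to get the second effect.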
Methods using `inplace`:
Deprecation non-controversial (a copy will be made anyway, and `inplace=True` does not add value):
`drop=False` wouldn't change the data, but that doesn't seem the main use case)
Not sure:
Should be able to not copy memory (under discussion on what to do):
Special cases:
`inplace=False` the value is not returned but set to an argument `target`)