API/DEPR: Deprecate inplace parameter

gfyoung commented 7 years ago

The parameter inplace=False should be deprecated across the board in preparation for pandas 2, which will not support that input (we will always return a copy). That would give people time to stop using it.

Thoughts?

Methods using inplace:

Deprecation non controvertial (a copy will be made anyway, and inplace=True does not add value):

(Series/DataFrame).drop
(Series/DataFrame).drop_duplicates
(Series/DataFrame).dropna
DataFrame.set_index (with drop=False wouldn't change the data, but that doesn't seem the main use case)
DataFrame.query
DataFrame.eval

Not sure:

(Series/DataFrame).sort_values
(Series/DataFrame).sort_index

Should be able to not copy memory (under discussion on what to do):

(Series/DataFrame).clip
(Series/DataFrame).where
(Series/DataFrame).fillna
(Series/DataFrame).rename_axis
(Series/DataFrame).reset_index
(Series/DataFrame).replace
(Series/DataFrame).set_axis
(Series/DataFrame).mask
(Series/DataFrame).interpolate
DataFrame.rename
Index.rename
Index.set_names
MultiIndex.set_levels
MultiIndex.set_labels
pandas.core.resample.Resampler.interpolate

Special cases:

pandas.eval (with inplace=False the value is not returned but set to an argument target)

drafter250 commented 7 years ago

I thought The main usage of this feature was to mitigate memory usage in case of large DataFrames?

gfyoung commented 7 years ago

I thought The main usage of this feature was to mitigate memory usage in case of large DataFrames?

I think @jreback might know more about the initial usage, but generally from previous discussion, this parameter is misused and is more prone to introducing bugs.

jreback commented 7 years ago

inplace does not generally do anything inplace but makes a copy and reassigns the pointer

mutating code is much harder to debug (not to mentiin more complicated to support actual inplace ops)

so except for inplace indexing generally these operations are simply more verbose and serve just to provide corner cases for testing

datapythonista commented 5 years ago

@pandas-dev/pandas-core I think this was agreed during the sprint at Scipy, but I'm not sure if it was discussed when to deprecate the inplace parameters. Is it something we want to do before 1.0?

Personally I think the sooner, the better, since the decision is made.

jreback commented 5 years ago

this is going to cause major downstream complaints, but I agree this should be sooner rather than later.

Would accept a FutureWarning for this for 0.24.0

TomAugspurger commented 5 years ago

A few questions:

There are some places (.where comes to mind, fillna? Maybe others?) where the operation can truly be done inplace. I don't think those should be deprecated.
Following 1. is it possible to detect when we're doing an "inplace" operation on a copy? Could we only warn in those cases?

jreback commented 5 years ago

There are some places (.where comes to mind, fillna? Maybe others?) where the operation can truly be done inplace. I don't think those should be deprecated.

this is a very limited number, and though possible to detect (e.g. a Series of a single dtype, or a DataFrame of a single dtype), IMHO is not worth leaving this option. just adding code complexity w/o any real perf benefit. So -1 on doing this.

datapythonista commented 5 years ago

In my opinion, ideally we should always do the operation in place if possible, but still return a new object.

So, df = df.fillna(0) wouldn't copy memory (assuming no type change), and if the user does want to modify a copy, they should do df2 = df.copy().fillna(0).

Not sure if in practice can be complex to implement in some cases. But if that makes sense, we should deprecate all inplace arguments.

jreback commented 5 years ago

So, df = df.fillna(0) wouldn't copy memory (assuming no type change), and if the user does want to modify a copy, they should do df2 = df.copy().fillna(0).

this is too complex. You either do it inplace or you don't. The keyword controls it. If we remove the keyword then it should never modify the original.

TomAugspurger commented 5 years ago

just adding code complexity w/o any real perf benefit.

It depends on the application, but not having to copy can be a pretty big win, right? :)

Still, I agree that this is tough (impossible) to detect ahead of time. Is it feasible to detect it after the fact, and raise a PerformanceWarning in those cases (possibly elevating the an error later, if that makes sense)? In a simplified BlockManager world (one block per column) we could maybe keep a weakref from the column to the actual data (ndarray or the EA's data).

def fillna(self, **kwargs):
    # in BlockManager.fillna
    ref_to_data = self._get_ref_to_data()
    result = self.apply('fillna', **kwargs)
    new_data = self._get_ref_to_data()
    if ref_to_data != new_data and inplace:
         warnings.warn(PerformanceWarning)

If so, I would prefer to go that route, rather than having users change code.

jreback commented 5 years ago

I am really -1 on this in any users code. So while this may have to be an extended deprecation cycle I think its worth it.

datapythonista commented 5 years ago

This is the list of methods functions in the public API with a parameter named inplace.

EDITED: Moved the lists to the description

Those are not really in place, and should be deprecated:

[EDIT: moved to the issue description]

If that sounds good, I'll create a PR to raise FutureWarning for the ones that we all agree should be deprecated (the first list). Then we can follow up with the rest.

Let me know if there are more from the first list that you want to postpone to the second phase.

@jorisvandenbossche may be you also want to take a look here.

h-vetinari commented 5 years ago

@datapythonista @jreback Another method that should be part of this discussion is .update, which is AFAICT the only method that's inplace by default, and does not even have an inplace-kwarg. Adding that has met resistance in #22286, but one could deprecate the whole method and replace it with df.coalesce: #22812

TomAugspurger commented 5 years ago

@datapythonista how'd you define that list of methods that are not really inplace? I haven't looked closely, but things like Series.rename_axis should be doable without copying data. I believe clip / clip_lower as well.

I am really -1 on this in any users code.

So that argument is independent of whether or not any operation can be inplace, and we should discuss that. i.e. that it's the "opinion of pandas" that inplace is an anti-pattern to be avoided at all time.

Personally, I'm not sure about that.

jreback commented 5 years ago

my point is this adds a lot of complexity sure it’s opionated but so what

complexity is killing the ability of most folks to make modifications

this simplifies the model a great deal

datapythonista commented 5 years ago

I just started the list of the actual inplace methods with the ones you said. There are some I can guess they should be able to be in place, like replace, but thought you or Jeff would know from the top of your head, so didn't want to guess.

Will move the lists to the description, and try to get it closer to reality, feel free to edit afterwards.

datapythonista commented 5 years ago

I updated the description with what I think it's non-controversial, and the rest. Let me know if there is anything that seems to be in the wrong place.

datapythonista commented 5 years ago

Is the list of the methods I defined as non-controversial, non-controversial enough so we can deprecate them already (they are listed in the description)?

I assume the rest needs some more discussion?

TomAugspurger commented 5 years ago

Sorry, haven't had a chance to look. Dumping some thoughts below.

IMO, inplace is currently overloaded for two uses: shorter code from not having to repeat the names of identifiers, and (possibly) avoiding a data copy.

Avoid typing the name of the identifier twice. We write

my_df_with_a_really_long_name.dropna(inplace=True)

versus

my_df_with_a_really_long_name = my_df_with_a_really_long_name.dropna()

In this case, inplace serves a role similar to inplace operators on immutable objects (e.g. int).

my_long_id = 5
my_long_id += 1

This inplace operation doesn't mutate the object (you can't mutate an int).

(Possibly) avoiding a copy. Some methods, like DataFrame.where, may avoid a memory copy by mutating data in place.

In [42]: arr = np.arange(12, dtype='float').reshape(4, 3)

In [43]: df = pd.DataFrame(arr, copy=False)

In [44]: np.shares_memory(df._data.blocks[0].values, arr)
Out[44]: True

In [45]: df.where(df > 5, inplace=True)

In [46]: arr
Out[46]:
array([[nan, nan, nan],
       [nan, nan, nan],
       [ 6.,  7.,  8.],
       [ 9., 10., 11.]])

In [47]: np.shares_memory(df._data.blocks[0].values, arr)
Out[47]: True

Personally, I don't really care about use-case 1. I'd recommend people chain methods.

I care a great deal about use-case 2. I'd love it if pandas could, optionally, do zero-copy operations. But this is a hard problem.

So if I were designing the API today, I would use the keyword copy=True, rather than inplace=False. Operations would never mutate data in place by default, but would provide that option when it's feasible. I likely wouldn't include the inplace keyword, and would instead instruct people to use method chaining.

But given the current situation, I might prefer having both inplace and copy. Then we can leave inplace for "update the object associated with this name", regardless of whether data is copied, and copy for "mutate data in place, and return a new dataframe that's a view on the same data." But I'm less certain about how to move forward than I am about what I would do if starting from scratch.

giuliobeseghi commented 4 years ago

@datapythonista @jreback Another method that should be part of this discussion is .update, which is AFAICT the only method that's inplace by default, and does not even have an inplace-kwarg. Adding that has met resistance in #22286, but one could deprecate the whole method and replace it with df.coalesce: #22812

In addition to what @h-vetinari said: .pop is another method that is inplace without needing any argument.

About the topic, the question we really need to ask ourselves is: is the inplace behavior worth the complexity/different logic/additional lines of code it involves?

Inplace operations have few drawbacks:

People can't use method chaining (@TomAugspurger).
Debugging is more complicated because objects change without being reassigned.

I think most people that use inplace (me too, until recently) do it because:

They think it's faster, or
They think they can save memory usage

So they might end up overusing inplace to try to improve performance, actually having little effect and sacrificing good style.

giuliobeseghi commented 4 years ago

And you probably get a real advantage with df.fillna(inplace=False) just in a few cases, as the following example shows:

import pandas as pd
import numpy as np
np.random.seed(1)

df = pd.DataFrame(np.random.rand(1000, 1000))
df = df.applymap(lambda x: x if x > 0.01 else np.nan)

df = pd.concat([df] * 15, ignore_index=True)
df = pd.concat([df] * 15, axis=1, ignore_index=True)

# 15000x15000 df, 1716 MB

_df = df.copy()

%%timeit -n 1 -r 1
_df.fillna(1, inplace=True)

2.73 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)

%%timeit -n 1 -r 1
res = df.fillna(1)

9.89 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)

TomAugspurger commented 4 years ago

I assume this isn't happening for 1.0, so pushing off this milestone.

giuliobeseghi commented 4 years ago

Shall we still set inplace=False as a default for set_axis in pandas 1.0? We probably need a major release to implement this relatively disruptive change, and the future warning has been around for a while.

simonjayhawkins commented 4 years ago

this was done in #27600

Ddedalus commented 4 years ago

I'm surprised this has not been brought up in this thread yet: the inplace kwarg makes it much more difficult to do any static type-checking. Every method with this kwarg ends up with return type of Union[DataFrame, NoneType] or Union[Series, NoneType], so the type checker has no way of knowing whether anything was returned at all!

Moreover, the existence of inplace causes problems with auto-complete in IDEs. For example with VS Code + Python extension + Microsoft Language Server one gets:

Correctly inferred type and autocomplete when using the constructor which unambiguously returns a DataFrame
Still correct inference even if inplace was used in meantime (the static analysis assumes whatever was returned by the reset_index() call was discarded anyways):
Complete confusion and no autocomplete when reset_index() was chained with the constructor:

If you think about it, there is no way type hints or any static code analysis will ever solve this correctly (unless it specifically looks for the inplace kwarg). They may just assume the result is not None, but this kind of defeats the point of type checking. I believe this compromises a lot of effort already put into providing type hints in pandas.

rhshadrach commented 4 years ago

@Ddedalus - you can type-hint with @overload

SamRWest commented 4 years ago

I understand that inplace doesn't always reduce copying, but if it's removed, I'd really like to see some other way of doing inplace operations where possible.

Pandas is already quite memory hungry due to its copy-by-default conventions, but the ability to mitigate this using inplace=True in certain situations is invaluable. For instance, I have some production code which runs out of memory when doing: large_df = large_df.join(small_df) Which copies the dataframe and doubles the peak memory usage temporarily.

But I can currently work around this by using: large_df.update(small_df) which updates large_df inplace and effectively halves my application's memory usage.

I suspect there are many other examples where peak RAM is a major issue, and removing inplace from all functions will be the deal-breaker which forces users to another platform.

Data is only getting bigger, and RAM (particularly in the cloud) is expensive, so please don't hamstring Pandas by removing the few memory optimisations it currently provides!

jreback commented 4 years ago

@SamRWest you are missing the point, inplace rarely actually does something inplace, you are thinking that you are saving memory but you are not.

toobaz commented 4 years ago

@SamRWest you are missing the point, inplace rarely actually does something inplace, you are thinking that you are saving memory but you are not.

I think he was really defending inplace operations rather than the inplace argument. His use of update does save memory - but update itself has no inplace kwarg.

SamRWest commented 4 years ago

@jreback I understand that there needs to be specific conditions for inplace to actually work. I've profiled my update() call, and it's definitely halving memory compared to a standard join(), presumably because my dataframe's dytpes are all floats. My point was that this (dataframe of a single dtype) is an extremely common case in data science work, which I believe is one of Pandas main use-cases. By removing explicit and/or inherent inplace calls, you're removing my ability to do memory optimisation which halve my peak RAM use. When you're working with 50Gb+ dataframes, this is a big deal.

@toobaz yep - there was some discussion above of whether the existing few inherently inplace operations should remain. I was giving an example where they are very useful to have, but also trying to make a case that other inplace operations are still extremely useful. For example, it seems crazy to me to remove an optimisation for, say fillna(), which halves the overall peak memory usage.

Now I don't understand the complexity of keeping the inplace param and functions, but if you could keep them, and instead educate users about when they don't work (using a warning message for mixed dtype dataframes, for example), I would find this far more useful.

nalzok commented 4 years ago

inplace rarely actually does something inplace

@toobaz But why not? In other words, what is preventing methods with inplace=True from getting re-implemented at some point to use truly in-place operation? If the reason is some technical difficulty or lack of manpower, maybe we should leave the option as-is instead of eliminating the possibility of future improvements.

jbrockmendel commented 4 years ago

maybe we should leave the option as-is instead of eliminating the possibility of future improvements.

If an actually-inplace implementation becomes available, nothing prevents us from re-introducing the keyword in the future.

ivirshup commented 4 years ago

I like the inplace parameter, and think it's fine that it can provide performance benefits – but does not promise them. It's also nice that I can write code which modifies a referenced dataframe without having to make sure an updated copy gets assigned back.

A number of the methods which use inplace can also use python syntax for inplace operations. It's not clear to me why df.drop("col", inplace=True) is bad, but del df["col"] is fine. Another example would be:

df.set_index("col", inplace=True)
# vs
df.index = df["col"]
del df["col"]

or

df.fillna(0, inplace=True)
# vs
df[np.isnan(df)] = 0

I agree that the block manager can make memory usage pretty opaque to users, but I'm inclined to think that's an internals problem instead of an API problem.

As a side note: @Ddedalus, could your issue be solved by just having inplace not change the return type? I.e. df.drop("something", inplace=True) could just return a reference to df?

I personally think this would be better behavior than changing the return type based on a parameter value. It would also be great to be able to chain inplace operations. I frequently find myself un-chain-ing code since dropping nuisance columns or converting a column's dtype would cause a full copy of a large dataframe.

simonjayhawkins commented 4 years ago

As a side note: @Ddedalus, could your issue be solved by just having inplace not change the return type? I.e. df.drop("something", inplace=True) could just return a reference to df?

as per @rhshadrach comment https://github.com/pandas-dev/pandas/issues/16529#issuecomment-655774067, typing is not really an issue. (We need to be able to use Literal though so cannot resolve while supporting py 3.7 without typing_extensions)

toobaz commented 4 years ago

maybe we should leave the option as-is instead of eliminating the possibility of future improvements.

If an actually-inplace implementation becomes available, nothing prevents us from re-introducing the keyword in the future.

... I agree, but I think that even better, nothing prevents us from using (variations of) copy=False. Again, we can evaluate API decisions regardless of what the implementation supports or will eventually support: we have decided long ago that the inplace=True is not an API we like, and this will not stop us from supporting operations that do not make copies of data where possible.

ivirshup commented 4 years ago

@simonjayhawkins Not to get too off topic into the discussion of typing, but it seems like this might make it more difficult for the type checker/ end up with poor results. For example:

def foo(df: pd.DataFrame, inplace: bool):
    if inplace:
        df.some_method(inplace=inplace)
    else:
        df = df.some_method(inplace=inplace)
    return df

Unless the type checker can evaluate the flow control (which is probably out of scope for a type checker, and definitely out of scope for more complex flow control), it wouldn't be able to tell the result is always a DataFrame.

and this will not stop us from supporting operations that do not make copies of data where possible.

@toobaz I think this point is confusing me a bit here. From @jreback's comments, it sounded to me like the issue from the pandas team's perspective was that operations could happen in place. But this conflicts a bit with @TomAugspurger's suggestion (https://github.com/pandas-dev/pandas/issues/16529#issuecomment-451001804) which keeps the argument, and isn't deprecation of inplace.

Is there more discussion about the reasoning behind this issue somewhere else?

but I think that even better, nothing prevents us from using (variations of) copy=False.

I'm not sure I would agree that copy is a better argument than inplace. I like that methods with copy will return something, so they can be chained regardless of whether mutation occurred. I don't like that it creates a new reference. This leads to multiple objects – that could "look" very different (e.g. dtypes, shapes) – which might partially reference the same memory. This ambiguity seems like it could cause confusion/ introduce bugs to me.

That said, there are other places memory sharing occurs, and might be fine. For example:

a = pd.DataFrame(np.ones((10, 10)))
b = pd.DataFrame(a)
b.iloc[0, 0] = 0
assert a.iloc[0, 0] == 0

I guess I just think the number of objects which could modify the same memory should be kept low. To me, this is not what the copy argument encourages.

toobaz commented 4 years ago

From @jreback's comments, it sounded to me like the issue from the pandas team's perspective was that operations could happen in place.

Sorry, I admit that I didn't follow the entire discussion. My point is only that API (inplace=) and implementation (operations that do not copy the data) are relatively orthogonal - sorry if this was obvious at this point. Then I'm in favor of dropping the first and allowing, maybe even increasing, the use of the latter, but it's my opinion, possibly surpassed by the discussion.

I'm not sure I would agree that copy is a better argument than inplace. I like that methods with copy will return something, so they can be chained regardless of whether mutation occurred. I don't like that it creates a new reference. This leads to multiple objects – that could "look" very different (e.g. dtypes, shapes) – which might partially reference the same memory. This ambiguity seems like it could cause confusion/ introduce bugs to me.

I see your point, but I don't see a solution. If we agree that chaining is a nice way to structure the code, then copy seems like the way to go (and if we want a less crowded API, inplace is one too much). This does mean you need to be aware of possible multiple references, but it's the price to pay for efficiency (otherwise, just stick to the default copy=True).

jorisvandenbossche commented 4 years ago

I think this point is confusing me a bit here. ... But this conflicts a bit with @TomAugspurger's suggestion (#16529 (comment)) which keeps the argument, and isn't deprecation of inplace.

@ivirshup To be clear, there is not yet a "final" decision on all aspects of this (which migth add to your confusion ..). The comment of Tom you reference is I think the most complete proposal-like comment up to now, and indeed in that comment Tom questions whether we should actually deprecate inplace or not, if we add a copy argument (but note that Tom also says he is uncertain about that part).

I think there is mostly agreement about the analysis of that summary post of Tom: there are currently two use cases for inplace (updating the object in-place vs actually not copying by mutating data in-place), and we would like to disentangle those two use cases (also to make the no-copy variant easier to use, also in method chaining).

I'm not sure I would agree that copy is a better argument than inplace. ... I don't like that it creates a new reference. This leads to multiple objects – that could "look" very different (e.g. dtypes, shapes) – which might partially reference the same memory. This ambiguity seems like it could cause confusion/ introduce bugs to me.

BTW, this also relates somewhat to the copy/view discussion at https://github.com/pandas-dev/pandas/pull/36988 and https://github.com/pandas-dev/pandas/issues/36195 (since in that discussion, the idea of doing such copy-on-write also in methods is mentioned). Not having all this defensive copying in methods would also help in reducing memory usage (something people now also use inplace=True for). Specifically related to your comment above, the default behaviour for the copy keyword could be to not copy the data if needed, but without being an actual view (i.e. not updating the original when the returned dataframe is modified) though copy-on-write. So the default (let's say copy=None) would create a new object, but it referencing the same memory is then only an implementation detail. Only with an explicit copy=False specified by the user, you would get a new reference to the same memory.

Ddedalus commented 4 years ago

In terms of making typing more predictive and the API less confusing, why not consider a 'middleman' attribute access:

df = df.some_method()
df.inplace.some_method()

This pattern is often used to group methods in pandas, like DataFrame.str methods.

toobaz commented 4 years ago

df.inplace.some_method()

@Ddedalus Are you thinking about .inplace affecting the behavior of some_method (e.g. not copying data), or just wrapping it and providing syntactic sugar? In the first case, my impression is that the added complexity is not worth it (the implementation and the user need to know which methods support inplace - in the end, just using the kwarg is simpler). In the second case, it might be a neat idea to provide the "inplace paradigm" with few lines of codes once and forall, but I would call it differently from .inplace, again, to clarify we are not talking about (not) copying data.

Edit: in the second case, one would need to reassign to self all attributes of the new object created by some_method. Not sure it's as easy as I thought.

ivirshup commented 4 years ago

My point is only that API (inplace=) and implementation (operations that do not copy the data) are relatively orthogonal

@toobaz, I agree with this. But I think they can be exposed as the same concept though. To a large extent, this is "pythonic". I.e. a.sort(), a[b] = c, a += b, etc.. These are all inplace (the referenced object is modified) and data is not copied (where possible).

I see your point, but I don't see a solution. If we agree that chaining is a nice way to structure the code, then copy seems like the way to go (and if we want a less crowded API, inplace is one too much). This does mean you need to be aware of possible multiple references

I think a possible solution here is to say: if the method is mutating, and you want that operation to happen in place, you return a reference to the passed object. Let's say the argument for this behaviour is called foo (I think inplace would work for this, though it would be a behavior change).

df = pd.DataFrame(...)
df2 = df.sort_values(some_col, foo=True)
assert df2 is df

This is similar to how Julia does mutating operations on arrays: a = [3, 2, 1]; b = sort!(a); a === b.

I think this simplifies the question of "who owns this memory" quite a bit, since it's kept within a single reference for these operations. I think this could also be a solution to the "two use cases"

(updating the object in-place vs actually not copying by mutating data in-place)

by not differentiating between the two. I don't think anything useful is lost by saying one argument controls both of these. If this same argument also doesn't prevent chaining, then I don't see a downside. Notably, this is also very simple to implement.

@jorisvandenbossche I like the idea of copy on write, but I think it can be difficult to implement and could lead to unintuitive performance. As a point of reference, a project that I work on has been trying to do copy-on-write, and it has proved difficult. The main barrier is probably that there's a ton to implement. I might comment more on this in the linked threads.

About the difficulty of implementation, I really think the best level to implement copy on write is at the language level. I think you really want to be able to prove that some memory could not be modified when shared, which just isn't possible in such a procedural language which frequently calls to external binaries.

As a performance example, is all indexing done with copy on write? Let's say we define df2 = df.iloc[np.argsort(df[col])]. Now, df2.iloc[:50] is much more expensive than if we copied, since the first 50 elements are no longer contiguous in memory.

MarcoGorelli commented 3 years ago

Here's my suggestion:

for methods from the "Deprecation non controvertial" section of the description, add a FutureWarning to 1.3 that inplace will be removed, and then remove it in the next non-patch release (probably 2.0.0).
for the other methods in the description, make arguments with defaults keyword-only - to do this, add a FutureWarning to 1.3 using deprecate_nonkeyword_arguments. Then, only 3 overloads will ever be needed to statically type the return argument correctly (as opposed to the 38 currently needed for drop...)

I understand that there isn't much time left to get these into 1.3, so it would be good if folks could express agreement/disagreement with the above - if there's agreement then I'll power through them at a sprint on Saturday and on Friday at my company's "tech 10%" scheme

jbrockmendel commented 3 years ago

Seems reasonable

phofl commented 3 years ago

Seems reasonable to me too

jorisvandenbossche commented 3 years ago

I don't know what has been discussed on Wednesday, but based on my initial experiments with Copy-on-Write (https://github.com/pandas-dev/pandas/issues/36195, no PR yet), I am no longer fully convinced that completely deprecating the inplace keyword is the way to go.

One of the ideas mentioned in this long thread was to replace it with or add a copy keyword (https://github.com/pandas-dev/pandas/issues/16529#issuecomment-451001804):

copy for "mutate data in place, and return a new dataframe that's a view on the same data."

I don't think this will actually be possible to reliably return a new dataframe viewing the same data (in the sense of being an actual view that will reflect mutations, and not a copy-on-write view). I think we can only return an actual no-copy version by returning the same dataframe object. But this is then essentially changing the dataframe "in place". In which case the inplace keyword might be the better name .. ?

Now, the above discussion is only relevant for methods that can potentially be done in place. But, for a potential ArrayManager, the "non controversial" list of the top post is not fully accurate, I think. For example drop(columns=..) can still be done in place, or set_index as well.

MarcoGorelli commented 3 years ago

Thanks for your input - OK, let's not rush to deprecate things then, especially if the list above is not longer accurate.

If it's not controversial, then I'll just try to get these deprecate_nonkeyword_arguments FutureWarnings in for 1.3 - this won't be a big user-facing change as people tend to specify inplace and other such arguments by name anyway (but mypydoesn't know that 😄 ), yet it does make a big difference to static typing as it means methods can have correct return types without requiring a flood of overloads

jorisvandenbossche commented 3 years ago

Yes, deprecating positional in favor of keyword-only arguments sounds certainly good

rhshadrach commented 3 years ago

Is changing the behavior for all methods with an inplace argument so that they return their instance non-controversial, or are there any downsides to doing that? I think it can be done independent of any future removals. The benefits are

Ability to use in inplace with method-chaining
No need for type-hinting overloads (at least, for the return type)

Some methods already do this. Looking through the discussion above, I don't see mention of downsides to this. Would need a FutureWarning.

jreback commented 3 years ago

-1 on returning the instance for inplace

method chaining with in place is a non pattern and should be always avoided as silently can fail

+1 in simply depreciating inplace entirely

ntachukwu commented 2 years ago

I will like to make a contribution to this issue. Where does the community need help to deprecate inplace?

pandas-dev / pandas

API/DEPR: Deprecate inplace parameter #16529