pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License
43.94k stars 18.04k forks source link

DOC: Should `DataFrame.drop()` accept a `set` for the `columns` , `index` and `labels` argument? #59890

Open Dr-Irv opened 2 months ago

Dr-Irv commented 2 months ago

Pandas version checks

Location of the documentation

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.drop.html

Documentation problem

The arguments labels, index and columns are documented as "single label or list-like". But a set is accepted:

import pandas as pd

df = pd.DataFrame({1: [2], 3: [4]})   # Fix is here
df = df.drop(columns={1})

The pandas source declaration in the typing declarations does not allow a set to be passed.

Suggested fix for documentation

Unclear.

Either we update the docs to say a set is allowed (and update the internal types), OR we add a check to see if a set is passed and raise an exception.

First raised as a pandas-stubs issue in https://github.com/pandas-dev/pandas-stubs/issues/1008

rhshadrach commented 2 months ago

In general, as long as the order is not important for the result, I'm positive on allowing sets. But not strongly so.

AnuraagaNath commented 2 months ago

The original method given by drop via pandas package mentions of the following internal type formats.

(method)
def drop(
    labels: None = ...,
    *,
    axis: Axis = ...,
    index: Hashable | Sequence[Hashable] | Index[Any] = ...,
    columns: Hashable | Sequence[Hashable] | Index[Any],
    level: Level | None = ...,
    inplace: Literal[True],
    errors: IgnoreRaise = ...
) -> None: ...

def drop(
    labels: None = ...,
    *,
    axis: Axis = ...,
    index: Hashable | Sequence[Hashable] | Index[Any],
    columns: Hashable | Sequence[Hashable] | Index[Any] = ...,
    level: Level | None = ...,
    inplace: Literal[True],
    errors: IgnoreRaise = ...
) -> None: ...

def drop(
    labels: Hashable | Sequence[Hashable] | Index[Any],
    *,
    axis: Axis = ...,
    index: None = ...,
    columns: None = ...,
    level: Level | None = ...,
    inplace: Literal[True],
    errors: IgnoreRaise = ...
) -> None: ...

def drop(
    labels: None = ...,
    *,
    axis: Axis = ...,
    index: Hashable | Sequence[Hashable] | Index[Any] = ...,
    columns: Hashable | Sequence[Hashable] | Index[Any],
    level: Level | None = ...,
    inplace: Literal[False] = ...,
    errors: IgnoreRaise = ...
) -> DataFrame: ...

def drop(
    labels: None = ...,
    *,
    axis: Axis = ...,
    index: Hashable | Sequence[Hashable] | Index[Any],
    columns: Hashable | Sequence[Hashable] | Index[Any] = ...,
    level: Level | None = ...,
    inplace: Literal[False] = ...,
    errors: IgnoreRaise = ...
) -> DataFrame: ...

def drop(
    labels: Hashable | Sequence[Hashable] | Index[Any],
    *,
    axis: Axis = ...,
    index: None = ...,
    columns: None = ...,
    level: Level | None = ...,
    inplace: Literal[False] = ...,
    errors: IgnoreRaise = ...
) -> DataFrame: ...

def drop(
    labels: None = ...,
    *,
    axis: Axis = ...,
    index: Hashable | Sequence[Hashable] | Index[Any] = ...,
    columns: Hashable | Sequence[Hashable] | Index[Any],
    level: Level | None = ...,
    inplace: _bool = ...,
    errors: IgnoreRaise = ...
) -> (DataFrame | None): ...

def drop(
    labels: None = ...,
    *,
    axis: Axis = ...,
    index: Hashable | Sequence[Hashable] | Index[Any],
    columns: Hashable | Sequence[Hashable] | Index[Any] = ...,
    level: Level | None = ...,
    inplace: _bool = ...,
    errors: IgnoreRaise = ...
) -> (DataFrame | None): ...

def drop(
    labels: Hashable | Sequence[Hashable] | Index[Any],
    *,
    axis: Axis = ...,
    index: None = ...,
    columns: None = ...,
    level: Level | None = ...,
    inplace: _bool = ...,
    errors: IgnoreRaise = ...
) -> (DataFrame | None): ...

drop function supports this many combination of types during operations. All the functions are same. Documentation is less descriptive about this thing.

Dr-Irv commented 2 months ago

drop function supports this many combination of types during operations. All the functions are same. Documentation is less descriptive about this thing.

yes and no. The type declarations for any of index, columns and labels are all allowing any of these:

The docs currently say "single label or list-like" for those 3. A set is not "list-like", nor would a set pass those type declarations.

But at runtime, a set is supported. So the question is whether we change the type declarations and the docs, OR do we disallow a set at runtime (since the type declarations already disallow it).

cmp0xff commented 2 weeks ago

In the specific case of labels in drop, I think using a set or, in general, collections.abc.Collection is fine, since the result should not depend on the order of the labels. Maybe even collections.abc.Iterable should be fine.