Open Dr-Irv opened 2 months ago
In general, as long as the order is not important for the result, I'm positive on allowing sets. But not strongly so.
The original method given by drop via pandas package mentions of the following internal type formats.
(method)
def drop(
labels: None = ...,
*,
axis: Axis = ...,
index: Hashable | Sequence[Hashable] | Index[Any] = ...,
columns: Hashable | Sequence[Hashable] | Index[Any],
level: Level | None = ...,
inplace: Literal[True],
errors: IgnoreRaise = ...
) -> None: ...
def drop(
labels: None = ...,
*,
axis: Axis = ...,
index: Hashable | Sequence[Hashable] | Index[Any],
columns: Hashable | Sequence[Hashable] | Index[Any] = ...,
level: Level | None = ...,
inplace: Literal[True],
errors: IgnoreRaise = ...
) -> None: ...
def drop(
labels: Hashable | Sequence[Hashable] | Index[Any],
*,
axis: Axis = ...,
index: None = ...,
columns: None = ...,
level: Level | None = ...,
inplace: Literal[True],
errors: IgnoreRaise = ...
) -> None: ...
def drop(
labels: None = ...,
*,
axis: Axis = ...,
index: Hashable | Sequence[Hashable] | Index[Any] = ...,
columns: Hashable | Sequence[Hashable] | Index[Any],
level: Level | None = ...,
inplace: Literal[False] = ...,
errors: IgnoreRaise = ...
) -> DataFrame: ...
def drop(
labels: None = ...,
*,
axis: Axis = ...,
index: Hashable | Sequence[Hashable] | Index[Any],
columns: Hashable | Sequence[Hashable] | Index[Any] = ...,
level: Level | None = ...,
inplace: Literal[False] = ...,
errors: IgnoreRaise = ...
) -> DataFrame: ...
def drop(
labels: Hashable | Sequence[Hashable] | Index[Any],
*,
axis: Axis = ...,
index: None = ...,
columns: None = ...,
level: Level | None = ...,
inplace: Literal[False] = ...,
errors: IgnoreRaise = ...
) -> DataFrame: ...
def drop(
labels: None = ...,
*,
axis: Axis = ...,
index: Hashable | Sequence[Hashable] | Index[Any] = ...,
columns: Hashable | Sequence[Hashable] | Index[Any],
level: Level | None = ...,
inplace: _bool = ...,
errors: IgnoreRaise = ...
) -> (DataFrame | None): ...
def drop(
labels: None = ...,
*,
axis: Axis = ...,
index: Hashable | Sequence[Hashable] | Index[Any],
columns: Hashable | Sequence[Hashable] | Index[Any] = ...,
level: Level | None = ...,
inplace: _bool = ...,
errors: IgnoreRaise = ...
) -> (DataFrame | None): ...
def drop(
labels: Hashable | Sequence[Hashable] | Index[Any],
*,
axis: Axis = ...,
index: None = ...,
columns: None = ...,
level: Level | None = ...,
inplace: _bool = ...,
errors: IgnoreRaise = ...
) -> (DataFrame | None): ...
drop function supports this many combination of types during operations. All the functions are same. Documentation is less descriptive about this thing.
drop function supports this many combination of types during operations. All the functions are same. Documentation is less descriptive about this thing.
yes and no. The type declarations for any of index
, columns
and labels
are all allowing any of these:
Hashable
Hashable
Index
The docs currently say "single label or list-like" for those 3. A set
is not "list-like", nor would a set
pass those type declarations.
But at runtime, a set
is supported. So the question is whether we change the type declarations and the docs, OR do we disallow a set
at runtime (since the type declarations already disallow it).
In the specific case of labels
in drop
, I think using a set
or, in general, collections.abc.Collection
is fine, since the result should not depend on the order of the labels. Maybe even collections.abc.Iterable
should be fine.
Pandas version checks
main
hereLocation of the documentation
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.drop.html
Documentation problem
The arguments
labels
,index
andcolumns
are documented as "single label or list-like". But a set is accepted:The pandas source declaration in the typing declarations does not allow a set to be passed.
Suggested fix for documentation
Unclear.
Either we update the docs to say a set is allowed (and update the internal types), OR we add a check to see if a set is passed and raise an exception.
First raised as a
pandas-stubs
issue in https://github.com/pandas-dev/pandas-stubs/issues/1008