pandas-dev / pandas-stubs

Public type stubs for pandas
BSD 3-Clause "New" or "Revised" License

DataFrame.drop with columns=set(...) is unspecified #1008

Open cmp0xff opened 6 days ago

cmp0xff commented 6 days ago

Describe the bug

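DataFrame.drop with columns=set(...) is not covered by the stubs: none of the drop overloads accepts a set for columns, so mypy rejects the call even though pandas runs it.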

To Reproduce

  1. Provide a minimal runnable pandas example that is not properly checked by the stubs.

    import pandas as pd
    
    df = pd.DataFrame({1: [2], 3: [4]})
    df = df.drop(columns={1})
  2. I am using the mypy type checker.
  3. The error message received from mypy:
    df_drop.py:4:6: error: No overload variant of "drop" of "NDFrame" matches argument type "set[int]"  [call-overload]
    df_drop.py:4:6: note: Possible overload variants:
    df_drop.py:4:6: note:     def drop(self, labels: None = ..., *, axis: Literal['index', 0] | Literal['columns', 1] = ..., index: Hashable | Sequence[Hashable] | Index[Any] = ..., columns: Hashable | Sequence[Hashable] | Index[Any], level: Hashable | int | None = ..., inplace: Literal[True], errors: Literal['ignore', 'raise'] = ...) -> None
    df_drop.py:4:6: note:     def drop(self, labels: None = ..., *, axis: Literal['index', 0] | Literal['columns', 1] = ..., index: Hashable | Sequence[Hashable] | Index[Any], columns: Hashable | Sequence[Hashable] | Index[Any] = ..., level: Hashable | int | None = ..., inplace: Literal[True], errors: Literal['ignore', 'raise'] = ...) -> None
    df_drop.py:4:6: note:     def drop(self, labels: Hashable | Sequence[Hashable] | Index[Any], *, axis: Literal['index', 0] | Literal['columns', 1] = ..., index: None = ..., columns: None = ..., level: Hashable | int | None = ..., inplace: Literal[True], errors: Literal['ignore', 'raise'] = ...) -> None
    df_drop.py:4:6: note:     def drop(self, labels: None = ..., *, axis: Literal['index', 0] | Literal['columns', 1] = ..., index: Hashable | Sequence[Hashable] | Index[Any] = ..., columns: Hashable | Sequence[Hashable] | Index[Any], level: Hashable | int | None = ..., inplace: Literal[False] = ..., errors: Literal['ignore', 'raise'] = ...) -> DataFrame
    df_drop.py:4:6: note:     def drop(self, labels: None = ..., *, axis: Literal['index', 0] | Literal['columns', 1] = ..., index: Hashable | Sequence[Hashable] | Index[Any], columns: Hashable | Sequence[Hashable] | Index[Any] = ..., level: Hashable | int | None = ..., inplace: Literal[False] = ..., errors: Literal['ignore', 'raise'] = ...) -> DataFrame
    df_drop.py:4:6: note:     def drop(self, labels: Hashable | Sequence[Hashable] | Index[Any], *, axis: Literal['index', 0] | Literal['columns', 1] = ..., index: None = ..., columns: None = ..., level: Hashable | int | None = ..., inplace: Literal[False] = ..., errors: Literal['ignore', 'raise'] = ...) -> DataFrame
    df_drop.py:4:6: note:     def drop(self, labels: None = ..., *, axis: Literal['index', 0] | Literal['columns', 1] = ..., index: Hashable | Sequence[Hashable] | Index[Any] = ..., columns: Hashable | Sequence[Hashable] | Index[Any], level: Hashable | int | None = ..., inplace: bool = ..., errors: Literal['ignore', 'raise'] = ...) -> DataFrame | None
    df_drop.py:4:6: note:     def drop(self, labels: None = ..., *, axis: Literal['index', 0] | Literal['columns', 1] = ..., index: Hashable | Sequence[Hashable] | Index[Any], columns: Hashable | Sequence[Hashable] | Index[Any] = ..., level: Hashable | int | None = ..., inplace: bool = ..., errors: Literal['ignore', 'raise'] = ...) -> DataFrame | None
    df_drop.py:4:6: note:     def drop(self, labels: Hashable | Sequence[Hashable] | Index[Any], *, axis: Literal['index', 0] | Literal['columns', 1] = ..., index: None = ..., columns: None = ..., level: Hashable | int | None = ..., inplace: bool = ..., errors: Literal['ignore', 'raise'] = ...) -> DataFrame | None
    Found 1 error in 1 file (checked 1 source file)
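Until the stubs accept sets, one workaround is to turn the set into a list before calling drop, since a list matches the Sequence[Hashable] alternative that appears in every overload mypy printed. A minimal sketch, assuming pandas and pandas-stubs are installed; drop_columns is a hypothetical helper name, not part of pandas or pandas-stubs:

```python
from __future__ import annotations

from collections.abc import Hashable, Set

import pandas as pd


def drop_columns(df: pd.DataFrame, cols: Set[Hashable]) -> pd.DataFrame:
    """Hypothetical helper: drop every column label found in a set or frozenset.

    list(cols) is a list[Hashable], i.e. a Sequence[Hashable], which the
    published drop overloads accept for `columns`, so this call is expected
    to type-check where df.drop(columns={1}) does not.
    """
    return df.drop(columns=list(cols))


df = pd.DataFrame({1: [2], 3: [4]})
df = drop_columns(df, {1})
```

Collections.abc.Set is covariant, so a set[int] argument is accepted where Set[Hashable] is declared.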

Please complete the following information

Additional context

Nope

Dr-Irv commented 6 days ago

First, your example is not correct. It should be:

import pandas as pd

df = pd.DataFrame({1: [2], 3: [4]})   # Fix is here
df = df.drop(columns={1})

Secondly, the pandas documentation says that the columns argument is a "single label or list-like". While your code works, it is not clear that it should. The stubs follow what is documented, and a set is not "list-like".
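The set form does run, as noted above. A small runtime comparison, assuming a pandas 2.x install, shows that passing {1} and [1] to columns yields the same frame; the open question is whether that behavior is documented and therefore belongs in the stubs:

```python
import pandas as pd

df = pd.DataFrame({1: [2], 3: [4]})

# At runtime pandas converts the set of labels into an array internally,
# so both spellings remove column 1 and keep column 3.
# With the current stubs, mypy rejects the set version (call-overload).
from_set = df.drop(columns={1})
from_list = df.drop(columns=[1])

print(from_set.equals(from_list))
print(list(from_set.columns))
```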

I've added a reference to a pandas issue https://github.com/pandas-dev/pandas/issues/59890 to see what the pandas developers say there.