narwhals-dev / narwhals

Lightweight and extensible compatibility layer between dataframe libraries!
https://narwhals-dev.github.io/narwhals/
MIT License

feat: better error message for duplicate column names in pandas #1270

Closed: MarcoGorelli closed this 2 weeks ago

MarcoGorelli commented 2 weeks ago

This should help with https://github.com/Quantco/glum/pull/868

Demo:

In [1]: df
Out[1]: 
   a  a  a  b  b  c
a  1  2  3  4  5  6

In [2]: nw.from_native(df)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[2], line 1
----> 1 nw.from_native(df)

File ~/polars-api-compat-dev/narwhals/stable/v1/__init__.py:833, in from_native(native_dataframe, strict, eager_only, eager_or_interchange_only, series_only, allow_series)
    831 if isinstance(native_dataframe, Series) and (series_only or allow_series):
    832     return native_dataframe
--> 833 result = _from_native_impl(
    834     native_dataframe,
    835     strict=strict,
    836     eager_only=eager_only,
    837     eager_or_interchange_only=eager_or_interchange_only,
    838     series_only=series_only,
    839     allow_series=allow_series,
    840     dtypes=dtypes,  # type: ignore[arg-type]
    841 )
    842 return _stableify(result)

File ~/polars-api-compat-dev/narwhals/translate.py:475, in _from_native_impl(native_object, strict, eager_only, eager_or_interchange_only, series_only, allow_series, dtypes)
    472         raise TypeError(msg)
    473     pd = get_pandas()
    474     return DataFrame(
--> 475         PandasLikeDataFrame(
    476             native_object,
    477             backend_version=parse_version(pd.__version__),
    478             implementation=Implementation.PANDAS,
    479             dtypes=dtypes,
    480         ),
    481         level="full",
    482     )
    483 elif is_pandas_series(native_object):
    484     if not allow_series:

File ~/polars-api-compat-dev/narwhals/_pandas_like/dataframe.py:51, in PandasLikeDataFrame.__init__(self, native_dataframe, implementation, backend_version, dtypes)
     43 def __init__(
     44     self,
     45     native_dataframe: Any,
   (...)
     49     dtypes: DTypes,
     50 ) -> None:
---> 51     self._validate_columns(native_dataframe.columns)
     52     self._native_frame = native_dataframe
     53     self._implementation = implementation

File ~/polars-api-compat-dev/narwhals/_pandas_like/dataframe.py:100, in PandasLikeDataFrame._validate_columns(self, columns)
     98         msg += f"\n- '{key}' {value} times"
     99 msg = f"Expected unique column names, got:{msg}"
--> 100 raise ValueError(msg)

ValueError: Expected unique column names, got:
- 'a' 3 times
- 'b' 2 times
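
For readers who want to see how a message like the one above can be assembled, here is a minimal standalone sketch. Only the three lines visible in the traceback (the `msg +=` line, the final `msg = f"Expected unique column names, got:{msg}"`, and the `raise ValueError(msg)`) come from this PR. The function name `validate_unique_columns`, the early return, and the use of `collections.Counter` are assumptions for illustration and may not match the actual `_validate_columns` implementation.

```python
from collections import Counter


def validate_unique_columns(columns):
    """Raise ValueError listing each column name that occurs more than once.

    Hypothetical helper: a sketch of the check, not narwhals' actual code.
    """
    columns = list(columns)
    if len(columns) == len(set(columns)):
        return  # all names are unique, nothing to report

    # Assumption: count occurrences with Counter, which preserves
    # first-seen order, so duplicates are reported in column order.
    counts = Counter(columns)
    msg = ""
    for key, value in counts.items():
        if value > 1:
            msg += f"\n- '{key}' {value} times"
    msg = f"Expected unique column names, got:{msg}"
    raise ValueError(msg)


# Same column labels as the demo frame above: a, a, a, b, b, c.
try:
    validate_unique_columns(["a", "a", "a", "b", "b", "c"])
except ValueError as exc:
    error_message = str(exc)
    print(error_message)
```

With a pandas frame, `df.columns` could presumably be passed in place of the plain list, since the sketch only needs an iterable of hashable labels.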
