narwhals-dev / narwhals

Lightweight and extensible compatibility layer between dataframe libraries!
https://narwhals-dev.github.io/narwhals/
MIT License

[Bug]: Pandas version incorrectly set in extreme CICD #440

Closed ELC closed 1 month ago

ELC commented 1 month ago

Describe the bug

The extreme CICD is not using the oldest supported pandas version (0.25.3) when testing the whole codebase, so it reports a misleading 100% coverage and a passing result.

Steps or code to reproduce the bug

Add the following step to the pretty_old_versions job:

      - name: Run doctests
        run: pytest narwhals --doctest-modules

Moreover, the pretty_old_versions job is set to use pandas 1.1.5 instead of 0.25.3.
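
One way to surface a pin mismatch like this is to assert the installed version before the test suite runs. The following is only a minimal sketch and not part of narwhals: `check_pinned_version` and its `lookup` parameter are hypothetical names, and the expected version is whatever the caller passes in.

```python
from importlib.metadata import PackageNotFoundError, version


def check_pinned_version(package, expected, lookup=version):
    """Raise RuntimeError unless `package` is installed at exactly `expected`.

    `lookup` is injectable so the check can be exercised without the
    package being installed; by default it reads installed metadata.
    """
    try:
        installed = lookup(package)
    except PackageNotFoundError as exc:
        raise RuntimeError(f"{package} is not installed") from exc
    if installed != expected:
        raise RuntimeError(
            f"{package}=={installed} is installed, expected {expected}"
        )
    return installed
```

Called as `check_pinned_version("pandas", "0.25.3")` in a step before pytest, a wrong pin would fail the job instead of producing a green run against a newer pandas.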

Expected results

The CICD should pass without errors

Actual results

These are some of the errors I see when running the tests locally on the latest version of main:

UNEXPECTED EXCEPTION: AttributeError("'DataFrame' object has no attribute 'convert_dtypes'")
Traceback (most recent call last):
  File "/usr/lib/python3.8/doctest.py", line 1336, in __run
    exec(compile(example.source, filename, "single",
  File "<doctest narwhals.utils.maybe_convert_dtypes[6]>", line 1, in <module>
  File "/workspaces/narwhals/narwhals/utils.py", line 261, in maybe_convert_dtypes
    df_any._dataframe._dataframe.convert_dtypes(*args, **kwargs)
  File "/workspaces/narwhals/.nox/minimum_versions/lib/python3.8/site-packages/pandas/core/generic.py", line 5179, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'convert_dtypes'
/workspaces/narwhals/narwhals/utils.py:249: UnexpectedException
UNEXPECTED EXCEPTION: ImportError('pyarrow requires pandas 1.0.0 or above, pandas 0.25.3 is installed')
Traceback (most recent call last):
  File "/usr/lib/python3.8/doctest.py", line 1336, in __run
    exec(compile(example.source, filename, "single",
  File "<doctest narwhals.series.Series.to_pandas[8]>", line 1, in <module>
  File "/workspaces/narwhals/narwhals/translate.py", line 400, in wrapper
    result = func(*args, **kwargs)
  File "<doctest narwhals.series.Series.to_pandas[6]>", line 3, in func
  File "/workspaces/narwhals/narwhals/series.py", line 1209, in to_pandas
    return self._series.to_pandas()
  File "/workspaces/narwhals/.nox/minimum_versions/lib/python3.8/site-packages/polars/series/series.py", line 4434, in to_pandas
    else self.to_arrow().to_pandas(**kwargs)
  File "pyarrow/array.pxi", line 830, in pyarrow.lib._PandasConvertible.to_pandas
  File "pyarrow/array.pxi", line 1426, in pyarrow.lib.Array._to_pandas
  File "pyarrow/array.pxi", line 1649, in pyarrow.lib._array_like_to_pandas
  File "pyarrow/pandas-shim.pxi", line 103, in pyarrow.lib._PandasAPIShim.series
  File "pyarrow/pandas-shim.pxi", line 96, in pyarrow.lib._PandasAPIShim._check_import
  File "pyarrow/pandas-shim.pxi", line 65, in pyarrow.lib._PandasAPIShim._import_pandas
ImportError: pyarrow requires pandas 1.0.0 or above, pandas 0.25.3 is installed
/workspaces/narwhals/narwhals/series.py:1203: UnexpectedException
______________________________________________ [doctest] narwhals.expression.ExprDateTimeNamespace.total_seconds ______________________________________________
2635 
2636             We define a dataframe-agnostic function:
2637 
2638             >>> @nw.narwhalify
2639             ... def func(df):
2640             ...     return df.with_columns(a_total_seconds=nw.col("a").dt.total_seconds())
2641 
2642             We can then pass either pandas or Polars to `func`:
2643 
2644             >>> func(df_pd)
Differences (unified diff with -expected +actual):
    @@ -1,3 +1,3 @@
    -                       a  a_total_seconds
    -0        0 days 00:00:10               10
    -1 0 days 00:00:20.040000               20
    +                a  a_total_seconds
    +0        00:00:10               10
    +1 00:00:20.040000               20

/workspaces/narwhals/narwhals/expression.py:2644: DocTestFailure

They all appear when running the doctests with the minimum versions. However, the CICD passes just fine, so I am concerned there is a misconfiguration somewhere, probably a couple of missing skipifs. The second error clearly states that pyarrow requires pandas >= 1.0.0.
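
A version guard of the kind mentioned above could look like the sketch below. It is illustrative only and not narwhals code: `parse_version` and `needs_skip` are hypothetical helpers, and the 1.0.0 threshold in the usage note comes from the pyarrow error message. The parsing is deliberately naive (it keeps only leading numeric components and ignores pre-release suffixes); a real project would more likely rely on `packaging.version`.

```python
import re


def parse_version(text):
    """Return the leading numeric components of a version string as a tuple.

    For example, "0.25.3" becomes (0, 25, 3). Parsing stops at the first
    component that does not start with a digit.
    """
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def needs_skip(installed, minimum):
    """Return True when `installed` is older than `minimum`."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that "1.0" and "1.0.0" compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a < b
```

In a regular test module the result could feed `pytest.mark.skipif(needs_skip(pd.__version__, "1.0.0"), reason="pyarrow requires pandas >= 1.0.0")`, assuming pandas is imported as `pd`.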

Please run narwhals.show_versions() and enter the output below.

System:
    python: 3.10.13 (main, May 30 2024, 20:38:07) [GCC 9.4.0]
executable: /home/codespace/.local/share/virtualenvs/narwhals-fIT6KdJt/bin/python
   machine: Linux-6.5.0-1022-azure-x86_64-with-glibc2.31

Python dependencies:
     narwhals: 1.0.2
       pandas: 2.2.2
       polars: 1.0.0
         cudf: 
        modin: 
      pyarrow: 16.1.0
        numpy: 2.0.0

**NOTE**: This is a CICD error, so the locally installed versions listed above provide no useful information.

Relevant log output

See the Actual Results section

MarcoGorelli commented 1 month ago

thanks for the report - it's true, doctests are skipped for the minimum versions, and are tested with the newer versions. I don't think this is a big deal, they're just meant for documentation anyway, and we need some check to ensure they don't go out of sync

MarcoGorelli commented 1 month ago

closing then, as this is expected, but thanks for the report