pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

Example python API code in Join section of Getting started guide raises TypeError #19002

Closed conorhamill36 closed 1 hour ago

conorhamill36 commented 2 hours ago

Reproducible example

import polars as pl
import numpy as np

df = pl.DataFrame(
    {
        "a": range(8),
        "b": np.random.rand(8),
        "d": [1.0, 2.0, float("nan"), float("nan"), 0.0, -5.0, -42.0, None],
    }
)

Log output

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[65], line 1
----> 1 df = pl.DataFrame(
      2     {
      3         "a": range(8),
      4         "b": np.random.rand(8),
      5         "d": [1.0, 2.0, float("nan"), float("nan"), 0.0, -5.0, -42.0, None],
      6     }
      7 )

File ~/miniconda3/envs/polars-env/lib/python3.12/site-packages/polars/dataframe/frame.py:361, in DataFrame.__init__(self, data, schema, schema_overrides, strict, orient, infer_schema_length, nan_to_null)
    356     self._df = dict_to_pydf(
    357         {}, schema=schema, schema_overrides=schema_overrides
    358     )
    360 elif isinstance(data, dict):
--> 361     self._df = dict_to_pydf(
    362         data,
    363         schema=schema,
    364         schema_overrides=schema_overrides,
    365         strict=strict,
    366         nan_to_null=nan_to_null,
    367     )
    369 elif isinstance(data, (list, tuple, Sequence)):
    370     self._df = sequence_to_pydf(
    371         data,
    372         schema=schema,
   (...)
    376         infer_schema_length=infer_schema_length,
    377     )

File ~/miniconda3/envs/polars-env/lib/python3.12/site-packages/polars/_utils/construction/dataframe.py:162, in dict_to_pydf(data, schema, schema_overrides, strict, nan_to_null, allow_multithreaded)
    149     data_series = [
    150         pl.Series(
    151             name,
   (...)
    157         for name in column_names
    158     ]
    159 else:
    160     data_series = [
    161         s._s
--> 162         for s in _expand_dict_values(
    163             data,
    164             schema_overrides=schema_overrides,
    165             strict=strict,
    166             nan_to_null=nan_to_null,
    167         ).values()
    168     ]
    170 data_series = _handle_columns_arg(data_series, columns=column_names, from_dict=True)
    171 pydf = PyDataFrame(data_series)

File ~/miniconda3/envs/polars-env/lib/python3.12/site-packages/polars/_utils/construction/dataframe.py:391, in _expand_dict_values(data, schema_overrides, strict, order, nan_to_null)
    388     updated_data[name] = s
    390 elif arrlen(val) is not None or _is_generator(val):
--> 391     updated_data[name] = pl.Series(
    392         name=name,
    393         values=val,
    394         dtype=dtype,
    395         strict=strict,
    396         nan_to_null=nan_to_null,
    397     )
    398 elif val is None or isinstance(  # type: ignore[redundant-expr]
    399     val, (int, float, str, bool, date, datetime, time, timedelta)
    400 ):
    401     updated_data[name] = F.repeat(
    402         val, array_len, dtype=dtype, eager=True
    403     ).alias(name)

File ~/miniconda3/envs/polars-env/lib/python3.12/site-packages/polars/series/series.py:359, in Series.__init__(self, name, values, dtype, strict, nan_to_null)
    354 else:
    355     msg = (
    356         f"Series constructor called with unsupported type {type(values).__name__!r}"
    357         " for the `values` parameter"
    358     )
--> 359     raise TypeError(msg)

TypeError: Series constructor called with unsupported type 'ndarray' for the `values` parameter

Issue description

In the Join sub-section of the Combining DataFrames section of the Getting Started guide (https://docs.pola.rs/user-guide/getting-started/#combining-dataframes), there is a Python API example for joining two dataframes. When the code is executed as written, it raises the TypeError shown above: the Series constructor rejects the ndarray type. This was tested with numpy version 2.1.1. The rest of the conda environment is listed below.

Name                      Version         Build            Channel
_libgcc_mutex             0.1             main
_openmp_mutex             5.1             1_gnu
anyio                     4.6.0           pypi_0           pypi
argon2-cffi               23.1.0          pypi_0           pypi
argon2-cffi-bindings      21.2.0          pypi_0           pypi
arrow                     1.3.0           pypi_0           pypi
asttokens                 2.4.1           pypi_0           pypi
async-lru                 2.0.4           pypi_0           pypi
attrs                     24.2.0          pypi_0           pypi
babel                     2.16.0          pypi_0           pypi
beautifulsoup4            4.12.3          pypi_0           pypi
bleach                    6.1.0           pypi_0           pypi
bzip2                     1.0.8           h5eee18b_6
ca-certificates           2024.7.2        h06a4308_0
certifi                   2024.8.30       pypi_0           pypi
cffi                      1.17.1          pypi_0           pypi
charset-normalizer        3.3.2           pypi_0           pypi
comm                      0.2.2           pypi_0           pypi
debugpy                   1.8.6           pypi_0           pypi
decorator                 5.1.1           pypi_0           pypi
defusedxml                0.7.1           pypi_0           pypi
executing                 2.1.0           pypi_0           pypi
expat                     2.6.3           h6a678d5_0
fastjsonschema            2.20.0          pypi_0           pypi
fqdn                      1.5.1           pypi_0           pypi
h11                       0.14.0          pypi_0           pypi
httpcore                  1.0.5           pypi_0           pypi
httpx                     0.27.2          pypi_0           pypi
idna                      3.10            pypi_0           pypi
ipykernel                 6.29.5          pypi_0           pypi
ipython                   8.27.0          pypi_0           pypi
isoduration               20.11.0         pypi_0           pypi
jedi                      0.19.1          pypi_0           pypi
jinja2                    3.1.4           pypi_0           pypi
json5                     0.9.25          pypi_0           pypi
jsonpointer               3.0.0           pypi_0           pypi
jsonschema                4.23.0          pypi_0           pypi
jsonschema-specifications 2023.12.1       pypi_0           pypi
jupyter-client            8.6.3           pypi_0           pypi
jupyter-core              5.7.2           pypi_0           pypi
jupyter-events            0.10.0          pypi_0           pypi
jupyter-lsp               2.2.5           pypi_0           pypi
jupyter-server            2.14.2          pypi_0           pypi
jupyter-server-terminals  0.5.3           pypi_0           pypi
jupyterlab                4.2.5           pypi_0           pypi
jupyterlab-pygments       0.3.0           pypi_0           pypi
jupyterlab-server         2.27.3          pypi_0           pypi
ld_impl_linux-64          2.40            h12ee557_0
libffi                    3.4.4           h6a678d5_1
libgcc-ng                 11.2.0          h1234567_1
libgomp                   11.2.0          h1234567_1
libstdcxx-ng              11.2.0          h1234567_1
libuuid                   1.41.5          h5eee18b_0
markupsafe                2.1.5           pypi_0           pypi
matplotlib-inline         0.1.7           pypi_0           pypi
mistune                   3.0.2           pypi_0           pypi
nbclient                  0.10.0          pypi_0           pypi
nbconvert                 7.16.4          pypi_0           pypi
nbformat                  5.10.4          pypi_0           pypi
ncurses                   6.4             h6a678d5_0
nest-asyncio              1.6.0           pypi_0           pypi
notebook-shim             0.2.4           pypi_0           pypi
numpy                     2.1.1           pypi_0           pypi
openssl                   3.0.15          h5eee18b_0
overrides                 7.7.0           pypi_0           pypi
packaging                 24.1            pypi_0           pypi
pandocfilters             1.5.1           pypi_0           pypi
parso                     0.8.4           pypi_0           pypi
pexpect                   4.9.0           pypi_0           pypi
pip                       24.2            py312h06a4308_0
platformdirs              4.3.6           pypi_0           pypi
polars                    1.8.2           pypi_0           pypi
prometheus-client         0.21.0          pypi_0           pypi
prompt-toolkit            3.0.48          pypi_0           pypi
psutil                    6.0.0           pypi_0           pypi
ptyprocess                0.7.0           pypi_0           pypi
pure-eval                 0.2.3           pypi_0           pypi
pycparser                 2.22            pypi_0           pypi
pygments                  2.18.0          pypi_0           pypi
python                    3.12.5          h5148396_1
python-dateutil           2.9.0.post0     pypi_0           pypi
python-json-logger        2.0.7           pypi_0           pypi
pyyaml                    6.0.2           pypi_0           pypi
pyzmq                     26.2.0          pypi_0           pypi
readline                  8.2             h5eee18b_0
referencing               0.35.1          pypi_0           pypi
requests                  2.32.3          pypi_0           pypi
rfc3339-validator         0.1.4           pypi_0           pypi
rfc3986-validator         0.1.1           pypi_0           pypi
rpds-py                   0.20.0          pypi_0           pypi
send2trash                1.8.3           pypi_0           pypi
setuptools                75.1.0          py312h06a4308_0
six                       1.16.0          pypi_0           pypi
sniffio                   1.3.1           pypi_0           pypi
soupsieve                 2.6             pypi_0           pypi
sqlite                    3.45.3          h5eee18b_0
stack-data                0.6.3           pypi_0           pypi
terminado                 0.18.1          pypi_0           pypi
tinycss2                  1.3.0           pypi_0           pypi
tk                        8.6.14          h39e8969_0
tornado                   6.4.1           pypi_0           pypi
traitlets                 5.14.3          pypi_0           pypi
types-python-dateutil     2.9.0.20240906  pypi_0           pypi
tzdata                    2024a           h04d1e81_0
uri-template              1.3.0           pypi_0           pypi
urllib3                   2.2.3           pypi_0           pypi
wcwidth                   0.2.13          pypi_0           pypi
webcolors                 24.8.0          pypi_0           pypi
webencodings              0.5.1           pypi_0           pypi
websocket-client          1.8.0           pypi_0           pypi
wheel                     0.44.0          py312h06a4308_0
xz                        5.4.6           h5eee18b_1
zlib                      1.2.13          h5eee18b_1

Expected behavior

My expectation is that the example code snippet should run without any errors and create a dataframe.

Installed versions

```
--------Version info---------
Polars:      1.8.2
Index type:  UInt32
Platform:    Linux-5.15.153.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
Python:      3.12.5 | packaged by Anaconda, Inc. | (main, Sep 12 2024, 18:27:27) [GCC 11.2.0]

----Optional dependencies----
adbc_driver_manager
altair
cloudpickle
connectorx
deltalake
fastexcel
fsspec
gevent
great_tables
matplotlib
nest_asyncio         1.6.0
numpy                2.1.1
openpyxl
pandas
pyarrow
pydantic
pyiceberg
sqlalchemy
torch
xlsx2csv
xlsxwriter
```
cmdlineluser commented 2 hours ago

I just tested with numpy 2.1.1 and cannot replicate an error.

It seems to be complaining about creating a Series from .rand()?

>>> pl.__version__
'1.8.2'
>>> np.__version__
'2.1.1'
>>> pl.Series(np.random.rand(3))
shape: (3,)
Series: '' [f64]
[
    0.852444
    0.277395
    0.185109
]
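
The same isolation idea, building each column on its own to see which one the constructor rejects, can be sketched generically. This is only an illustration: `find_failing_columns` and its `build` parameter are hypothetical names, not part of Polars, and the sketch uses only the standard library.

```python
from typing import Any, Callable, Mapping


def find_failing_columns(
    columns: Mapping[str, Any],
    build: Callable[[str, Any], Any],
) -> dict[str, str]:
    """Try to build each column separately and collect the failures.

    `build` is any callable taking (name, values); it is a hypothetical
    stand-in for a per-column constructor.
    """
    failures: dict[str, str] = {}
    for name, values in columns.items():
        try:
            build(name, values)
        except Exception as exc:  # record the error and keep checking the rest
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures
```

With Polars and numpy imported, passing the dict from the reproducible example as `columns` and `pl.Series` as `build` should return an empty dict in a healthy environment, since `pl.Series(name, values)` accepts the name and values positionally.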
conorhamill36 commented 1 hour ago

Restarted my kernel and it ran fine this time. Unsure what the issue was, but I doubt it'll be reproducible now, and it was likely a fault of how I created my environment, so I'll close the issue.

Thanks for having a look.
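
For anyone who lands here with a similar error that disappears after a kernel restart, one way to check for a stale session is to compare what the running process has loaded against what is installed on disk. The root cause in this thread was never confirmed, so this is a minimal sketch under that assumption; `describe_module` is a hypothetical helper name, and only standard-library calls are used.

```python
import sys
from importlib import metadata


def describe_module(name: str) -> dict:
    """Report the in-process state of a module next to its installed version."""
    module = sys.modules.get(name)  # None if not imported in this process
    try:
        installed = metadata.version(name)  # version recorded on disk
    except metadata.PackageNotFoundError:
        installed = None
    return {
        "loaded": module is not None,
        "loaded_version": getattr(module, "__version__", None),
        "loaded_from": getattr(module, "__file__", None),
        "installed_version": installed,
    }


if __name__ == "__main__":
    for name in ("polars", "numpy"):
        print(name, describe_module(name))
```

If `loaded_version` and `installed_version` differ, or `loaded_from` points outside the active environment, the session is probably still using modules imported before the environment changed, and restarting the kernel is a reasonable first step.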