pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

Should `pl.from_numpy` return a `Series` for 1D array? #17454

Open: liufeimath opened this issue 1 month ago

liufeimath commented 1 month ago

Description

Currently, for a 1D array of length n, it returns an n x 1 DataFrame. This causes inconsistencies; for example, the shape does not survive a round trip:

import numpy as np
import polars as pl
x = np.random.rand(10)
y = pl.from_numpy(x).to_numpy()
# `x` has shape (10,) while `y` has shape (10, 1), not good
MarcoGorelli commented 1 month ago

Thanks for the report.

I think I agree: `pl.from_pandas` is already overloaded to return `DataFrame | Series` depending on the input type, so `pl.from_numpy` returning a `Series` for 1D input would be consistent with it.

coastalwhite commented 1 month ago

We prefer returning a Series over a DataFrame here, but we are not yet 100% sure how to expose it.

There are a few options.

| DataFrame constructor | Series constructor |
| --- | --- |
| `pl.DataFrame.from_numpy` | `pl.Series.from_numpy` |
| `pl.from_numpy(arr, schema=...)` | `pl.from_numpy` |
| `pl.DataFrame(arr)` | `pl.from_numpy` |