Open dpinol opened 1 month ago
This is because numpy arrays are not nullable. When you put [None]
into a numpy array, it gets converted to a numpy array of Python objects.
This is not really polars' fault.
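To illustrate the point about nullability, here is a minimal sketch using only numpy (the variable names are illustrative and not taken from the issue). It shows that a list containing None falls back to dtype=object, whether or not None comes first:

```python
import numpy as np

# A homogeneous list of strings gets a fixed-width unicode dtype.
only_strings = np.array(["3", "4"])
print(only_strings.dtype.kind)  # U

# None cannot be stored in a unicode dtype, so numpy falls back to
# dtype=object and keeps references to the original Python objects.
mixed = np.array(["3", None, "3"])
print(mixed.dtype)  # object

# The same happens when None is the first element.
first_none = np.array([None, "3"])
print(first_none.dtype)  # object
```

In both object arrays polars receives generic Python objects rather than a typed string buffer, so it has to inspect the values itself.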
@coastalwhite thanks for your quick answer!
It's a pity, because having None as the first element is the only case that fails. Otherwise it works, even with strict=True:
pl.Series("a", np.array(["3",None, "3"]),pl.String, strict=True)
Out[64]:
shape: (3,)
Series: 'a' [str]
[
"3"
null
"3"
]
Do you see any workaround? Using NaN does not work either:
pl.Series("a", np.array(["3",np.nan, "3"], np.object_),pl.String, nan_to_null=True)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[65], line 1
----> 1 pl.Series("a", np.array(["3",np.nan, "3"], np.object_),pl.String, nan_to_null=True)
File ~/.local/share/virtualenvs/sa2-CjZGZvYy/lib/python3.12/site-packages/polars/series/series.py:300, in __init__(self, name, values, dtype, strict, nan_to_null)
297 dtype = pl_dtype
299 # Handle case where values are passed as the first argument
--> 300 original_name: str | None = None
301 if name is None:
302 name = ""
File ~/.local/share/virtualenvs/sa2-CjZGZvYy/lib/python3.12/site-packages/polars/_utils/construction/series.py:455, in numpy_to_pyseries(name, values, strict, nan_to_null)
453 elif not hasattr(array, "num_chunks"):
454 pys = PySeries.from_arrow(name, array)
--> 455 else:
456 if array.num_chunks > 1:
457 # somehow going through ffi with a structarray
458 # returns the first chunk every time
459 if isinstance(array.type, pa.StructType):
TypeError: 'float' object cannot be converted to 'PyString'
Checks
Reproducible example
Log output
Issue description
When the first value is null, numpy columns cannot be imported, even when the schema is specified. It's not exactly the same as https://github.com/pola-rs/polars/issues/17484, because in that one the first value is a NaN. Currently the workaround consists of converting the data to a Python list, but apart from being inefficient, this disables nan_to_null=True, as documented:
pl.Series("a", np.array([None, "3"]).tolist(), pl.String)
Expected behavior
It should create a pl.String column with one null and one string value.
Installed versions