Open theelderbeever opened 6 months ago
Can reproduce.
import polars as pl

pl.DataFrame({"A": [[[1.0], [2]]]})
# shape: (1, 1)
# ┌─────────────────┐
# │ A               │
# │ ---             │
# │ list[list[f64]] │
# ╞═════════════════╡
# │ [[1.0], [2.0]]  │
# └─────────────────┘
pl.DataFrame({"A": [[{"B":1.0}, {"B":2}]]})
TypeError: unexpected value while building Series of type Float64; found value of type Int64: 2
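As a possible stopgap, the mixed int/float values can be normalized in plain Python before the data reaches the constructor, so that the struct field only ever contains floats. The helper below is a minimal sketch; `ints_to_floats` is a hypothetical name, not a Polars function, and it converts every int it finds, which is only appropriate when the affected fields are meant to be floats.

```python
def ints_to_floats(value):
    """Recursively convert ints to floats inside nested lists and dicts."""
    # bool is a subclass of int in Python, so leave booleans untouched.
    if isinstance(value, bool):
        return value
    if isinstance(value, int):
        return float(value)
    if isinstance(value, list):
        return [ints_to_floats(v) for v in value]
    if isinstance(value, dict):
        return {k: ints_to_floats(v) for k, v in value.items()}
    # Anything else (float, str, None, ...) is returned unchanged.
    return value


data = {"A": [[{"B": 1.0}, {"B": 2}]]}
clean = ints_to_floats(data)
print(clean)  # {'A': [[{'B': 1.0}, {'B': 2.0}]]}
```

Passing `clean` instead of `data` to `pl.DataFrame` should avoid the mixed-type struct field, since every value of `B` is then already a float.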
INFER_SCHEMA_LENGTH is hardcoded to 25 here, but it doesn't seem to come into play:
The issue seems to be that structs are treated differently to other types.
For example, inside to_list there is an explicit cast:
But to_struct ends up calling from_any_values_and_dtype again on the inner values:
So in this case, we end up with a strict call on the inner values that fails.
Series::from_any_values_and_dtype("name", [1.0, 2], Float64, true)
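To make the difference between the two paths concrete, here is a toy model in plain Python. It is not the Polars implementation (which is in Rust); `build_float_values` is a hypothetical stand-in that only mimics the strict versus non-strict behaviour described above.

```python
def build_float_values(values, strict):
    """Toy stand-in for building a Float64 column from Python values."""
    out = []
    for v in values:
        if isinstance(v, float):
            out.append(v)
        elif isinstance(v, int) and not isinstance(v, bool) and not strict:
            # Non-strict: cast the value to the target dtype.
            out.append(float(v))
        else:
            # Strict: any value that is not already a float is rejected.
            raise TypeError(
                "unexpected value while building Series of type Float64; "
                f"found value of type {type(v).__name__}: {v}"
            )
    return out


# List-like path: values are cast, so mixed int/float input is accepted.
print(build_float_values([1.0, 2], strict=False))  # [1.0, 2.0]

# Struct-like path: inner values are built strictly, so the int is rejected.
try:
    build_float_values([1.0, 2], strict=True)
except TypeError as exc:
    print(exc)
```

In this model the same input succeeds or fails depending only on the `strict` flag, which matches the observation that the list case works while the struct case raises.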
Checks
Reproducible example
Log output
Issue description
Polars fails to correctly infer the datatype of a nested struct even with infer_schema_length=None. The column in the example that is failing is the aggregated_value field in the List(Struct( ... )).
Expected behavior
infer_schema_length should apply to nested types as well.
Installed versions