pola-rs / nodejs-polars

nodejs front-end of polars
https://pola-rs.github.io/nodejs-polars/
MIT License

Infer schema with empty array causes index out of bounds exception #296

Open tmckenn2 opened 2 hours ago

tmckenn2 commented 2 hours ago

Have you tried latest version of polars?

yes

What version of polars are you using?

0.16.0

What operating system are you using polars on?

OS X 15.1

What node version are you using?

Deno 2.0.6

Describe your bug.

An array index out of bounds exception is thrown when building a DataFrame from records, if an array column contains an empty array. Polars should be able to infer the schema as long as one array in the column is nonempty.

What are the steps to reproduce the behavior?

import pl from "nodejs-polars"

pl.DataFrame([
  { a: [], b: 0 },
  { a: [""], b: 0 }
])

What is the actual behavior?

error: Uncaught (in promise) Error: index out of bounds: the len is 0 but the index is 0
    at arrayToJsDataFrame ([REDACTED]/nodejs-polars/0.16.0/bin/internals/construction.js:197:46)
    at Module.DataFrameConstructor ([REDACTED]/nodejs-polars/0.16.0/bin/dataframe.js:730:78)
    at [REDACTED]/polars.ts:3:19

What is the expected behavior?

Polars should infer the following schema:

{
  a: pl.List(pl.String),
  b: pl.Float64,
}

tmckenn2 commented 2 hours ago

I tracked the issue down to obj_to_pairs in Dataframe.rs. Specifically, the List branch produces an empty array of nested dtypes and passes it to coerce_data_type. coerce_data_type uses an "all" match, which returns true for an empty array, so Dataframe.rs:1613 is executed.
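To make the failure mode concrete, here is a minimal TypeScript model of the pattern described above. It is a sketch, not the Rust source: `ScalarType`, `firstOrPanic`, and `coerceModel` are hypothetical names, and the fallback for mixed types is an assumption. It only shows that an "all" check passes vacuously on an empty array, so code that then reads element 0 fails.

```typescript
// Hypothetical, simplified model of the failure; not the actual Rust code.
type ScalarType = "Null" | "Float64" | "String";

// Stand-in for Rust's bounds-checked indexing, which panics on an empty slice.
function firstOrPanic<T>(items: T[]): T {
  if (items.length === 0) {
    throw new RangeError("index out of bounds: the len is 0 but the index is 0");
  }
  return items[0] as T;
}

// Models the described shape: an "all" check, then a read of element 0.
function coerceModel(dtypes: ScalarType[]): ScalarType {
  // `every` never calls its callback on an empty array and returns true.
  const allString = dtypes.every((d) => d === "String");
  if (allString) {
    // Reached for [] as well, which is the bug being modelled.
    return firstOrPanic(dtypes);
  }
  return "Float64"; // placeholder fallback, an assumption for this sketch
}
```

Calling `coerceModel(["String"])` returns `"String"`, while `coerceModel([])` throws the same kind of error as in the report, because the "all" branch is taken for an empty input.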

I created a branch (but can't push it) with a test and a fix. In obj_to_pairs, I check whether the incoming list is empty and, if so, return pl.List(pl.Null). If a later array has non-null nested values, their type replaces the null type. I am not sure whether there is a better way to express the type of an empty list, though.
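The idea behind the fix can be sketched in TypeScript as follows. This is a model under stated assumptions, not the code on my branch: `InferredType`, `mergeTypes`, `inferType`, and `inferColumn` are hypothetical names, numbers are mapped to Float64 only to match the expected schema above, and only strings, numbers, nulls, and arrays are handled.

```typescript
// Hypothetical model of the proposed fix; names are illustrative only.
type InferredType =
  | { kind: "Null" }
  | { kind: "String" }
  | { kind: "Float64" }
  | { kind: "List"; inner: InferredType };

const NULL_TYPE: InferredType = { kind: "Null" };

// Null is a placeholder: any concrete type merged with it wins.
function mergeTypes(a: InferredType, b: InferredType): InferredType {
  if (a.kind === "Null") return b;
  if (b.kind === "Null") return a;
  if (a.kind === "List" && b.kind === "List") {
    return { kind: "List", inner: mergeTypes(a.inner, b.inner) };
  }
  if (a.kind === b.kind) return a;
  throw new TypeError(`cannot merge ${a.kind} with ${b.kind}`);
}

// An empty array yields List(Null) instead of reading element 0.
function inferType(value: unknown): InferredType {
  if (value === null || value === undefined) return NULL_TYPE;
  if (typeof value === "string") return { kind: "String" };
  if (typeof value === "number") return { kind: "Float64" };
  if (Array.isArray(value)) {
    const inner = value.map((v) => inferType(v)).reduce(mergeTypes, NULL_TYPE);
    return { kind: "List", inner };
  }
  throw new TypeError("unsupported value in this sketch");
}

// Fold the per-row types of one column into a single type.
function inferColumn(values: unknown[]): InferredType {
  return values.map((v) => inferType(v)).reduce(mergeTypes, NULL_TYPE);
}
```

With this model, `inferColumn([[], [""]])` gives List(String), and a column that contains only empty arrays stays List(Null), which is the open question about how best to type an empty list.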