Open mariasolomon opened 8 months ago
An attempt at a minimal repro.
Both frames appear to be equal:
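import polars as pl

# Built from a list of row dictionaries.
a = pl.DataFrame(
    [
        {"A": {"id": "foo"}},
        {"A": {}},
    ],
    schema={"A": pl.Object},
)

# Built from a dictionary of column lists.
b = pl.DataFrame(
    {"A": [{"id": "foo"}, {}]},
    schema={"A": pl.Object},
)

a.to_dicts() == b.to_dicts()
# True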
The callback does appear to receive the "slice" correctly:
a.slice(0, 1).with_columns(pl.col("A").map_elements(lambda x: [print("[DEBUG]:", x), x][1]))
# [DEBUG]: {'id': 'foo'}
# shape: (1, 1)
# ┌───────────┐
# │ A         │
# │ ---       │
# │ struct[1] │
# ╞═══════════╡
# │ {"foo"}   │
# └───────────┘
b.slice(0, 1).with_columns(pl.col("A").map_elements(lambda x: [print("[DEBUG]:", x), x][1]))
# [DEBUG]: {'id': 'foo'}
# shape: (1, 1)
# ┌───────────┐
# │ A         │
# │ ---       │
# │ struct[1] │
# ╞═══════════╡
# │ {"foo"}   │
# └───────────┘
But as soon as the callback actually uses the value, the length mismatch appears for `a` only:
a.slice(0, 1).with_columns(pl.col("A").map_elements(lambda x: x.get("id")))
# ShapeError: unable to add a column of length 2 to a DataFrame of height 1
b.slice(0, 1).with_columns(pl.col("A").map_elements(lambda x: x.get("id")))
# shape: (1, 1)
# ┌─────┐
# │ A   │
# │ --- │
# │ str │
# ╞═════╡
# │ foo │
# └─────┘
Checks
Reproducible example
Log output
Issue description
It works fine for any other column type. It also works fine if the data frame is built from a dictionary with list values, such as:
But if the data frame is built from a list of dictionaries, the map operation seems to be applied to the initial column of length 4 rather than to the batch column of length 2:
Expected behavior
The map operation should work on an Object-type column of a data frame built from a list of dictionaries.
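To make the suspected failure mode concrete, here is a toy model in plain Python. It does not use Polars and is not a description of Polars internals; `ColumnView`, `map_respecting_slice` and `map_ignoring_slice` are hypothetical names invented for this illustration. It only shows how a map that ignores a slice's offset and length would return 4 values for a batch of 2, which matches the symptom described above.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class ColumnView:
    """Toy stand-in for a sliced column: a window over a shared backing buffer.

    Hypothetical structure for illustration only, not how Polars stores data.
    """
    buffer: List[Any]
    offset: int
    length: int

    def values(self) -> List[Any]:
        # Only the rows that belong to the slice.
        return self.buffer[self.offset:self.offset + self.length]


def map_respecting_slice(view: ColumnView, fn: Callable[[Any], Any]) -> List[Any]:
    # Expected behaviour: one output per row in the view.
    return [fn(x) for x in view.values()]


def map_ignoring_slice(view: ColumnView, fn: Callable[[Any], Any]) -> List[Any]:
    # Suspected faulty behaviour (an assumption): iterate the whole buffer.
    return [fn(x) for x in view.buffer]


def get_id(x: dict) -> Any:
    return x.get("id")


full = [{"id": "foo"}, {"id": "bar"}, {}, {}]      # initial column, length 4
batch = ColumnView(buffer=full, offset=0, length=2)  # batch column, length 2

good = map_respecting_slice(batch, get_id)
bad = map_ignoring_slice(batch, get_id)

print(len(good), len(bad))
```

In this model `good` has the same length as the batch, while `bad` has the length of the initial column, so attaching `bad` to a 2-row frame would fail with a shape mismatch.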
Installed versions