Open deanm0000 opened 4 months ago
Is the Array type intended for these cases? (although it requires every row to have the same shape)
import polars as pl

df = pl.DataFrame({
    'a':[[1,2,3],[2,3,4],[5,7, None]],
    'b':[[2,3,4],[7,8,9],[1,2, None]]
}).cast(pl.Array(pl.Int64, 3))
# element-wise arithmetic works between fixed-width Array columns
df.with_columns(c = pl.col.a * pl.col.b)
shape: (3, 3)
┌───────────────┬───────────────┬───────────────┐
│ a             ┆ b             ┆ c             │
│ ---           ┆ ---           ┆ ---           │
│ array[i64, 3] ┆ array[i64, 3] ┆ array[i64, 3] │
╞═══════════════╪═══════════════╪═══════════════╡
│ [1, 2, 3]     ┆ [2, 3, 4]     ┆ [2, 6, 12]    │
│ [2, 3, 4]     ┆ [7, 8, 9]     ┆ [14, 24, 36]  │
│ [5, 7, null]  ┆ [1, 2, null]  ┆ [5, 14, null] │
└───────────────┴───────────────┴───────────────┘
@cmdlineluser Sure, as long as the lists are all the same size, that's better. I deliberately gave the lists in the example different sizes to illustrate that there's still a need.
On the syntax, I'm not sure if it should be a new method, or if it'd be part of cast or implode. Maybe like
df=pl.DataFrame({'a':[1,2,3,4]})
# to list
df.select(pl.col('a').cast(pl.List(pl.Int64, [0,2,4])))
# to array
df.select(pl.col('a').cast(pl.List(pl.Int64, 2)))
or
df=pl.DataFrame({'a':[1,2,3,4]})
# to list
df.select(pl.col('a').implode([0,2,4]))
# to array
df.select(pl.col('a').implode(2))
so if implode gets a list then the result is a list column, and if it gets a scalar then the result is an array column.
In typing it out, I'm leaning more towards the latter.
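To make the proposed semantics concrete, here's a plain-Python sketch with no Polars involved. `implode_sketch` is a hypothetical helper that doesn't exist anywhere; it only mimics what I'd expect `implode` to do with a list of offsets versus a scalar width, and the argument checks are my own assumption.

```python
def implode_sketch(values, spec):
    """Hypothetical stand-in for the proposed implode(offsets | width)."""
    if isinstance(spec, int):
        # Scalar: fixed-width chunks, i.e. what an Array column would hold.
        if spec <= 0 or len(values) % spec != 0:
            raise ValueError("width must be positive and divide the length evenly")
        return [list(values[i:i + spec]) for i in range(0, len(values), spec)]
    # List: consecutive offsets mark where each sub-list starts and ends,
    # so the sub-lists may have different lengths.
    offsets = list(spec)
    return [list(values[start:end]) for start, end in zip(offsets, offsets[1:])]


same_size = implode_sketch([1, 2, 3, 4], [0, 2, 4])  # two lists of two
ragged = implode_sketch([1, 2, 3, 4], [0, 1, 4])     # one list of one, one of three
fixed = implode_sketch([1, 2, 3, 4], 2)              # fixed width of two
```

The offsets form follows the usual Arrow convention where n + 1 offsets describe n sub-lists.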
Adding two Series with lists is implemented in #17823. Is that PR sufficient to close this issue too?
@itamarst no, the point isn't addition, that was just an example.
Description
This was inspired by this SO question
Here's a simpler example, imagine we have two list columns and we want to add them
One approach is
but the over is really unnecessary. We can use pyarrow to do this instead
Performance
So this is half the time.