Open adamreeve opened 3 months ago
concat_list doesn't do what you think it does! It constructs a new list column where the entries of the list are the input exprs. I would like to do the same, but for array. E.g.
import polars as pl

df = pl.DataFrame(
    {
        'a': [1, 2, 3],
        'b': [4, 5, 6],
    }
)
df.select(
    pl.concat_list(pl.col('a'), pl.col('b')),
    # this should be just pl.concat_array(pl.col('a'), pl.col('b'))
    pl.Series(df.select('a', 'b').to_numpy(), dtype=pl.Array(pl.Int64, 2)),
)
concat_list does do what I think it does and also what you think it does :wink: https://docs.pola.rs/api/python/stable/reference/expressions/api/polars.concat_list.html
concat_array should probably work similarly and allow creating an array from scalars or concatenating existing arrays, or using a mix of arrays and scalars.
Ah fair! I only knew about Expr.list.concat for the other thing. I stand corrected.
I only care about one of those two cases, but as you say it's probably best to have both if it's going to be named analogously. Thinking about it more, I think concat_[list|array] is a bad name for the "make a list|array" case and they should be separate APIs. Out-of-scope though.
@m00ngoose There has been some discussion of that if it is of interest:
Based on the discussion linked above it looks like we most likely want to have separate methods for array construction (pl.array) and array concatenation (pl.concat_arrays), which seems much cleaner to me than one function that does both. Further discussion about the method split and naming should probably stay in that issue, but I think it makes sense to keep this issue open for implementing the array methods.
I have added a draft PR to discuss the design of a pl.array function in order to firm up what this would look like and how it should behave. @adamreeve
Description
Polars allows concatenation of List typed columns with pl.concat_list. It would be useful to also allow concatenation of Array typed columns. E.g.:
This should produce a new column equivalent to: