Open liufeimath opened 3 months ago
Consider the following example:
import polars as pl

df1 = pl.DataFrame({"a": [1, 2]})
df2 = pl.DataFrame({"b": [3, 4]})
df3 = df1.join(df2, how="cross")
print(df3)
This gives the intended output with correct ordering (as of 1.0.0 in my test):
shape: (4, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 3   │
│ 1   ┆ 4   │
│ 2   ┆ 3   │
│ 2   ┆ 4   │
└─────┴─────┘
The question is: does the cross join guarantee this ordering? The documentation doesn't clearly say so. The term "Cartesian product" refers to a set, which doesn't specify anything about ordering.
https://docs.pola.rs/api/python/stable/reference/dataframe/api/polars.DataFrame.join.html
Given that we are currently developing 2 new engines, I don't think we can guarantee that. For now I'd consider it an implementation detail.