pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

Does cross-join guarantee the ordering of the resulting DataFrame? #17477

Open liufeimath opened 3 months ago

liufeimath commented 3 months ago

Description

Consider the following example:

```python
import polars as pl

df1 = pl.DataFrame({"a": [1, 2]})
df2 = pl.DataFrame({"b": [3, 4]})
df3 = df1.join(df2, how="cross")
print(df3)
```

This gives the intended output, with the rows in the expected order (tested with Polars 1.0.0):

```
shape: (4, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 3   │
│ 1   ┆ 4   │
│ 2   ┆ 3   │
│ 2   ┆ 4   │
└─────┴─────┘
```

The question is: is this ordering guaranteed by the cross join? The documentation doesn't say so explicitly. The term "Cartesian product" refers to a set, which implies nothing about ordering.

Link

https://docs.pola.rs/api/python/stable/reference/dataframe/api/polars.DataFrame.join.html

ritchie46 commented 3 months ago

Given that we are currently developing 2 new engines, I don't think we can guarantee that. For now I'd consider it an implementation detail.