pola-rs / polars-cli

CLI interface for running SQL queries with Polars as backend
https://pola.rs/
MIT License

Cross join is treated as inner join #52

Closed l1t1 closed 9 months ago

l1t1 commented 10 months ago

### Reproducible example

〉select count(*) from read_parquet('slow3.parquet');
┌────────┐
│ count  │
│ ---    │
│ u32    │
╞════════╡
│ 100000 │
└────────┘
〉select count(*) from read_parquet('slow3.parquet') t1,read_parquet('slow3.parquet') t2;
┌────────┐
│ count  │
│ ---    │
│ u32    │
╞════════╡
│ 100000 │
└────────┘
〉select count(*) from read_parquet('slow3.parquet') t1 cross join read_parquet('slow3.parquet') t2;
Error: cross joins would produce more rows than fits into 2^32; consider compiling with polars-big-idx feature, or set 'streaming'


### Issue description

The second query should return 10000000000, but returns 100000.
The third query should return 10000000000 too.

### Expected behavior

The second and third queries both return 10000000000.
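
The expected figure follows from simple arithmetic, and the same arithmetic explains the error raised by the explicit `cross join`. A minimal sketch, assuming only the 100000-row count shown in the first query (the variable names are illustrative):

```python
# Row count reported by the first query in the reproducible example.
rows_per_table = 100_000

# A cross join pairs every row of t1 with every row of t2.
expected = rows_per_table * rows_per_table

# Limit named in the error message ("more rows than fits into 2^32").
u32_limit = 2**32

print(expected)              # 10000000000
print(u32_limit)             # 4294967296
print(expected > u32_limit)  # True
```

So the correct count is larger than a 32-bit row index can hold, which is consistent with the third query failing instead of returning a number.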

### Installed version

0.6.0

stinodego commented 9 months ago

Could you make a minimal reproducible example, e.g. without reading parquet files? I tried reproducing this on the latest Polars main branch from Python but am unable to do so:

import polars as pl

df1 = pl.DataFrame({"a": [1, 1], "b": [3, 4]})
df2 = pl.DataFrame({"a": [1, 2], "c": [5, 6]})

result = df1.join(df2, how="cross")
print(result)

sql = pl.SQLContext({"df1": df1, "df2": df2})
result = sql.execute("select * from df1 cross join df2;", eager=True)
print(result)  # same result

l1t1 commented 9 months ago

Using your example with a comma-separated FROM list instead of `cross join`, compare the Polars result with the DuckDB result:

>>> result = sql.execute("select * from df1, df2;", eager=True)
>>> print(result)
shape: (2, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 3   │
│ 1   ┆ 4   │
└─────┴─────┘
>>> import pandas as pd
>>> import duckdb as dd
>>> dd.sql("select * from df1, df2;")
┌───────┬───────┬───────┬───────┐
│   a   │   b   │   a   │   c   │
│ int64 │ int64 │ int64 │ int64 │
├───────┼───────┼───────┼───────┤
│     1 │     3 │     1 │     5 │
│     1 │     3 │     2 │     6 │
│     1 │     4 │     1 │     5 │
│     1 │     4 │     2 │     6 │
└───────┴───────┴───────┴───────┘

l1t1 commented 9 months ago

see https://github.com/pola-rs/polars/issues/13618
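
For reference, a comma-separated FROM list without a WHERE clause is an implicit cross join in standard SQL. The sketch below checks this against Python's built-in `sqlite3` module, used here only as an independent reference engine (an assumption of this example, not something used in the thread); the table contents mirror `df1` and `df2` from the example above.

```python
import sqlite3

# In-memory database holding the same data as df1 and df2 in the thread.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE df1 (a INTEGER, b INTEGER);
    CREATE TABLE df2 (a INTEGER, c INTEGER);
    INSERT INTO df1 VALUES (1, 3), (1, 4);
    INSERT INTO df2 VALUES (1, 5), (2, 6);
""")

# Implicit cross join (comma-separated FROM list) vs. explicit CROSS JOIN.
# Rows are sorted because SQL does not guarantee an output order.
implicit = sorted(conn.execute("SELECT * FROM df1, df2").fetchall())
explicit = sorted(conn.execute("SELECT * FROM df1 CROSS JOIN df2").fetchall())
conn.close()

print(implicit)              # [(1, 3, 1, 5), (1, 3, 2, 6), (1, 4, 1, 5), (1, 4, 2, 6)]
print(implicit == explicit)  # True
```

Both forms yield the same four rows, matching the DuckDB output above rather than the two-row, two-column Polars result.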

stinodego commented 9 months ago

Right, closing as a duplicate then.

l1t1 commented 5 months ago

This still returns the wrong result in version 10. 20.31