pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

select in join with same-named columns returns the wrong data #16255

Closed urkle closed 1 month ago

urkle commented 1 month ago

Reproducible example

I am using QSV (which uses polars 0.39.2), and a query that joins two tables sharing a column name does not return the correct results.

Reference ticket for QSV: https://github.com/jqnatividad/qsv/issues/1820

Log output

No response

Issue description

Given two CSV files, one.csv:

id,data
1,open

and two.csv:

id,data
1,closed

running a SQL query like this:

select a.id, a.data as d1, b.data as d2 from read_csv('one.csv') as a join read_csv('two.csv') as b ON a.id = b.id

Expected behavior

I expect the result to be

id,d1,d2
1,open,closed

But instead I get

id,d1,d2
1,open,open

(The value from whichever table is listed first shows up in both columns.)
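For comparison, here is a minimal sketch of the same join run against SQLite through Python's standard-library `sqlite3` module. It is not polars and does not exercise `read_csv`; the in-memory tables `a` and `b` are stand-ins I am assuming for the two CSV files, and the helper name `run_join_query` is only illustrative. It is meant to show what the query is expected to return when the same-named `data` columns are kept distinct.

```python
import sqlite3


def run_join_query():
    """Run the join from the report against in-memory SQLite tables.

    Tables `a` and `b` are stand-ins (an assumption) for one.csv and two.csv.
    """
    conn = sqlite3.connect(":memory:")
    try:
        # Both tables share the column names `id` and `data`.
        conn.execute("CREATE TABLE a (id INTEGER, data TEXT)")
        conn.execute("CREATE TABLE b (id INTEGER, data TEXT)")
        conn.execute("INSERT INTO a VALUES (1, 'open')")
        conn.execute("INSERT INTO b VALUES (1, 'closed')")

        # Same projection and join condition as the query in the report.
        cursor = conn.execute(
            "SELECT a.id, a.data AS d1, b.data AS d2 "
            "FROM a JOIN b ON a.id = b.id"
        )
        return cursor.fetchall()
    finally:
        conn.close()


rows = run_join_query()
print(rows)  # [(1, 'open', 'closed')]
```

Here `d1` and `d2` each come from their own table, which is the result I would expect from the polars SQL query as well.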

I have reproduced this issue in QSV (using polars 0.39.2) and in polars-cli 0.7.0.

Installed versions

qsv w/ polars 0.39.2 and polars-cli 0.7.0

cmdlineluser commented 1 month ago

For reference: https://github.com/pola-rs/polars/issues/15929#issuecomment-2081622812

I'm just waiting on some internal join-code updates (which should be coming soon) to help simplify the fix on the SQL side👌

I think this is reported in a couple of issues, so they can probably be amalgamated.

alexander-beedie commented 1 month ago

Looks like the same thing; and we now have the consistent coalesce modes I was hoping for, so I'll tackle this one shortly. Will close out in favour of the existing issue(s). Thanks for the report!👌