pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

Allow join on different types if upcast is safe #15338

Open CaselIT opened 6 months ago

CaselIT commented 6 months ago

Description

It would be nice to allow joins between columns of different data types when the upcast is safe, for example i{8,16,32} -> i64, u{8,16,32} -> u64, etc.

Example:

import polars as pl

dfI32 = pl.DataFrame({'a': [1, 2, 3], 'b': list('abc')}).cast({'a': pl.Int32})
dfI64 = pl.DataFrame({'a': [1, 2, 3], 'c': list('def')})
dfI64.join(dfI32, on='a')  # a would keep Int64 type
dfI32.join(dfI64, on='a')  # a would become Int64 type (or this could depend on the join type)

Currently both of these fail with an exception like:

ComputeError: datatypes of join keys don't match - `a`: i64 on left does not match `a`: i32 on right

CaselIT commented 6 months ago

This could also be a keyword argument that defaults to false, so that polars by itself does no type conversion, but users can opt into safe upcasts if they want to.
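
To make the opt-in idea concrete, here is a rough pure-Python sketch of the decision logic. Everything in it is hypothetical: the names `safe_supertype`, `resolve_join_key`, and the `allow_upcast` flag are made up for illustration and are not part of the polars API, and the handling of mixed signed/unsigned keys is my own assumption that goes beyond the cases listed in the description.

```python
from typing import Optional

# Integer dtype names as they appear in the error message (i64, i32, ...).
INT_DTYPES = {"i8", "i16", "i32", "i64", "u8", "u16", "u32", "u64"}


def safe_supertype(left: str, right: str) -> Optional[str]:
    """Return an integer dtype that can hold every value of both inputs, or None."""
    if left not in INT_DTYPES or right not in INT_DTYPES:
        raise ValueError(f"unsupported dtypes: {left!r}, {right!r}")
    l_signed, r_signed = left[0] == "i", right[0] == "i"
    l_bits, r_bits = int(left[1:]), int(right[1:])
    if l_signed == r_signed:
        # Same signedness: the wider type is always a safe target.
        return left if l_bits >= r_bits else right
    # Mixed signedness (assumption): the signed type must be strictly wider.
    signed, s_bits, u_bits = (left, l_bits, r_bits) if l_signed else (right, r_bits, l_bits)
    if s_bits > u_bits:
        return signed
    wider = u_bits * 2
    return f"i{wider}" if wider <= 64 else None


def resolve_join_key(left: str, right: str, allow_upcast: bool = False) -> str:
    """Pick the join key dtype; the hypothetical flag defaults to no conversion."""
    if left == right:
        return left
    target = safe_supertype(left, right) if allow_upcast else None
    if target is None:
        raise TypeError(f"datatypes of join keys don't match: {left} vs {right}")
    return target


print(resolve_join_key("i64", "i32", allow_upcast=True))
```

With the default `allow_upcast=False` the mismatch still raises, which matches the current behaviour; only callers who opt in get the widened key.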