Open MariusMerkleQC opened 1 month ago
Can you provide an example including the desired result? Do you mean when the left frame is null, or when the right frame does not have a matching row but does have a null value?
Does this example clarify the desired result?
import polars as pl
from datetime import datetime

df_expected = pl.DataFrame(
    data=[
        (None, datetime(2024, 1, 1, 0, 0, 0), 5),
        ("a", datetime(2024, 1, 1, 0, 0, 0), 5),
    ],
    schema={"category": pl.Utf8, "timestamp": pl.Datetime, "value": pl.Int8},
    orient="row",
)
df_left = df_expected.drop("value")
df_right = pl.DataFrame(
    data=[
        (None, datetime(2023, 1, 1, 0, 0, 0), 5),
        ("a", datetime(2023, 1, 1, 0, 0, 0), 5),
    ],
    schema={"category": pl.Utf8, "timestamp": pl.Datetime, "value": pl.Int8},
    orient="row",
)
df_actual = df_left.join_asof(
    other=df_right,
    on="timestamp",
    by=["category"],
    strategy="backward",  # join_nulls=True
)
I see. I believe you're asking that the initial by=... include the ability to join on nulls.
I think there is an issue here: joining on nulls produces the Cartesian product of the matching records, and these are not guaranteed to come out in sorted order, which join_asof requires. But of course, if the join itself is producing those records, it could probably sort them.
Description
Would it be possible to add the join_nulls: bool = False argument to the .join_asof() function, as it is also available to the .join() function?
I have a use case where I want to join two data frames using the "asof" logic, and I'd also like to join when the join keys (on/left_on/right_on) are Null. I would also be interested in whether there is a workaround in the meantime.