Open henryharbeck opened 1 week ago
Consider the examples below:
# Example 1
urls = pl.DataFrame({"url": "abcd.com/page"})
categories = pl.DataFrame({"base_url": "abcd.com", "category": "landing page"})

urls.join_where(categories, pl.col("url").str.starts_with(pl.col("base_url")))
# InvalidOperationError: only 1 binary comparison allowed as join condition

# Must resort to cross join then filter instead - produces expected result
urls.join(categories, how="cross").filter(pl.col("url").str.starts_with(pl.col("base_url")))

# Example 2
a = pl.DataFrame({"change": [1, -5]})
b = pl.DataFrame({"sets": [[0, 1], [2, 3]], "category": ["bad", "good"]})

a.join_where(b, pl.col("change").is_in(pl.col("sets")))
# InvalidOperationError: only 1 binary comparison allowed as join condition

# Must resort to cross join then filter instead - produces expected result
a.join(b, how="cross").filter(pl.col("change").is_in(pl.col("sets")))
Requested based on this SO question
Yes, we will. We first need to support a nested loop join, so that you don't have to materialize a Cartesian product in memory.
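For readers unfamiliar with the term, here is a minimal pure-Python sketch of what a nested loop join does conceptually. It is not Polars' implementation; the `nested_loop_join` helper and the list-of-dicts row representation are assumptions made purely for illustration. The point is that each (left, right) pair is tested against an arbitrary predicate and only matching pairs are emitted, so the full cross product is never held in memory at once.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, Sequence

Row = Dict[str, Any]


def nested_loop_join(
    left: Iterable[Row],
    right: Sequence[Row],
    predicate: Callable[[Row, Row], bool],
) -> Iterator[Row]:
    """Lazily yield merged rows for every (left, right) pair matching predicate.

    `right` must be re-iterable (e.g. a list) because it is scanned once
    per left row. Only one candidate pair exists at a time, so memory use
    does not grow with len(left) * len(right).
    """
    for l_row in left:
        for r_row in right:
            if predicate(l_row, r_row):
                # Merge the two rows; column names are assumed not to clash.
                yield {**l_row, **r_row}


# Example 1 from the issue: match URLs to categories by prefix.
urls = [{"url": "abcd.com/page"}, {"url": "other.org/home"}]
categories = [{"base_url": "abcd.com", "category": "landing page"}]
prefix_matches = list(
    nested_loop_join(urls, categories, lambda l, r: l["url"].startswith(r["base_url"]))
)

# Example 2 from the issue: match a value to the set that contains it.
a = [{"change": 1}, {"change": -5}]
b = [{"sets": [0, 1], "category": "bad"}, {"sets": [2, 3], "category": "good"}]
set_matches = list(nested_loop_join(a, b, lambda l, r: l["change"] in r["sets"]))
```

The predicate here is an ordinary Python callable, which is why both the `starts_with` and the `is_in` cases fit the same loop. A real engine would presumably run the comparison over batches of rows rather than one pair at a time, but the memory argument is the same.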