mechatroner / RBQL

🦜RBQL - Rainbow Query Language: SQL-like query engine for (not only) CSV file processing. Supports SQL queries with Python and JavaScript expressions.
https://rbql.org
MIT License
276 stars 13 forks source link

JOIN on NR == b.ID #45

Open agladysh opened 12 months ago

agladysh commented 12 months ago

I'm trying to JOIN two tables, where the B table has additional information for some rows in the A. I can't get the query to work as I expect.

a.csv:

alpha
a-one
a-two

b.csv:

id,beta
1,b-one
2,b-two

Query:

select a.alpha, b.beta left join b on aNR==b.id

Output:

alpha,beta
a-one,
a-two,

Expected output:

alpha,beta
a-one,b-one
a-two,b-two

What am I missing?

mechatroner commented 12 months ago

I am sorry, this is a pitfall RBQL should handle more gracefully, the problem is that RBQL JOIN is (currently) very rudimentary. It just compares strings in the left and right tables, so when you compare aNR with b.id it doesn't match any records because aNR has integer type while b.id is a string. I will need to add more checks to report an error for this kind of query explicitly.

As a workaround, you can first run Select aNR as nr_id, * with(headers) on the first table, and then use the output table to join a.nr_id == b.id

agladysh commented 12 months ago

No worries and thank you! But can you maybe just cast NRs to strings in the JOIN expression? Seems to be a localized change.

I've implemented the workaround you suggested, but my table is rather large, and ideally I would like to avoid extra processing step. Especially since I have an unrelated design limitation that leads me to have to re-parse the data back from a string on each step. :-)