opensearch-project / opensearch-spark

Spark Accelerator framework ; It enables secondary indices to remote data stores.
Apache License 2.0
22 stars 33 forks source link

[BUG] Side aliases in JOIN command should be optional #857

Closed LantaoJin closed 2 weeks ago

LantaoJin commented 3 weeks ago

What is the bug? Currently, the syntax of JOIN command requires side aliases to be present as following

leftAlias

rightAlias

But these required side aliases could cause a bug which hide the attribute references in table. Imaging a case as following: Assume table1, table2, and table3 all contain a column id.

select
  *
from
  table1 t1,
  table2 t2,
  table3 t3
where
 t1.id = t2.id
 and t1.id = t3.id

The logical plan looks

'Project [*]
+- 'Filter (('t1.id = 't2.id) AND ('t1.id = 't3.id))
   +- 'Join Inner
      :- 'Join Inner
      :  :- 'SubqueryAlias t1
      :  :  +- 'UnresolvedRelation [table1], [], false
      :  +- 'SubqueryAlias t2
      :     +- 'UnresolvedRelation [table2], [], false
      +- 'SubqueryAlias t3
         +- 'UnresolvedRelation [table3], [], false

To rewrite above SQL query to PPL query, we will get

source = table1
| join left = t1 right = t2 ON t1.id = t2.id table2
| join left = l1 right = t3 ON t1.id = t3.id table3 // <------ issue here! 

The PPL query throws an exception with message:

t1.id cannot be resolved, Did you mean one of the following? [l1.id, l1.id, t3.id].

It because the new left alias l1 overrides the table alias t1 and t2. Its logical plan looks

'Project [*]
+- 'Filter (('t1.id = 't2.id) AND ('t1.id = 't3.id)). <------ t1.id cannot be resolved
   +- 'Join Inner
      :- 'SubqueryAlias l1  <------ issue root cause
      :  +- 'Join Inner
      :    :- 'SubqueryAlias t1
      :    :  +- 'UnresolvedRelation [table1], [], false
      :    +- 'SubqueryAlias t2
      :       +- 'UnresolvedRelation [table2], [], false
      +- 'SubqueryAlias t3
         +- 'UnresolvedRelation [table3], [], false