opensearch-project / opensearch-spark

Spark Accelerator framework; it enables secondary indices to remote data stores.
Apache License 2.0

Join side aliases should be optional #862

Closed LantaoJin closed 2 weeks ago

LantaoJin commented 3 weeks ago

Description

Imagine the following case: assume table1, table2, and table3 all contain a column id.

select
  *
from
  table1 t1,
  table2 t2,
  table3 t3
where
 t1.id = t2.id
 and t1.id = t3.id

Rewriting the SQL query above as a PPL query gives:

source = table1
| join left = t1 right = t2 ON t1.id = t2.id table2
| join left = l1 right = t3 ON t1.id = t3.id table3 // <------ issue here! 

The PPL query throws an exception with message:

t1.id cannot be resolved, Did you mean one of the following? [l1.id, l1.id, t3.id].

This happens because the required left-side alias l1 overrides the table aliases t1 and t2, so t1.id is no longer visible above the first join.

Its logical plan looks like this:

'Project [*]
+- 'Filter (('t1.id = 't2.id) AND ('t1.id = 't3.id)) <------ t1.id cannot be resolved
   +- 'Join Inner
      :- 'SubqueryAlias l1  <------ issue root cause
      :  +- 'Join Inner
      :    :- 'SubqueryAlias t1
      :    :  +- 'UnresolvedRelation [table1], [], false
      :    +- 'SubqueryAlias t2
      :       +- 'UnresolvedRelation [table2], [], false
      +- 'SubqueryAlias t3
         +- 'UnresolvedRelation [table3], [], false
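
The shadowing shown in the plan above can be reproduced with a small toy model. This is only a sketch in Python, not Spark's Catalyst code: the classes `Relation`, `SubqueryAlias`, and `Join` and the helpers `output` and `can_resolve` are hypothetical names made up for illustration. It assumes, as the plan suggests, that an alias node re-qualifies every column of its child with its own name.

```python
from dataclasses import dataclass


# Toy plan nodes (hypothetical, for illustration only; not Spark classes).
@dataclass(frozen=True)
class Relation:
    name: str
    columns: tuple


@dataclass(frozen=True)
class SubqueryAlias:
    alias: str
    child: object


@dataclass(frozen=True)
class Join:
    left: object
    right: object


def output(plan):
    """Return the (qualifier, column) pairs visible above `plan`."""
    if isinstance(plan, Relation):
        return [(plan.name, c) for c in plan.columns]
    if isinstance(plan, SubqueryAlias):
        # Assumption: the alias replaces whatever qualifier the child exposed.
        return [(plan.alias, c) for _, c in output(plan.child)]
    if isinstance(plan, Join):
        return output(plan.left) + output(plan.right)
    raise TypeError(f"unknown plan node: {plan!r}")


def can_resolve(plan, ref):
    """Check whether a reference such as 't1.id' is visible above `plan`."""
    qualifier, column = ref.split(".")
    return (qualifier, column) in output(plan)


t1 = SubqueryAlias("t1", Relation("table1", ("id",)))
t2 = SubqueryAlias("t2", Relation("table2", ("id",)))
t3 = SubqueryAlias("t3", Relation("table3", ("id",)))

# Mandatory left alias l1 wraps the first join and hides t1 and t2.
with_l1 = Join(SubqueryAlias("l1", Join(t1, t2)), t3)
# Without the forced alias, the original table aliases stay visible.
without_l1 = Join(Join(t1, t2), t3)

print(output(with_l1))                   # [('l1', 'id'), ('l1', 'id'), ('t3', 'id')]
print(can_resolve(with_l1, "t1.id"))     # False
print(can_resolve(without_l1, "t1.id"))  # True
```

In this model the candidates visible above the second join are l1.id, l1.id, and t3.id, which matches the suggestions in the error message. Dropping the forced l1 wrapper keeps t1.id resolvable.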

See #857 for details.

Related Issues

Resolves #857

Check List

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.

LantaoJin commented 3 weeks ago

Without this fix, q21 in TPC-H cannot be rewritten to PPL. See https://github.com/opensearch-project/opensearch-spark/pull/830

YANG-DB commented 2 weeks ago

@LantaoJin also plz resolve conflicts...

LantaoJin commented 2 weeks ago

The conflicts are related to imports. CI tests succeed with the related files. Merging to main.