linkedin / coral

Coral is a translation, analysis, and query rewrite engine for SQL and other relational languages.
BSD 2-Clause "Simplified" License
780 stars 182 forks source link

Column '<table>.<column>' is ambiguous in Trino #154

Open willie-valdez opened 3 years ago

willie-valdez commented 3 years ago

I am having a problem with hive view translation to Trino that results in a query with an ambiguous column.

I narrowed down the issue to a simple view that reproduces the issue. I also identified why this happens which is a combination of how the select is rewritten and the column aliases used.

Conditions:

Given these 2 tables:

create table hive.test.table_a (
  some_id string
);

create table hive.test.table_b (
  some_id string
);

These 2 views will have the same issue:

-- Hive view version 1
create or replace view test.view_ab as
select a.some_id
from   test.table_a a
left join 
(
  select trim(some_id) AS SOME_ID
  from   test.table_b
) b
on    a.some_id = b.some_id
where a.some_id != ''

-- Hive view version 2
create or replace view test.view_ab2 as
select a.some_id
from   test.table_a a
left join 
(
  select some_id AS SOME_ID
  from   test.table_b
) b
on    a.some_id = trim(b.some_id)
where a.some_id != ''

The translated view hive in Trino will look like this:

CREATE VIEW hive.test.view_ab SECURITY DEFINER AS
SELECT some_id
FROM
  (
   SELECT
     table_a.some_id some_id
   , t.SOME_ID SOME_ID
   FROM
     (test.table_a
   LEFT JOIN (
      SELECT
        TRIM(some_id) SOME_ID
      , CAST(TRIM(some_id) AS VARCHAR(65536)) $f1
      FROM
        test.table_b
   )  t ON (table_a.some_id = t.$f1))
)  t0
WHERE (t0.some_id <> '')

As you can see there are 2 columns with the same alias in the first sub-query

     table_a.some_id some_id
   , t.SOME_ID SOME_ID

The problem goes away if the alias is specified in lower case in the hive view definition. Which would seem like an easy change unless you have 50k+ views owned by many different teams across the organization.

I managed to fix the issue in my local environment by setting the node text to lower case for identifiers in the com.linkedin.coral.hive.hive2rel.parsetree.ParseTreeBuilder class but I am not sure if this is the best place to do this or if this is the best way to resolve the problem.  Would that be an acceptable change?

  protected SqlNode visitIdentifier(ASTNode node, ParseContext ctx) {
    return new SqlIdentifier(node.getText().toLowerCase(Locale.ROOT), ZERO);
  }
findepi commented 3 years ago

setting the node text to lower case for identifiers in the com.linkedin.coral.hive.hive2rel.parsetree.ParseTreeBuilder class

This is not unreasonable, given that HiveQL seems to be case insensitive, right?

@wmoustafa @losipiuk @funcheetah wdyt?

JiajunBernoulli commented 3 years ago

Maybe we can useorg.apache.calcite.rel.type.RelDataTypeSystem.isCaseSensitive(), I encountered a similar problem and solved it by isCaseSensitive(). If we do, we will need a new RelDataTypeSystemImpl, it's not a small change.

losipiuk commented 2 years ago

@wmoustafa do you think we can go with change to com.linkedin.coral.hive.hive2rel.parsetree.ParseTreeBuilder proposed by @will-i-e? Seems easy to go but maybe you some fundamental problems with that approach?

ljfgem commented 2 years ago

I think the change to com.linkedin.coral.hive.hive2rel.parsetree.ParseTreeBuilder proposed by @will-i-e might affect the case preservation of Coral-Schema. @wmoustafa @funcheetah any thoughts?