trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
https://trino.io
Apache License 2.0
10.49k stars 3.02k forks source link

JDBC PassThrough query fails when requesting columns with the same name #24177

Closed BaudoinWR closed 1 day ago

BaudoinWR commented 2 days ago

When sending a pass through query to a JDBC connector, if the results contain columns with the same name, trino returns an error. This can be reproduced with a mysql source (also tested with postgres with similar results)

With a mysql catalog named prism:

select * from TABLE(prism.system.query(
      query => 'SELECT
        column_name, column_name 
      FROM information_schema.columns')
); 

returns:

SQL Error [65536]: Query failed (#20241119_104319_00007_hkjph): Multiple entries with same key: column_name=COLUMN_NAME:varchar(64):VARCHAR and column_name=COLUMN_NAME:varchar(64):VARCHAR

The same occurs if the underlying column is different, as long as the name is identical

 select * from TABLE(prism.system.query(
      query => 'SELECT
        column_name as a, data_type as a 
      FROM information_schema.columns')
);      

returns:

SQL Error [65536]: Query failed (#20241119_104442_00011_hkjph): Multiple entries with same key: a=a:varchar(16777215):LONGTEXT and a=a:varchar(64):VARCHAR

For reference, sending the request without using pass through works as expected

select column_name as a, data_type as a from prism.information_schema.columns

The underlying issue appears to be the creation of the ColumnHandle map in that code path.

2024-11-19 10:41:12 2024-11-19T10:41:12.980Z    DEBUG   Query-20241119_104110_00006_hkjph-297   io.trino.execution.QueryStateMachine    Query 20241119_104110_00006_hkjph failed
2024-11-19 10:41:12 java.lang.IllegalArgumentException: Multiple entries with same key: a=a:varchar(16777215):LONGTEXT and a=a:varchar(64):VARCHAR
2024-11-19 10:41:12     at com.google.common.collect.ImmutableMap.conflictException(ImmutableMap.java:382)
2024-11-19 10:41:12     at com.google.common.collect.ImmutableMap.checkNoConflict(ImmutableMap.java:376)
2024-11-19 10:41:12     at com.google.common.collect.RegularImmutableMap.checkNoConflictInKeyBucket(RegularImmutableMap.java:249)
2024-11-19 10:41:12     at com.google.common.collect.RegularImmutableMap.fromEntryArrayCheckingBucketOverflow(RegularImmutableMap.java:136)
2024-11-19 10:41:12     at com.google.common.collect.RegularImmutableMap.fromEntryArray(RegularImmutableMap.java:98)
2024-11-19 10:41:12     at com.google.common.collect.ImmutableMap$Builder.build(ImmutableMap.java:579)
2024-11-19 10:41:12     at com.google.common.collect.ImmutableMap$Builder.buildOrThrow(ImmutableMap.java:607)
2024-11-19 10:41:12     at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:731)
2024-11-19 10:41:12     at io.trino.plugin.jdbc.DefaultJdbcMetadata.getColumnHandles(DefaultJdbcMetadata.java:1072)
2024-11-19 10:41:12     at io.trino.plugin.jdbc.DefaultJdbcMetadata.getTableFunctionApplicationResult(DefaultJdbcMetadata.java:995)
2024-11-19 10:41:12     at io.trino.plugin.jdbc.DefaultJdbcMetadata.applyTableFunction(DefaultJdbcMetadata.java:985)
2024-11-19 10:41:12     at io.trino.plugin.base.classloader.ClassLoaderSafeConnectorMetadata.applyTableFunction(ClassLoaderSafeConnectorMetadata.java:1097)
2024-11-19 10:41:12     at io.trino.tracing.TracingConnectorMetadata.applyTableFunction(TracingConnectorMetadata.java:1252)
2024-11-19 10:41:12     at io.trino.metadata.MetadataManager.applyTableFunction(MetadataManager.java:2141)
2024-11-19 10:41:12     at io.trino.tracing.TracingMetadata.applyTableFunction(TracingMetadata.java:1044)
2024-11-19 10:41:12     at io.trino.sql.planner.iterative.rule.RewriteTableFunctionToTableScan.apply(RewriteTableFunctionToTableScan.java:60)
2024-11-19 10:41:12     at io.trino.sql.planner.iterative.rule.RewriteTableFunctionToTableScan.apply(RewriteTableFunctionToTableScan.java:38)
2024-11-19 10:41:12     at io.trino.sql.planner.iterative.IterativeOptimizer.transform(IterativeOptimizer.java:216)
2024-11-19 10:41:12     at io.trino.sql.planner.iterative.IterativeOptimizer.exploreNode(IterativeOptimizer.java:181)
2024-11-19 10:41:12     at io.trino.sql.planner.iterative.IterativeOptimizer.exploreGroup(IterativeOptimizer.java:144)
2024-11-19 10:41:12     at io.trino.sql.planner.iterative.IterativeOptimizer.exploreChildren(IterativeOptimizer.java:264)
2024-11-19 10:41:12     at io.trino.sql.planner.iterative.IterativeOptimizer.exploreGroup(IterativeOptimizer.java:146)
2024-11-19 10:41:12     at io.trino.sql.planner.iterative.IterativeOptimizer.optimizeAndMarkPlanChanges(IterativeOptimizer.java:129)
2024-11-19 10:41:12     at io.trino.sql.planner.optimizations.AdaptivePlanOptimizer.optimize(AdaptivePlanOptimizer.java:33)
2024-11-19 10:41:12     at io.trino.sql.planner.LogicalPlanner.runOptimizer(LogicalPlanner.java:307)
2024-11-19 10:41:12     at io.trino.sql.planner.LogicalPlanner.plan(LogicalPlanner.java:271)
2024-11-19 10:41:12     at io.trino.sql.planner.LogicalPlanner.plan(LogicalPlanner.java:243)
2024-11-19 10:41:12     at io.trino.sql.planner.LogicalPlanner.plan(LogicalPlanner.java:238)
2024-11-19 10:41:12     at io.trino.execution.SqlQueryExecution.doPlanQuery(SqlQueryExecution.java:503)
2024-11-19 10:41:12     at io.trino.execution.SqlQueryExecution.planQuery(SqlQueryExecution.java:482)
2024-11-19 10:41:12     at io.trino.execution.SqlQueryExecution.start(SqlQueryExecution.java:420)
2024-11-19 10:41:12     at io.trino.execution.SqlQueryManager.createQuery(SqlQueryManager.java:272)
2024-11-19 10:41:12     at io.trino.dispatcher.LocalDispatchQuery.startExecution(LocalDispatchQuery.java:150)
2024-11-19 10:41:12     at io.trino.dispatcher.LocalDispatchQuery.lambda$waitForMinimumWorkers$2(LocalDispatchQuery.java:134)
2024-11-19 10:41:12     at io.airlift.concurrent.MoreFutures.lambda$addSuccessCallback$12(MoreFutures.java:570)
2024-11-19 10:41:12     at io.airlift.concurrent.MoreFutures$3.onSuccess(MoreFutures.java:545)
2024-11-19 10:41:12     at com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1137)
2024-11-19 10:41:12     at io.trino.$gen.Trino_464____20241119_102837_2.run(Unknown Source)
2024-11-19 10:41:12     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
2024-11-19 10:41:12     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
2024-11-19 10:41:12     at java.base/java.lang.Thread.run(Thread.java:1575)
2024-11-19 10:41:12 
2024-11-19 10:41:12 
Praveen2112 commented 1 day ago

In case of normal query submitted to Trino like select column_name as a, data_type as a from prism.information_schema.columns - the optimizer tends to add alias towards the end so it doesn't fail - but the same query would fail when we try to create a table based on the same query.

Wrt to query passthrough also we translate the query to table handle and it requires all the columns name to be unique in-order to identify the which column name to load - so this is a known limitation

Praveen2112 commented 1 day ago

I'll be closing this ticket - Feel free to re-open if it doesn't answers your question and feel free to discuss the same in our slack channel as well.