trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Apache License 2.0

information_schema should not fail for duplicate columns #780

Open chadheyne opened 5 years ago

chadheyne commented 5 years ago

I believe this is similar to https://github.com/prestodb/presto/pull/11847 and https://github.com/prestosql/presto/pull/568.

When I run the following query:

SELECT table_schema, table_name, column_name
FROM information_schema.columns
WHERE table_schema NOT IN ('pg_catalog', 'information_schema')

the whole query fails with:

Query 20190515_163812_00120_bynzu failed: Hive metadata for table myTable is invalid: Table descriptor contains duplicate columns
io.prestosql.spi.PrestoException: Hive metadata for table myTable is invalid: Table descriptor contains duplicate columns
    at io.prestosql.plugin.hive.HiveMetadata.columnMetadataGetter(HiveMetadata.java:2148)
    at io.prestosql.plugin.hive.HiveMetadata.doGetTableMetadata(HiveMetadata.java:442)
    at io.prestosql.plugin.hive.HiveMetadata.getTableMetadata(HiveMetadata.java:423)
    at io.prestosql.plugin.hive.HiveMetadata.listTableColumns(HiveMetadata.java:558)
    at io.prestosql.spi.connector.classloader.ClassLoaderSafeConnectorMetadata.listTableColumns(ClassLoaderSafeConnectorMetadata.java:246)
    at io.prestosql.metadata.MetadataManager.listTableColumns(MetadataManager.java:572)
    at io.prestosql.metadata.MetadataListing.listTableColumns(MetadataListing.java:93)
    at io.prestosql.connector.informationSchema.InformationSchemaPageSourceProvider.buildColumns(InformationSchemaPageSourceProvider.java:146)
    at io.prestosql.connector.informationSchema.InformationSchemaPageSourceProvider.getInformationSchemaTable(InformationSchemaPageSourceProvider.java:115)
    at io.prestosql.connector.informationSchema.InformationSchemaPageSourceProvider.getInternalTable(InformationSchemaPageSourceProvider.java:109)
    at io.prestosql.connector.informationSchema.InformationSchemaPageSourceProvider.createPageSource(InformationSchemaPageSourceProvider.java:81)
    at io.prestosql.split.PageSourceManager.createPageSource(PageSourceManager.java:56)
    at io.prestosql.operator.ScanFilterAndProjectOperator.getOutput(ScanFilterAndProjectOperator.java:221)
    at io.prestosql.operator.Driver.processInternal(Driver.java:379)
    at io.prestosql.operator.Driver.lambda$processFor$8(Driver.java:283)
    at io.prestosql.operator.Driver.tryWithLock(Driver.java:675)
    at io.prestosql.operator.Driver.processFor(Driver.java:276)
    at io.prestosql.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1075)
    at io.prestosql.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:163)
    at io.prestosql.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:484)
    at io.prestosql.$gen.Presto_306____20190503_191743_1.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

This happens when the partition key is named the same as a column in the table.
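To illustrate the condition described above, here is a minimal sketch of a duplicate-name check across data columns and partition keys. The class `DuplicateColumnCheck` and the method `findDuplicates` are hypothetical names for illustration, not part of the Hive connector, and the case-insensitive comparison is an assumption.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration only; not the connector's actual validation code.
class DuplicateColumnCheck {
    // Returns every name that repeats across data columns and partition keys.
    // Names are compared case-insensitively (an assumption made for this sketch).
    static List<String> findDuplicates(List<String> dataColumns, List<String> partitionKeys) {
        List<String> all = new ArrayList<>(dataColumns);
        all.addAll(partitionKeys);

        Set<String> seen = new HashSet<>();
        List<String> duplicates = new ArrayList<>();
        for (String name : all) {
            String key = name.toLowerCase(Locale.ENGLISH);
            // Set.add returns false when the name was already present.
            if (!seen.add(key)) {
                duplicates.add(key);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        // A table with data columns (id, ds) and a partition key also named ds.
        System.out.println(findDuplicates(List.of("id", "ds"), List.of("ds"))); // prints [ds]
    }
}
```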

This is the first ticket I've opened here, so please let me know if I missed anything or if y'all need any more details.

electrum commented 5 years ago

From a SQL perspective, these operations should have different behavior for an invalid table:

The problem is that SHOW COLUMNS is implemented as a rewrite against information_schema.columns. This happens in ShowQueriesRewrite.Visitor#visitShowColumns. The columns table in turn calls ConnectorMetadata#listTableColumns.
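One possible reading of that distinction, going by the issue title, is that a bulk listing should skip tables with invalid metadata, while a lookup of a single named table should still report the error. The sketch below shows that split in isolation. The class `TolerantColumnListing`, both method names, and the `loader` function are hypothetical stand-ins, not the actual `ConnectorMetadata` API, and catching `RuntimeException` is a simplification.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration only; not the actual engine or connector code.
class TolerantColumnListing {
    // Bulk listing: a table whose metadata cannot be loaded is left out
    // of the result instead of failing the whole listing.
    static Map<String, List<String>> listTableColumns(
            List<String> tables, Function<String, List<String>> loader) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String table : tables) {
            try {
                result.put(table, loader.apply(table));
            } catch (RuntimeException e) {
                // Simplification: a real implementation would likely catch
                // only a specific "invalid metadata" error.
            }
        }
        return result;
    }

    // Single-table lookup: the failure propagates to the caller.
    static List<String> describeTable(String table, Function<String, List<String>> loader) {
        return loader.apply(table);
    }

    public static void main(String[] args) {
        // Stand-in loader that rejects one table, mimicking the error in this issue.
        Function<String, List<String>> loader = table -> {
            if (table.equals("myTable")) {
                throw new IllegalStateException("Table descriptor contains duplicate columns");
            }
            return List.of("id", "name");
        };
        System.out.println(listTableColumns(List.of("good", "myTable"), loader)); // prints {good=[id, name]}
    }
}
```

With this split, a single invalid table no longer hides the columns of every other table, but asking for that table by name still surfaces the underlying error.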

This is how to fix it:

One question is whether we should also call AccessControl#filterTables. This happens today, but it seems to be more of a side effect of the implementation. The documentation for AccessControl#checkCanShowColumnsMetadata only mentions AccessControl#filterColumns.