trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
https://trino.io
Apache License 2.0
10.36k stars 2.98k forks source link

Wrong statement to retreive metadata from HMS ? (LIKE instead of =) #7424

Closed fcollin closed 3 years ago

fcollin commented 3 years ago

Hello,

When I try to connect to Presto from Tableau Desktop using a JDBC connection, Presto JDBC is running the following statement.

SELECT TABLE_CAT, TABLE_SCHEM, TABLE_NAME, COLUMN_NAME, DATA_TYPE,
  TYPE_NAME, COLUMN_SIZE, BUFFER_LENGTH, DECIMAL_DIGITS, NUM_PREC_RADIX,
  NULLABLE, REMARKS, COLUMN_DEF, SQL_DATA_TYPE, SQL_DATETIME_SUB,
  CHAR_OCTET_LENGTH, ORDINAL_POSITION, IS_NULLABLE,
  SCOPE_CATALOG, SCOPE_SCHEMA, SCOPE_TABLE,
  SOURCE_DATA_TYPE, IS_AUTOINCREMENT, IS_GENERATEDCOLUMN
FROM system.jdbc.columns
WHERE TABLE_CAT = 'MYCAT' AND TABLE_SCHEM LIKE 'MYDB' ESCAPE '\' AND TABLE_NAME LIKE 'MYTABLE' ESCAPE '\' AND COLUMN_NAME LIKE '%' ESCAPE '\'
ORDER BY TABLE_CAT, TABLE_SCHEM, TABLE_NAME, ORDINAL_POSITION

It is not efficient and raise the following exception in 30-40s

io.prestosql.spi.PrestoException: org.apache.hadoop.security.AccessControlException: Permission denied: user=XXXXXX access=EXECUTE, inode="/mypath/mypath2":hdfs:hdfs:d---------
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:353)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkTraverse(FSPermissionChecker.java:292)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:238)
    at org.apache.ranger.authorization.hadoop.RangerHdfsAuthorizer$RangerAccessControlEnforcer.checkDefaultEnforcer(RangerHdfsAuthorizer.java:428)
    at org.apache.ranger.authorization.hadoop.RangerHdfsAuthorizer$RangerAccessControlEnforcer.checkPermission(RangerHdfsAuthorizer.java:365)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1950)
    at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getFileInfo(FSDirStatAndListingOp.java:108)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:4142)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:1137)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:866)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2347)

It seems to work in less than 200 ms when I rewrite the query swapping like by =

SELECT TABLE_CAT, TABLE_SCHEM, TABLE_NAME, COLUMN_NAME, DATA_TYPE,
  TYPE_NAME, COLUMN_SIZE, BUFFER_LENGTH, DECIMAL_DIGITS, NUM_PREC_RADIX,
  NULLABLE, REMARKS, COLUMN_DEF, SQL_DATA_TYPE, SQL_DATETIME_SUB,
  CHAR_OCTET_LENGTH, ORDINAL_POSITION, IS_NULLABLE,
  SCOPE_CATALOG, SCOPE_SCHEMA, SCOPE_TABLE,
  SOURCE_DATA_TYPE, IS_AUTOINCREMENT, IS_GENERATEDCOLUMN
FROM system.jdbc.columns
WHERE TABLE_CAT = 'MYCAT' AND TABLE_SCHEM = 'MYDB' AND TABLE_NAME = 'MYTABLE' AND COLUMN_NAME LIKE '%' ESCAPE '\'
ORDER BY TABLE_CAT, TABLE_SCHEM, TABLE_NAME, ORDINAL_POSITION

If someone can help ....

Tks and regards

Frederic

findepi commented 3 years ago

Does MYCAT, MYDB or MYTABLE contain underscores _?

fcollin commented 3 years ago

yes they contain underscores !

findepi commented 3 years ago

Can you share actual query that's getting issued to Trino? Am i right they are not properly escaped?

ie. youre seeing

AND TABLE_NAME LIKE 'some_table' ESCAPE '\'

and not

AND TABLE_NAME LIKE 'some\_table' ESCAPE '\'

?

fcollin commented 3 years ago

here is the request issued to Trino

SELECT TABLE_CAT, TABLE_SCHEM, TABLE_NAME, COLUMN_NAME, DATA_TYPE, TYPE_NAME, COLUMN_SIZE, BUFFER_LENGTH, DECIMAL_DIGITS, NUM_PREC_RADIX, NULLABLE, REMARKS, COLUMN_DEF, SQL_DATA_TYPE, SQL_DATETIME_SUB, CHAR_OCTET_LENGTH, ORDINAL_POSITION, IS_NULLABLE, SCOPE_CATALOG, SCOPE_SCHEMA, SCOPE_TABLE, SOURCE_DATA_TYPE, IS_AUTOINCREMENT, IS_GENERATEDCOLUMN FROM system.jdbc.columns WHERE TABLE_CAT = 'MY_HIVE_METASTORE' AND TABLE_SCHEM LIKE 'MY_HIVE_DB' ESCAPE '\ ' AND TABLE_NAME LIKE 'MY_HIVE_TABLE' ESCAPE '\ ' AND COLUMN_NAME LIKE '%' ESCAPE '\ ' ORDER BY TABLE_CAT, TABLE_SCHEM, TABLE_NAME, ORDINAL_POSITION

findepi commented 3 years ago

The reason for this is incorrect use of JDBC API by the application. In particular, https://docs.oracle.com/javase/8/docs/api/java/sql/DatabaseMetaData.html#getColumns-java.lang.String-java.lang.String-java.lang.String-java.lang.String- has the signature

ResultSet getColumns(String catalog,
                     String schemaPattern,
                     String tableNamePattern,
                     String columnNamePattern)
              throws SQLException

the schema or table name should not be passed as is. instead, the underscores should be escaped using metadata.getSearchStringEscape().

Sadly, it seems Tableau Desktop doesn't do that, resulting in queries with disappointing performance. Could you please report the problem to Tableau?

findepi commented 3 years ago

As a workaround, we could perhaps introduce a JDBC configuration option for compatibility with such incorrectly behaving application. Once set, it would treat tableNamePattern as literal table name (not a pattern) and same for schemaPattern and columnNamePattern.

@electrum @kokosing @losipiuk WDYT?

fcollin commented 3 years ago

I will raise the issue to Tableau tomorrow. It will be wonderfull if I can get a workaround quickly.

Tks a lot for your help.

losipiuk commented 3 years ago

As a workaround, we could perhaps introduce a JDBC configuration option for compatibility with such incorrectly behaving application. Once set, it would treat tableNamePattern as literal table name (not a pattern) and same for schemaPattern and columnNamePattern.

@electrum @kokosing @losipiuk WDYT?

I think that is a reasonable idea. The downside is that that it takes down the pressure from reporting the root cause issue to misbehaving client application developers. But still given fact that we saw the problem many times already, I think it makes sense to add such a config option.

findepi commented 3 years ago

See https://github.com/trinodb/trino/pull/7438