trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Apache License 2.0

SQL Merge (on bucketed table) in Hive fails when writer count is increased from 1 #14636

Closed: gaurav8297 closed this issue 1 year ago

gaurav8297 commented 1 year ago

TestHiveMerge > testMergeMultipleOperationsBucketedUnpartitioned [groups: hive_transactional] fails when we increase the default value of task_writer_count to 32 for partitioned writes.

PR: https://github.com/trinodb/trino/pull/14553

CI: https://github.com/trinodb/trino/pull/14553/checks?check_run_id=8882217060

io.trino.tempto.query.QueryExecutionException: java.sql.SQLException: Query failed (#20221013_220846_00032_7mprf): Error creating ORC file
    at io.trino.tempto.query.JdbcQueryExecutor.execute(JdbcQueryExecutor.java:119)
    at io.trino.tempto.query.JdbcQueryExecutor.executeQuery(JdbcQueryExecutor.java:84)
    at io.trino.tests.product.utils.QueryExecutors$1.lambda$executeQuery$0(QueryExecutors.java:60)
    at net.jodah.failsafe.Functions.lambda$get$0(Functions.java:48)
    at net.jodah.failsafe.RetryPolicyExecutor.lambda$supply$0(RetryPolicyExecutor.java:62)
    at net.jodah.failsafe.Execution.executeSync(Execution.java:129)
    at net.jodah.failsafe.FailsafeExecutor.call(FailsafeExecutor.java:376)
    at net.jodah.failsafe.FailsafeExecutor.get(FailsafeExecutor.java:67)
    at io.trino.tests.product.utils.QueryExecutors$1.executeQuery(QueryExecutors.java:60)
    at io.trino.tests.product.hive.TestHiveMerge.testMergeMultipleOperationsInternal(TestHiveMerge.java:213)
    at io.trino.tests.product.hive.TestHiveMerge.lambda$testMergeMultipleOperationsBucketedUnpartitioned$10(TestHiveMerge.java:194)
    at io.trino.tests.product.hive.TestHiveMerge.withTemporaryTable(TestHiveMerge.java:730)
    at io.trino.tests.product.hive.TestHiveMerge.testMergeMultipleOperationsBucketedUnpartitioned(TestHiveMerge.java:191)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:568)
    at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:104)
    at org.testng.internal.InvokeMethodRunnable.runOne(InvokeMethodRunnable.java:54)
    at org.testng.internal.InvokeMethodRunnable.run(InvokeMethodRunnable.java:44)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: java.sql.SQLException: Query failed (#20221013_220846_00032_7mprf): Error creating ORC file
    at io.trino.jdbc.AbstractTrinoResultSet.resultsException(AbstractTrinoResultSet.java:1937)
    at io.trino.jdbc.TrinoResultSet$ResultsPageIterator.computeNext(TrinoResultSet.java:295)
    at io.trino.jdbc.TrinoResultSet$ResultsPageIterator.computeNext(TrinoResultSet.java:255)
    at io.trino.jdbc.$internal.guava.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:146)
    at io.trino.jdbc.$internal.guava.collect.AbstractIterator.hasNext(AbstractIterator.java:141)
    at java.base/java.util.Spliterators$IteratorSpliterator.tryAdvance(Spliterators.java:1855)
    at java.base/java.util.stream.StreamSpliterators$WrappingSpliterator.lambda$initPartialTraversalState$0(StreamSpliterators.java:292)
    at java.base/java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.fillBuffer(StreamSpliterators.java:206)
    at java.base/java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.doAdvance(StreamSpliterators.java:161)
    at java.base/java.util.stream.StreamSpliterators$WrappingSpliterator.tryAdvance(StreamSpliterators.java:298)
    at java.base/java.util.Spliterators$1Adapter.hasNext(Spliterators.java:681)
    at io.trino.jdbc.TrinoResultSet$AsyncIterator.lambda$new$1(TrinoResultSet.java:180)
    ... 5 more
    Suppressed: java.lang.Exception: Query: MERGE INTO test_merge_multiple_false_BUCKETED_V2_yzea9ubixi8d t USING (SELECT * FROM (VALUES ('joe_16', 3000, 83000, 'jill_16', '16 Eop Ct'), ('joe_17', 3000, 83000, 'jill_17', '17 Eop Ct'), ('joe_18', 3000, 83000, 'jill_18', '18 Eop Ct'), ('joe_19', 3000, 83000, 'jill_19', '19 Eop Ct'), ('joe_20', 3000, 83000, 'jill_20', '20 Eop Ct'), ('joe_21', 3000, 83000, 'jill_21', '21 Eop Ct'), ('joe_22', 3000, 83000, 'jill_22', '22 Eop Ct'), ('joe_23', 3000, 83000, 'jill_23', '23 Eop Ct'), ('joe_24', 3000, 83000, 'jill_24', '24 Eop Ct'), ('joe_25', 3000, 83000, 'jill_25', '25 Eop Ct'), ('joe_26', 3000, 83000, 'jill_26', '26 Eop Ct'), ('joe_27', 3000, 83000, 'jill_27', '27 Eop Ct'), ('joe_28', 3000, 83000, 'jill_28', '28 Eop Ct'), ('joe_29', 3000, 83000, 'jill_29', '29 Eop Ct'), ('joe_30', 3000, 83000, 'jill_30', '30 Eop Ct'), ('joe_31', 3000, 83000, 'jill_31', '31 Eop Ct'))) AS s(customer, purchases, zipcode, spouse, address)    ON t.customer = s.customer    WHEN MATCHED THEN UPDATE SET purchases = s.purchases, zipcode = s.zipcode, spouse = s.spouse, address = s.address
        at io.trino.tempto.query.JdbcQueryExecutor.executeQueryNoParams(JdbcQueryExecutor.java:136)
        at io.trino.tempto.query.JdbcQueryExecutor.execute(JdbcQueryExecutor.java:112)
        at io.trino.tempto.query.JdbcQueryExecutor.executeQuery(JdbcQueryExecutor.java:84)
        at io.trino.tests.product.utils.QueryExecutors$1.lambda$executeQuery$0(QueryExecutors.java:60)
        at net.jodah.failsafe.Functions.lambda$get$0(Functions.java:48)
        at net.jodah.failsafe.RetryPolicyExecutor.lambda$supply$0(RetryPolicyExecutor.java:62)
        at net.jodah.failsafe.Execution.executeSync(Execution.java:129)
        at net.jodah.failsafe.FailsafeExecutor.call(FailsafeExecutor.java:376)
        at net.jodah.failsafe.FailsafeExecutor.get(FailsafeExecutor.java:67)
        at io.trino.tests.product.utils.QueryExecutors$1.executeQuery(QueryExecutors.java:60)
        at io.trino.tests.product.hive.TestHiveMerge.testMergeMultipleOperationsInternal(TestHiveMerge.java:213)
        at io.trino.tests.product.hive.TestHiveMerge.lambda$testMergeMultipleOperationsBucketedUnpartitioned$10(TestHiveMerge.java:194)
        at io.trino.tests.product.hive.TestHiveMerge.withTemporaryTable(TestHiveMerge.java:730)
        at io.trino.tests.product.hive.TestHiveMerge.testMergeMultipleOperationsBucketedUnpartitioned(TestHiveMerge.java:191)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:568)
        at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:104)
        at org.testng.internal.InvokeMethodRunnable.runOne(InvokeMethodRunnable.java:54)
        at org.testng.internal.InvokeMethodRunnable.run(InvokeMethodRunnable.java:44)
        ... 5 more
Caused by: io.trino.spi.TrinoException: Error creating ORC file
    at io.trino.plugin.hive.orc.OrcFileWriterFactory.createFileWriter(OrcFileWriterFactory.java:225)
    at io.trino.plugin.hive.AbstractHiveAcidWriters.getOrCreateDeleteFileWriter(AbstractHiveAcidWriters.java:187)
    at io.trino.plugin.hive.MergeFileWriter.lambda$appendRows$0(MergeFileWriter.java:91)
    at java.base/java.util.Optional.ifPresent(Optional.java:178)
    at io.trino.plugin.hive.MergeFileWriter.appendRows(MergeFileWriter.java:88)
    at io.trino.plugin.hive.HiveWriter.append(HiveWriter.java:84)
    at io.trino.plugin.hive.HivePageSink.writePage(HivePageSink.java:345)
    at io.trino.plugin.hive.HivePageSink.doAppend(HivePageSink.java:297)
    at io.trino.plugin.hive.HivePageSink.lambda$appendPage$2(HivePageSink.java:283)
    at io.trino.hdfs.authentication.HdfsAuthentication.lambda$doAs$0(HdfsAuthentication.java:26)
    at io.trino.hdfs.authentication.NoHdfsAuthentication.doAs(NoHdfsAuthentication.java:25)
    at io.trino.hdfs.authentication.HdfsAuthentication.doAs(HdfsAuthentication.java:25)
    at io.trino.hdfs.HdfsEnvironment.doAs(HdfsEnvironment.java:98)
    at io.trino.plugin.hive.HivePageSink.appendPage(HivePageSink.java:283)
    at io.trino.plugin.hive.HivePageSink.storeMergedRows(HivePageSink.java:455)
    at io.trino.plugin.base.classloader.ClassLoaderSafeConnectorMergeSink.storeMergedRows(ClassLoaderSafeConnectorMergeSink.java:45)
    at io.trino.operator.MergeWriterOperator.addInput(MergeWriterOperator.java:96)
    at io.trino.operator.Driver.processInternal(Driver.java:416)
    at io.trino.operator.Driver.lambda$process$10(Driver.java:314)
    at io.trino.operator.Driver.tryWithLock(Driver.java:706)
    at io.trino.operator.Driver.process(Driver.java:306)
    at io.trino.operator.Driver.processForDuration(Driver.java:277)
    at io.trino.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:736)
    at io.trino.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:164)
    at io.trino.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:515)
    at io.trino.$gen.Trino_399_101_g5cb1261____20221013_220653_2.run(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: org.apache.hadoop.ipc.RemoteException: Failed to CREATE_FILE /user/hive/warehouse/test_merge_multiple_false_bucketed_v2_yzea9ubixi8d/delete_delta_0000002_0000002_0000/bucket_00003 for DFSClient_NONMAPREDUCE_-975112431_159 on 172.18.0.3 because DFSClient_NONMAPREDUCE_-975112431_159 is already the current lease holder.
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2555)
    at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.startFile(FSDirWriteFileOp.java:378)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2453)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2351)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:774)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:462)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)

    at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1511)
    at org.apache.hadoop.ipc.Client.call(Client.java:1457)
    at org.apache.hadoop.ipc.Client.call(Client.java:1367)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
    at jdk.proxy7/jdk.proxy7.$Proxy441.create(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:365)
    at jdk.internal.reflect.GeneratedMethodAccessor955.invoke(Unknown Source)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:568)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
    at jdk.proxy7/jdk.proxy7.$Proxy442.create(Unknown Source)
    at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:276)
    at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1216)
    at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1195)
    at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1133)
    at org.apache.hadoop.hdfs.DistributedFileSystem$8.doCall(DistributedFileSystem.java:536)
    at org.apache.hadoop.hdfs.DistributedFileSystem$8.doCall(DistributedFileSystem.java:533)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:547)
    at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:474)
    at io.trino.hdfs.TrinoFileSystemCache$FileSystemWrapper.create(TrinoFileSystemCache.java:370)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1118)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1098)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:987)
    at io.trino.plugin.hive.orc.OrcFileWriterFactory.createOrcDataSink(OrcFileWriterFactory.java:243)
    at io.trino.plugin.hive.orc.OrcFileWriterFactory.createFileWriter(OrcFileWriterFactory.java:164)
    ... 28 more
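The root cause in the trace is an HDFS `RemoteException`: the DFS client already holds the lease on `delete_delta_0000002_0000002_0000/bucket_00003`. One plausible reading, offered as an assumption rather than a confirmed diagnosis, is that with task_writer_count > 1 the deleted rows of a single bucket get spread over several writers in the same task. Each writer derives the same delete-delta file name from the bucket number, and because they all share one `DFSClient`, the second `create` collides with the first writer's lease. The Java sketch below models that scenario. `FakeNameNode`, `MergeWriter`, the round-robin routing, and the shortened relative path are hypothetical stand-ins, not Trino or Hadoop APIs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class LeaseCollisionSketch {
    // Hypothetical stand-in for the NameNode: at most one open lease per file path.
    static final class FakeNameNode {
        private final Map<String, String> leases = new HashMap<>();

        void createFile(String path, String clientName) {
            String holder = leases.get(path);
            if (holder != null) {
                throw new IllegalStateException("Failed to CREATE_FILE " + path + " for " + clientName
                        + " because " + holder + " is already the current lease holder.");
            }
            leases.put(path, clientName);
        }
    }

    // Hypothetical writer: opens one delete-delta file per bucket the first time it sees that bucket.
    static final class MergeWriter {
        private final FakeNameNode nameNode;
        private final String clientName;
        private final Set<Integer> openBuckets = new HashSet<>();

        MergeWriter(FakeNameNode nameNode, String clientName) {
            this.nameNode = nameNode;
            this.clientName = clientName;
        }

        void deleteRow(int bucket) {
            if (openBuckets.add(bucket)) {
                // The file name depends only on the bucket number, not on which writer opens it.
                String path = String.format("delete_delta_0000002_0000002_0000/bucket_%05d", bucket);
                nameNode.createFile(path, clientName);
            }
        }
    }

    // Distributes deleted rows across writers round-robin, ignoring their bucket.
    static void runRoundRobin(int writerCount, int[] rowBuckets) {
        FakeNameNode nameNode = new FakeNameNode();
        List<MergeWriter> writers = new ArrayList<>();
        for (int i = 0; i < writerCount; i++) {
            // All writers in one task share a single DFS client, as in the stack trace.
            writers.add(new MergeWriter(nameNode, "DFSClient_task"));
        }
        for (int row = 0; row < rowBuckets.length; row++) {
            writers.get(row % writerCount).deleteRow(rowBuckets[row]);
        }
    }

    public static void main(String[] args) {
        int[] rowBuckets = {3, 3, 1, 3};
        runRoundRobin(1, rowBuckets); // one writer: each bucket file is created exactly once
        try {
            runRoundRobin(2, rowBuckets); // two writers both try to open bucket_00003
            System.out.println("no collision");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this model the single-writer run completes, while the two-writer run fails on `bucket_00003`, which matches the report that the test only breaks once the writer count goes above 1.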

cc @electrum @djsstarburst

gaurav8297 commented 1 year ago

TestHiveMerge > testMergeWithDifferentPartitioning(0: target_bucketed_by_customer_source_flat, 1: CREATE TABLE %s (customer STRING, purchases INT, address STRING) CLUSTERED BY (customer) INTO 3 BUCKETS STORED AS ORC TBLPROPERTIES ('transactional'='true'), 2: CREATE TABLE %s (customer STRING, purchases INT, address STRING) STORED AS ORC TBLPROPERTIES ('transactional'='true')) [groups: hive_transactional]

This test also fails with the same error.

findepi commented 1 year ago

Let's merge this with https://github.com/trinodb/trino/issues/14516; I don't think there is a reason to treat them separately for now.

(Bottom line: we don't want to keep both this one and https://github.com/trinodb/trino/issues/14637 open.)

sopel39 commented 1 year ago

This seems like a product bug. It might be related to https://github.com/trinodb/trino/issues/14637/ (same root cause), but the error is different, so I'm keeping it as a separate issue for now.

findepi commented 1 year ago

I missed the "when we are trying to increase the default value of task_writer_count to 32 for partitioned writes" in the description (the issue looked like a flaky-test report to me). Thanks for emphasizing this in the PR title.

BTW, I think we should have writer count > 1 in the Trino testing server to expose such bugs faster (this is not the first such bug).

djsstarburst commented 1 year ago

> BTW i think we should have writer count > 1 in Trino testing server, to expose such bugs faster (it's not the first such).

💯

findepi commented 1 year ago

> BTW i think we should have writer count > 1 in Trino testing server, to expose such bugs faster (it's not the first such).

https://github.com/trinodb/trino/pull/14660

gaurav8297 commented 1 year ago

Fix for this test: https://github.com/trinodb/trino/issues/14516#issuecomment-1280960893

djsstarburst commented 1 year ago

The SQL MERGE fixes PR has been merged, and that series included HiveMetadata.getUpdateLayout() support. The two Hive tests whose failures I could reproduce, testMergeMultipleOperationsBucketedUnpartitioned and testMergeWithDifferentPartitioning, should therefore now pass on the tip of master.
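The thread only states that the fix added `HiveMetadata.getUpdateLayout()` support. The Java sketch below assumes, without confirmation from the thread, that the effect for bucketed tables is to have the engine route MERGE rows by bucket, so that every row of a given bucket reaches the same writer and each delete-delta bucket file is created exactly once. Routing by `bucket % writerCount`, the class name, and the shortened path are hypothetical illustrations, not the actual Trino code path.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class BucketRoutingSketch {
    // Hypothetical file system view: path -> writer that created it; a second create fails.
    private final Map<String, Integer> createdBy = new HashMap<>();
    // Hypothetical per-writer set of buckets that already have an open delete-delta file.
    private final Map<Integer, Set<Integer>> openBuckets = new HashMap<>();

    private void deleteRow(int writer, int bucket) {
        Set<Integer> buckets = openBuckets.computeIfAbsent(writer, w -> new HashSet<>());
        if (buckets.add(bucket)) {
            String path = String.format("delete_delta_0000002_0000002_0000/bucket_%05d", bucket);
            if (createdBy.putIfAbsent(path, writer) != null) {
                throw new IllegalStateException("lease already held for " + path);
            }
        }
    }

    // Bucket-aware routing: every row of a given bucket goes to the same writer.
    static Map<String, Integer> routeByBucket(int writerCount, int[] rowBuckets) {
        BucketRoutingSketch task = new BucketRoutingSketch();
        for (int bucket : rowBuckets) {
            task.deleteRow(bucket % writerCount, bucket);
        }
        return task.createdBy;
    }

    public static void main(String[] args) {
        // Prints each delete-delta file and the single writer that created it.
        System.out.println(routeByBucket(32, new int[] {3, 3, 1, 3, 0}));
    }
}
```

Under this assumption the writer count no longer matters for correctness: with 32 writers (or 2), each bucket file still has exactly one creator, so no lease collision can occur.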