trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
https://trino.io
Apache License 2.0
10.46k stars 3.01k forks source link

Flaky Hive product tests: Error committing write to Hive #4936

Closed ebyhr closed 1 year ago

ebyhr commented 4 years ago
2020-08-23T02:43:55.0287769Z presto-master       | 2020-08-23T08:28:55.018+0545 ERROR   remote-task-callback-7  io.prestosql.execution.StageStateMachine    Stage 20200823_024354_00034_zq34q.1 failed
2020-08-23T02:43:55.0288266Z presto-master       | io.prestosql.spi.PrestoException: Error committing write to Hive
2020-08-23T02:43:55.0288614Z presto-master       |  at io.prestosql.plugin.hive.orc.OrcFileWriter.commit(OrcFileWriter.java:155)
2020-08-23T02:43:55.0288938Z presto-master       |  at io.prestosql.plugin.hive.HiveWriter.commit(HiveWriter.java:86)
2020-08-23T02:43:55.0289270Z presto-master       |  at io.prestosql.plugin.hive.HivePageSink.doFinish(HivePageSink.java:190)
2020-08-23T02:43:55.0289665Z presto-master       |  at io.prestosql.plugin.hive.authentication.UserGroupInformationUtils.lambda$executeActionInDoAs$0(UserGroupInformationUtils.java:29)
2020-08-23T02:43:55.0290001Z presto-master       |  at java.base/java.security.AccessController.doPrivileged(Native Method)
2020-08-23T02:43:55.0290302Z presto-master       |  at java.base/javax.security.auth.Subject.doAs(Subject.java:361)
2020-08-23T02:43:55.0290652Z presto-master       |  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1710)
2020-08-23T02:43:55.0291036Z presto-master       |  at io.prestosql.plugin.hive.authentication.UserGroupInformationUtils.executeActionInDoAs(UserGroupInformationUtils.java:27)
2020-08-23T02:43:55.0291421Z presto-master       |  at io.prestosql.plugin.hive.authentication.ImpersonatingHdfsAuthentication.doAs(ImpersonatingHdfsAuthentication.java:39)
2020-08-23T02:43:55.0291760Z presto-master       |  at io.prestosql.plugin.hive.HdfsEnvironment.doAs(HdfsEnvironment.java:89)
2020-08-23T02:43:55.0292082Z presto-master       |  at io.prestosql.plugin.hive.HivePageSink.finish(HivePageSink.java:181)
2020-08-23T02:43:55.0292453Z presto-master       |  at io.prestosql.plugin.base.classloader.ClassLoaderSafeConnectorPageSink.finish(ClassLoaderSafeConnectorPageSink.java:77)
2020-08-23T02:43:55.0292799Z presto-master       |  at io.prestosql.operator.TableWriterOperator.finish(TableWriterOperator.java:203)
2020-08-23T02:43:55.0293124Z presto-master       |  at io.prestosql.operator.Driver.processInternal(Driver.java:397)
2020-08-23T02:43:55.0293482Z presto-master       |  at io.prestosql.operator.Driver.lambda$processFor$8(Driver.java:283)
2020-08-23T02:43:55.0293801Z presto-master       |  at io.prestosql.operator.Driver.tryWithLock(Driver.java:675)
2020-08-23T02:43:55.0294105Z presto-master       |  at io.prestosql.operator.Driver.processFor(Driver.java:276)
2020-08-23T02:43:55.0294439Z presto-master       |  at io.prestosql.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1076)
2020-08-23T02:43:55.0294791Z presto-master       |  at io.prestosql.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:163)
2020-08-23T02:43:55.0295127Z presto-master       |  at io.prestosql.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:484)
2020-08-23T02:43:55.0295460Z presto-master       |  at io.prestosql.$gen.Presto_1cc5db2____20200823_024202_2.run(Unknown Source)
2020-08-23T02:43:55.0295815Z presto-master       |  at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
2020-08-23T02:43:55.0296262Z presto-master       |  at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
2020-08-23T02:43:55.0296531Z presto-master       |  at java.base/java.lang.Thread.run(Thread.java:834)
2020-08-23T02:43:55.0297045Z presto-master       | Caused by: org.apache.hadoop.ipc.RemoteException: File /tmp/presto-hive/569fc3af-2a52-4a1b-85d4-38e844339646/20200823_024354_00034_zq34q_82a58b59-5216-45f7-84ed-c6af8b18f34a could only be written to 0 of the 1 minReplication nodes. There are 1 datanode(s) running and no node(s) are excluded in this operation.
2020-08-23T02:43:55.0297401Z presto-master       |  at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:2121)
2020-08-23T02:43:55.0297735Z presto-master       |  at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:286)
2020-08-23T02:43:55.0298184Z presto-master       |  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2706)
2020-08-23T02:43:55.0298515Z presto-master       |  at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:875)
2020-08-23T02:43:55.0298877Z presto-master       |  at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:561)
2020-08-23T02:43:55.0299250Z presto-master       |  at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
2020-08-23T02:43:55.0299576Z presto-master       |  at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
2020-08-23T02:43:55.0299850Z presto-master       |  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
2020-08-23T02:43:55.0300130Z presto-master       |  at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
2020-08-23T02:43:55.0300399Z presto-master       |  at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
2020-08-23T02:43:55.0300668Z presto-master       |  at java.security.AccessController.doPrivileged(Native Method)
2020-08-23T02:43:55.0300930Z presto-master       |  at javax.security.auth.Subject.doAs(Subject.java:422)
2020-08-23T02:43:55.0301225Z presto-master       |  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
2020-08-23T02:43:55.0301484Z presto-master       |  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
2020-08-23T02:43:55.0301696Z presto-master       | 
2020-08-23T02:43:55.0301955Z presto-master       |  at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1511)
2020-08-23T02:43:55.0302216Z presto-master       |  at org.apache.hadoop.ipc.Client.call(Client.java:1457)
2020-08-23T02:43:55.0302476Z presto-master       |  at org.apache.hadoop.ipc.Client.call(Client.java:1367)
2020-08-23T02:43:55.0302773Z presto-master       |  at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
2020-08-23T02:43:55.0303066Z presto-master       |  at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
2020-08-23T02:43:55.0303329Z presto-master       |  at com.sun.proxy.$Proxy295.addBlock(Unknown Source)
2020-08-23T02:43:55.0303660Z presto-master       |  at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:513)
2020-08-23T02:43:55.0303962Z presto-master       |  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
2020-08-23T02:43:55.0304273Z presto-master       |  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
2020-08-23T02:43:55.0304596Z presto-master       |  at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
2020-08-23T02:43:55.0304870Z presto-master       |  at java.base/java.lang.reflect.Method.invoke(Method.java:566)
2020-08-23T02:43:55.0305179Z presto-master       |  at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
2020-08-23T02:43:55.0305494Z presto-master       |  at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
2020-08-23T02:43:55.0305802Z presto-master       |  at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
2020-08-23T02:43:55.0306112Z presto-master       |  at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
2020-08-23T02:43:55.0306415Z presto-master       |  at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
2020-08-23T02:43:55.0306680Z presto-master       |  at com.sun.proxy.$Proxy296.addBlock(Unknown Source)
2020-08-23T02:43:55.0307244Z presto-master       |  at org.apache.hadoop.hdfs.DFSOutputStream.addBlock(DFSOutputStream.java:1081)
2020-08-23T02:43:55.0307596Z presto-master       |  at org.apache.hadoop.hdfs.DataStreamer.locateFollowingBlock(DataStreamer.java:1865)
2020-08-23T02:43:55.0307932Z presto-master       |  at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1668)
2020-08-23T02:43:55.0308248Z presto-master       |  at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:716)
2020-08-23T02:43:55.0308490Z presto-master       | 
2020-08-23T02:43:55.0308702Z presto-master       | 
2020-08-23T02:43:55.0582129Z tests               | 2020-08-23 08:28:55 INFO: FAILURE     /    io.prestosql.tests.hive.TestHiveCreateTable.testCreateTable (Groups: storage_formats) took 0.7 seconds
2020-08-23T02:43:55.0636265Z tests               | 2020-08-23 08:28:55 SEVERE: Failure cause:
2020-08-23T02:43:55.0636502Z tests               | io.prestosql.tempto.query.QueryExecutionException: java.sql.SQLException: Query failed (#20200823_024354_00034_zq34q): Error committing write to Hive
2020-08-23T02:43:55.0636726Z tests               |  at io.prestosql.tempto.query.JdbcQueryExecutor.execute(JdbcQueryExecutor.java:114)
2020-08-23T02:43:55.0636895Z tests               |  at io.prestosql.tempto.query.JdbcQueryExecutor.executeQuery(JdbcQueryExecutor.java:82)
2020-08-23T02:43:55.0637068Z tests               |  at io.prestosql.tests.hive.TestHiveCreateTable.testCreateTable(TestHiveCreateTable.java:42)
2020-08-23T02:43:55.0637236Z tests               |  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
2020-08-23T02:43:55.0637410Z tests               |  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
2020-08-23T02:43:55.0637596Z tests               |  at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
2020-08-23T02:43:55.0637758Z tests               |  at java.base/java.lang.reflect.Method.invoke(Method.java:566)
2020-08-23T02:43:55.0637932Z tests               |  at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:104)
2020-08-23T02:43:55.0638090Z tests               |  at org.testng.internal.Invoker.invokeMethod(Invoker.java:645)
2020-08-23T02:43:55.0638241Z tests               |  at org.testng.internal.Invoker.invokeTestMethod(Invoker.java:851)
2020-08-23T02:43:55.0638392Z tests               |  at org.testng.internal.Invoker.invokeTestMethods(Invoker.java:1177)
2020-08-23T02:43:55.0638550Z tests               |  at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:129)
2020-08-23T02:43:55.0638711Z tests               |  at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:112)
2020-08-23T02:43:55.0638874Z tests               |  at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
2020-08-23T02:43:55.0639025Z tests               |  at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
2020-08-23T02:43:55.0639185Z tests               |  at java.base/java.lang.Thread.run(Thread.java:834)
2020-08-23T02:43:55.0639353Z tests               | Caused by: java.sql.SQLException: Query failed (#20200823_024354_00034_zq34q): Error committing write to Hive
2020-08-23T02:43:55.0639526Z tests               |  at io.prestosql.jdbc.AbstractPrestoResultSet.resultsException(AbstractPrestoResultSet.java:1757)
2020-08-23T02:43:55.0639695Z tests               |  at io.prestosql.jdbc.PrestoResultSet$ResultsPageIterator.computeNext(PrestoResultSet.java:225)
2020-08-23T02:43:55.0639856Z tests               |  at io.prestosql.jdbc.PrestoResultSet$ResultsPageIterator.computeNext(PrestoResultSet.java:185)
2020-08-23T02:43:55.0640024Z tests               |  at io.prestosql.jdbc.$internal.guava.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:141)
2020-08-23T02:43:55.0640299Z tests               |  at io.prestosql.jdbc.$internal.guava.collect.AbstractIterator.hasNext(AbstractIterator.java:136)
2020-08-23T02:43:55.0640719Z tests               |  at java.base/java.util.Spliterators$IteratorSpliterator.tryAdvance(Spliterators.java:1811)
2020-08-23T02:43:55.0640887Z tests               |  at java.base/java.util.stream.StreamSpliterators$WrappingSpliterator.lambda$initPartialTraversalState$0(StreamSpliterators.java:294)
2020-08-23T02:43:55.0641050Z tests               |  at java.base/java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.fillBuffer(StreamSpliterators.java:206)
2020-08-23T02:43:55.0641210Z tests               |  at java.base/java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.doAdvance(StreamSpliterators.java:161)
2020-08-23T02:43:55.0641370Z tests               |  at java.base/java.util.stream.StreamSpliterators$WrappingSpliterator.tryAdvance(StreamSpliterators.java:300)
2020-08-23T02:43:55.0641522Z tests               |  at java.base/java.util.Spliterators$1Adapter.hasNext(Spliterators.java:681)
2020-08-23T02:43:55.0641675Z tests               |  at io.prestosql.jdbc.PrestoResultSet$AsyncIterator.lambda$new$0(PrestoResultSet.java:131)
2020-08-23T02:43:55.0641816Z tests               |  at java.base/java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1736)
2020-08-23T02:43:55.0641947Z tests               |  ... 3 more
2020-08-23T02:43:55.0642082Z tests               | Caused by: io.prestosql.spi.PrestoException: Error committing write to Hive
2020-08-23T02:43:55.0642231Z tests               |  at io.prestosql.plugin.hive.orc.OrcFileWriter.commit(OrcFileWriter.java:155)
2020-08-23T02:43:55.0642375Z tests               |  at io.prestosql.plugin.hive.HiveWriter.commit(HiveWriter.java:86)
2020-08-23T02:43:55.0642520Z tests               |  at io.prestosql.plugin.hive.HivePageSink.doFinish(HivePageSink.java:190)
2020-08-23T02:43:55.0642684Z tests               |  at io.prestosql.plugin.hive.authentication.UserGroupInformationUtils.lambda$executeActionInDoAs$0(UserGroupInformationUtils.java:29)
2020-08-23T02:43:55.0642846Z tests               |  at java.base/java.security.AccessController.doPrivileged(Native Method)
2020-08-23T02:43:55.0642991Z tests               |  at java.base/javax.security.auth.Subject.doAs(Subject.java:361)
2020-08-23T02:43:55.0643140Z tests               |  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1710)
2020-08-23T02:43:55.0643296Z tests               |  at io.prestosql.plugin.hive.authentication.UserGroupInformationUtils.executeActionInDoAs(UserGroupInformationUtils.java:27)
2020-08-23T02:43:55.0643463Z tests               |  at io.prestosql.plugin.hive.authentication.ImpersonatingHdfsAuthentication.doAs(ImpersonatingHdfsAuthentication.java:39)
2020-08-23T02:43:55.0643622Z tests               |  at io.prestosql.plugin.hive.HdfsEnvironment.doAs(HdfsEnvironment.java:89)
2020-08-23T02:43:55.0643751Z tests               |  at io.prestosql.plugin.hive.HivePageSink.finish(HivePageSink.java:181)
2020-08-23T02:43:55.0643909Z tests               |  at io.prestosql.plugin.base.classloader.ClassLoaderSafeConnectorPageSink.finish(ClassLoaderSafeConnectorPageSink.java:77)
2020-08-23T02:43:55.0644276Z tests               |  at io.prestosql.operator.TableWriterOperator.finish(TableWriterOperator.java:203)
2020-08-23T02:43:55.0644428Z tests               |  at io.prestosql.operator.Driver.processInternal(Driver.java:397)
2020-08-23T02:43:55.0644573Z tests               |  at io.prestosql.operator.Driver.lambda$processFor$8(Driver.java:283)
2020-08-23T02:43:55.0644715Z tests               |  at io.prestosql.operator.Driver.tryWithLock(Driver.java:675)
2020-08-23T02:43:55.0644855Z tests               |  at io.prestosql.operator.Driver.processFor(Driver.java:276)
2020-08-23T02:43:55.0645004Z tests               |  at io.prestosql.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1076)
2020-08-23T02:43:55.0645158Z tests               |  at io.prestosql.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:163)
2020-08-23T02:43:55.0645458Z tests               |  at io.prestosql.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:484)
2020-08-23T02:43:55.0645609Z tests               |  at io.prestosql.$gen.Presto_1cc5db2____20200823_024202_2.run(Unknown Source)
2020-08-23T02:43:55.0645760Z tests               |  at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
2020-08-23T02:43:55.0645898Z tests               |  at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
2020-08-23T02:43:55.0646039Z tests               |  at java.base/java.lang.Thread.run(Thread.java:834)
2020-08-23T02:43:55.0646742Z tests               | Caused by: org.apache.hadoop.ipc.RemoteException: File /tmp/presto-hive/569fc3af-2a52-4a1b-85d4-38e844339646/20200823_024354_00034_zq34q_82a58b59-5216-45f7-84ed-c6af8b18f34a could only be written to 0 of the 1 minReplication nodes. There are 1 datanode(s) running and no node(s) are excluded in this operation.
2020-08-23T02:43:55.0646969Z tests               |  at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:2121)
2020-08-23T02:43:55.0647144Z tests               |  at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:286)
2020-08-23T02:43:55.0647313Z tests               |  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2706)
2020-08-23T02:43:55.0647482Z tests               |  at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:875)
2020-08-23T02:43:55.0647657Z tests               |  at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:561)
2020-08-23T02:43:55.0647837Z tests               |  at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
2020-08-23T02:43:55.0648019Z tests               |  at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
2020-08-23T02:43:55.0648164Z tests               |  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
2020-08-23T02:43:55.0648304Z tests               |  at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
2020-08-23T02:43:55.0648447Z tests               |  at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
2020-08-23T02:43:55.0648589Z tests               |  at java.security.AccessController.doPrivileged(Native Method)
2020-08-23T02:43:55.0648729Z tests               |  at javax.security.auth.Subject.doAs(Subject.java:422)
2020-08-23T02:43:55.0648875Z tests               |  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
2020-08-23T02:43:55.0649018Z tests               |  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
2020-08-23T02:43:55.0649142Z tests               | 
2020-08-23T02:43:55.0649272Z tests               |  at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1511)
2020-08-23T02:43:55.0649415Z tests               |  at org.apache.hadoop.ipc.Client.call(Client.java:1457)
2020-08-23T02:43:55.0649538Z tests               |  at org.apache.hadoop.ipc.Client.call(Client.java:1367)
2020-08-23T02:43:55.0649680Z tests               |  at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
2020-08-23T02:43:55.0649827Z tests               |  at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
2020-08-23T02:43:55.0649967Z tests               |  at com.sun.proxy.$Proxy295.addBlock(Unknown Source)
2020-08-23T02:43:55.0650123Z tests               |  at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:513)
2020-08-23T02:43:55.0650286Z tests               |  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
2020-08-23T02:43:55.0650497Z tests               |  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
2020-08-23T02:43:55.0650699Z tests               |  at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
2020-08-23T02:43:55.0650850Z tests               |  at java.base/java.lang.reflect.Method.invoke(Method.java:566)
2020-08-23T02:43:55.0651005Z tests               |  at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
2020-08-23T02:43:55.0651164Z tests               |  at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
2020-08-23T02:43:55.0651322Z tests               |  at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
2020-08-23T02:43:55.0651466Z tests               |  at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
2020-08-23T02:43:55.0651626Z tests               |  at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
2020-08-23T02:43:55.0651767Z tests               |  at com.sun.proxy.$Proxy296.addBlock(Unknown Source)
2020-08-23T02:43:55.0651908Z tests               |  at org.apache.hadoop.hdfs.DFSOutputStream.addBlock(DFSOutputStream.java:1081)
2020-08-23T02:43:55.0652057Z tests               |  at org.apache.hadoop.hdfs.DataStreamer.locateFollowingBlock(DataStreamer.java:1865)
2020-08-23T02:43:55.0652205Z tests               |  at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1668)
2020-08-23T02:43:55.0652351Z tests               |  at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:716)

https://github.com/prestosql/presto/pull/4846/checks?check_run_id=1017285394

findepi commented 4 years ago

TestHivePartitionsTable.testShowPartitionsFromHiveTableWithTooManyPartitions failed in https://github.com/prestosql/presto/runs/1205417196

2020-10-04T13:17:25.2762652Z tests               | 2020-10-04 19:02:25 INFO: FAILURE     /    io.prestosql.tests.hive.TestHivePartitionsTable.testShowPartitionsFromHiveTableWithTooManyPartitions (Groups: hive_partitioning) took 4.1 seconds
2020-10-04T13:17:25.2814457Z tests               | 2020-10-04 19:02:25 SEVERE: Failure cause:
2020-10-04T13:17:25.2841985Z tests               | io.prestosql.tempto.query.QueryExecutionException: java.sql.SQLException: Query failed (#20201004_131721_00842_pt8xg): Error committing write to Hive
2020-10-04T13:17:25.2844345Z tests               |  at io.prestosql.tempto.query.JdbcQueryExecutor.execute(JdbcQueryExecutor.java:114)
2020-10-04T13:17:25.2846441Z tests               |  at io.prestosql.tempto.query.JdbcQueryExecutor.executeQuery(JdbcQueryExecutor.java:82)
2020-10-04T13:17:25.2848427Z tests               |  at io.prestosql.tempto.query.QueryExecutor.query(QueryExecutor.java:57)
2020-10-04T13:17:25.2850467Z tests               |  at io.prestosql.tests.hive.TestHivePartitionsTable.lambda$createPartitions$4(TestHivePartitionsTable.java:171)
2020-10-04T13:17:25.2852496Z tests               |  at java.base/java.util.stream.Streams$RangeIntSpliterator.forEachRemaining(Streams.java:104)
2020-10-04T13:17:25.2853830Z tests               |  at java.base/java.util.stream.IntPipeline$Head.forEach(IntPipeline.java:593)
2020-10-04T13:17:25.2855806Z tests               |  at io.prestosql.tests.hive.TestHivePartitionsTable.createPartitions(TestHivePartitionsTable.java:168)
2020-10-04T13:17:25.2860649Z tests               |  at io.prestosql.tests.hive.TestHivePartitionsTable.testShowPartitionsFromHiveTableWithTooManyPartitions(TestHivePartitionsTable.java:135)
2020-10-04T13:17:25.2864788Z tests               |  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
2020-10-04T13:17:25.2867051Z tests               |  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
2020-10-04T13:17:25.2869979Z tests               |  at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
2020-10-04T13:17:25.2871800Z tests               |  at java.base/java.lang.reflect.Method.invoke(Method.java:566)
2020-10-04T13:17:25.2873552Z tests               |  at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:104)
2020-10-04T13:17:25.2875345Z tests               |  at org.testng.internal.Invoker.invokeMethod(Invoker.java:645)
2020-10-04T13:17:25.2876760Z tests               |  at org.testng.internal.Invoker.invokeTestMethod(Invoker.java:851)
2020-10-04T13:17:25.2878192Z tests               |  at org.testng.internal.Invoker.invokeTestMethods(Invoker.java:1177)
2020-10-04T13:17:25.2879962Z tests               |  at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:129)
2020-10-04T13:17:25.2881716Z tests               |  at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:112)
2020-10-04T13:17:25.2883415Z tests               |  at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
2020-10-04T13:17:25.2885438Z tests               |  at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
2020-10-04T13:17:25.2886677Z tests               |  at java.base/java.lang.Thread.run(Thread.java:834)
2020-10-04T13:17:25.2887530Z tests               | Caused by: java.sql.SQLException: Query failed (#20201004_131721_00842_pt8xg): Error committing write to Hive
2020-10-04T13:17:25.2889358Z tests               |  at io.prestosql.jdbc.AbstractPrestoResultSet.resultsException(AbstractPrestoResultSet.java:1761)
2020-10-04T13:17:25.2891644Z tests               |  at io.prestosql.jdbc.PrestoResultSet$ResultsPageIterator.computeNext(PrestoResultSet.java:228)
2020-10-04T13:17:25.2893538Z tests               |  at io.prestosql.jdbc.PrestoResultSet$ResultsPageIterator.computeNext(PrestoResultSet.java:188)
2020-10-04T13:17:25.2895798Z tests               |  at io.prestosql.jdbc.$internal.guava.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:141)
2020-10-04T13:17:25.2898171Z tests               |  at io.prestosql.jdbc.$internal.guava.collect.AbstractIterator.hasNext(AbstractIterator.java:136)
2020-10-04T13:17:25.2899986Z tests               |  at java.base/java.util.Spliterators$IteratorSpliterator.tryAdvance(Spliterators.java:1811)
2020-10-04T13:17:25.2901951Z tests               |  at java.base/java.util.stream.StreamSpliterators$WrappingSpliterator.lambda$initPartialTraversalState$0(StreamSpliterators.java:294)
2020-10-04T13:17:25.2904351Z tests               |  at java.base/java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.fillBuffer(StreamSpliterators.java:206)
2020-10-04T13:17:25.2906729Z tests               |  at java.base/java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.doAdvance(StreamSpliterators.java:161)
2020-10-04T13:17:25.2908940Z tests               |  at java.base/java.util.stream.StreamSpliterators$WrappingSpliterator.tryAdvance(StreamSpliterators.java:300)
2020-10-04T13:17:25.2910514Z tests               |  at java.base/java.util.Spliterators$1Adapter.hasNext(Spliterators.java:681)
2020-10-04T13:17:25.2911849Z tests               |  at io.prestosql.jdbc.PrestoResultSet$AsyncIterator.lambda$new$0(PrestoResultSet.java:134)
2020-10-04T13:17:25.2993976Z tests               |  at java.base/java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1736)
2020-10-04T13:17:25.2994977Z tests               |  ... 3 more
2020-10-04T13:17:25.2995653Z tests               | Caused by: io.prestosql.spi.PrestoException: Error committing write to Hive
2020-10-04T13:17:25.2997022Z tests               |  at io.prestosql.plugin.hive.orc.OrcFileWriter.commit(OrcFileWriter.java:155)
2020-10-04T13:17:25.2998544Z tests               |  at io.prestosql.plugin.hive.HiveWriter.commit(HiveWriter.java:86)
2020-10-04T13:17:25.3000214Z tests               |  at io.prestosql.plugin.hive.HivePageSink.doFinish(HivePageSink.java:190)
2020-10-04T13:17:25.3002499Z tests               |  at io.prestosql.plugin.hive.authentication.NoHdfsAuthentication.doAs(NoHdfsAuthentication.java:23)
2020-10-04T13:17:25.3004672Z tests               |  at io.prestosql.plugin.hive.HdfsEnvironment.doAs(HdfsEnvironment.java:96)
2020-10-04T13:17:25.3006219Z tests               |  at io.prestosql.plugin.hive.HivePageSink.finish(HivePageSink.java:181)
2020-10-04T13:17:25.3009043Z tests               |  at io.prestosql.plugin.base.classloader.ClassLoaderSafeConnectorPageSink.finish(ClassLoaderSafeConnectorPageSink.java:77)
2020-10-04T13:17:25.3011991Z tests               |  at io.prestosql.operator.TableWriterOperator.finish(TableWriterOperator.java:208)
2020-10-04T13:17:25.3013613Z tests               |  at io.prestosql.operator.Driver.processInternal(Driver.java:397)
2020-10-04T13:17:25.3014873Z tests               |  at io.prestosql.operator.Driver.lambda$processFor$8(Driver.java:283)
2020-10-04T13:17:25.3016054Z tests               |  at io.prestosql.operator.Driver.tryWithLock(Driver.java:675)
2020-10-04T13:17:25.3017240Z tests               |  at io.prestosql.operator.Driver.processFor(Driver.java:276)
2020-10-04T13:17:25.3018776Z tests               |  at io.prestosql.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1076)
2020-10-04T13:17:25.3021073Z tests               |  at io.prestosql.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:163)
2020-10-04T13:17:25.3023252Z tests               |  at io.prestosql.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:484)
2020-10-04T13:17:25.3024393Z tests               |  at io.prestosql.$gen.Presto_84cbc9e____20201004_131106_2.run(Unknown Source)
2020-10-04T13:17:25.3025680Z tests               |  at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
2020-10-04T13:17:25.3027471Z tests               |  at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
2020-10-04T13:17:25.3028619Z tests               |  at java.base/java.lang.Thread.run(Thread.java:834)
2020-10-04T13:17:25.3031162Z tests               | Caused by: org.apache.hadoop.ipc.RemoteException: File /tmp/presto-hive/ddbad1e4-d67c-49bf-9b39-cd862cfe5cf2/part_col=3/20201004_131721_00842_pt8xg_a2bc03d2-4b6c-49a0-a481-1d61d15e62d5 could only be replicated to 0 nodes instead of minReplication (=1).  There are 1 datanode(s) running and no node(s) are excluded in this operation.
2020-10-04T13:17:25.3034568Z tests               |  at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1719)
2020-10-04T13:17:25.3037791Z tests               |  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3508)
2020-10-04T13:17:25.3040562Z tests               |  at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:694)
2020-10-04T13:17:25.3044980Z tests               |  at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.addBlock(AuthorizationProviderProxyClientProtocol.java:219)
2020-10-04T13:17:25.3050723Z tests               |  at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:507)
2020-10-04T13:17:25.3055612Z tests               |  at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
2020-10-04T13:17:25.3058809Z tests               |  at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
2020-10-04T13:17:25.3060116Z tests               |  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
2020-10-04T13:17:25.3060951Z tests               |  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2281)
2020-10-04T13:17:25.3061918Z tests               |  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2277)
2020-10-04T13:17:25.3063033Z tests               |  at java.security.AccessController.doPrivileged(Native Method)
2020-10-04T13:17:25.3064328Z tests               |  at javax.security.auth.Subject.doAs(Subject.java:422)
2020-10-04T13:17:25.3065937Z tests               |  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1924)
2020-10-04T13:17:25.3067545Z tests               |  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2275)
2020-10-04T13:17:25.3068170Z tests               | 
2020-10-04T13:17:25.3069066Z tests               |  at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1511)
2020-10-04T13:17:25.3070300Z tests               |  at org.apache.hadoop.ipc.Client.call(Client.java:1457)
2020-10-04T13:17:25.3071324Z tests               |  at org.apache.hadoop.ipc.Client.call(Client.java:1367)
2020-10-04T13:17:25.3072679Z tests               |  at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
2020-10-04T13:17:25.3095596Z tests               |  at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
2020-10-04T13:17:25.3096863Z tests               |  at com.sun.proxy.$Proxy241.addBlock(Unknown Source)
2020-10-04T13:17:25.3099220Z tests               |  at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:513)
2020-10-04T13:17:25.3102226Z tests               |  at jdk.internal.reflect.GeneratedMethodAccessor411.invoke(Unknown Source)
2020-10-04T13:17:25.3104494Z tests               |  at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
2020-10-04T13:17:25.3106385Z tests               |  at java.base/java.lang.reflect.Method.invoke(Method.java:566)
2020-10-04T13:17:25.3108210Z tests               |  at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
2020-10-04T13:17:25.3110631Z tests               |  at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
2020-10-04T13:17:25.3112734Z tests               |  at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
2020-10-04T13:17:25.3114910Z tests               |  at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
2020-10-04T13:17:25.3117136Z tests               |  at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
2020-10-04T13:17:25.3118629Z tests               |  at com.sun.proxy.$Proxy242.addBlock(Unknown Source)
2020-10-04T13:17:25.3119792Z tests               |  at org.apache.hadoop.hdfs.DFSOutputStream.addBlock(DFSOutputStream.java:1081)
2020-10-04T13:17:25.3121653Z tests               |  at org.apache.hadoop.hdfs.DataStreamer.locateFollowingBlock(DataStreamer.java:1865)
2020-10-04T13:17:25.3123766Z tests               |  at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1668)
2020-10-04T13:17:25.3125482Z tests               |  at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:716)
2020-10-04T13:17:25.3126222Z tests               | 
findepi commented 4 years ago

https://github.com/prestosql/presto/runs/1230122873

losipiuk commented 4 years ago

https://github.com/prestosql/presto/pull/5747

findepi commented 4 years ago

https://github.com/prestosql/presto/runs/1349777302

findepi commented 4 years ago

https://github.com/prestosql/presto/runs/1353726794

findepi commented 4 years ago

I marked two example failures with @Flaky -- https://github.com/prestosql/presto/pull/5811 and https://github.com/prestosql/presto/pull/5829. however, since the nature of the problem seems only so much related to actual test, we may need a more generic approach... unless, of course, we learn how to fix the problem.

findepi commented 3 years ago

@MiguelWeezardo would it be possible to add a retry to onHive() so that these are automatically retried without need for @Flaky on sooo many tests? (many are still not covered yet)

kokosing commented 3 years ago

it be possible to add a retry to onHive()

That should not be enabled by default. There are plenty of queries that should not be retried or they are simply not retryable.

How about.

onHive()
  .retryWhen(ex -> ex.getMessages().contains("Hive is flaky again...")
  .execute("select 1 from \"$dummy\"");

?

Another option is wrapping query executors like:


    /**
     * This method could be only used for state-less JDBC connections - i.e. Presto.
     * Otherwise connection will be surely broken.
     */
    public static QueryExecutor retryOnNetworkIssues(QueryExecutor queryExecutor)
    {
        RetryPolicy<QueryResult> queryRetryPolicy = new RetryPolicy<QueryResult>()
                .withBackoff(1, 60, ChronoUnit.SECONDS)
                .withMaxRetries(10)
                .handleIf(QueryExecutors::isNetworkIssue)
                .onRetry(event -> log.warn("Query failed on attempt %s, will retry. Exception: %s", event.getAttemptCount(), event.getLastFailure().getMessage()));

        return new QueryExecutor()
        {
            @Override
            public QueryResult executeQuery(String sql, QueryParam... params)
                    throws QueryExecutionException
            {
                return Failsafe.with(queryRetryPolicy)
                        .get(() -> queryExecutor.executeQuery(sql, params));
            }

            @Override
            public Connection getConnection()
            {
                return queryExecutor.getConnection();
            }

            @Override
            public void close()
            {
                queryExecutor.close();
            }
        };
    }

    private static boolean isNetworkIssue(Throwable throwable)
    {
        return Throwables.getCausalChain(throwable).stream()
                .anyMatch(exception -> exception instanceof SocketException
                        || exception instanceof SocketTimeoutException);
    }
findepi commented 3 years ago

That should not be enabled by default. There are plenty of queries that should not be retried or they are simply not retryable.

@kokosing i meant to only retry those that fail with https://github.com/trinodb/trino/blob/cea58426e0a8f4cfe330bc805f88d12ab2d3fc4e/testing/trino-product-tests/src/main/java/io/trino/tests/hive/HiveProductTest.java#L26-L28

plus, we can match on the query being done, eg it's not safe to retry INSERT statements

How about.

onHive()
  .retryWhen(ex -> ex.getMessages().contains("Hive is flaky again...")
  .execute("select 1 from \"$dummy\"");

if i were to change each test's code, i'd just continue with the current approach

@Flaky(issue = ERROR_COMMITTING_WRITE_TO_HIVE_ISSUE, match = ERROR_COMMITTING_WRITE_TO_HIVE_MATCH)
MiguelWeezardo commented 3 years ago

Should be fixed by https://github.com/trinodb/trino/pull/6601

findepi commented 3 years ago

Hopefully solved by https://github.com/trinodb/trino/pull/6601 thanks @MiguelWeezardo

if that turns out not to be sufficient in some cases, we may need a more focused approach in those cases, so let me close the issue

ksobolew commented 3 years ago

Well, I saw what looks like this in one of my PRs:

https://github.com/trinodb/trino/runs/1911881714?check_suite_focus=true#step:6:12515

MiguelWeezardo commented 3 years ago

Well, I saw what looks like this in one of my PRs:

https://github.com/trinodb/trino/runs/1911881714?check_suite_focus=true#step:6:12515

The Failsafe code is on the failure stack, but it looks like no retries were attempted. This might mean the regular expression failed to match the exception message.

kokosing commented 3 years ago

Is there any issues with message regex?

MiguelWeezardo commented 3 years ago

Is there any issues with message regex?

The regex is multiline, and focuses on messages from two different exceptions in the causal chain, which may be why it didn't match. HiveProductTest#isErrorCommittingToHive matches it against the getMessage() string of each exception in the causal chain. I assumed the getMessage() includes the messages of it's causes just as we see it in the logs, but this was probably wrong. I guess this code should generate the entire recursive stacktrace as a string and match against that.

MiguelWeezardo commented 3 years ago

Throwables#getStackTraceAsString might be useful here

findepi commented 3 years ago

The regex is multiline, and focuses on messages from two different exceptions in the causal chain, ...

That's why it has (?s:.*) and not just .*.

MiguelWeezardo commented 3 years ago

The regex is multiline, and focuses on messages from two different exceptions in the causal chain, ...

That's why it has (?s:.*) and not just .*.

I thought it's a simple non-capturing group, and failed to notice the s flag until now.

MiguelWeezardo commented 3 years ago

I copied the matching from FlakyTestRetryAnalizer in #6944

losipiuk commented 2 years ago

Happens also to Iceberg+Spark tests when trying to write to HDFS: https://github.com/trinodb/trino/runs/5280873825?check_suite_focus=true

findepi commented 1 year ago

We have retries in place, so no more work planned to be done here, I am closing the issue

Please reopen if you see this again. Be sure to provide failure link AND stacktraces. Github actions logs are not kept for ever.