prestodb / presto

The official home of the Presto distributed SQL query engine for big data
http://prestodb.io
Apache License 2.0

got "Filesystem closed" error sometimes. #15069

Open gavinljj opened 4 years ago

gavinljj commented 4 years ago

Dear team

I sometimes get a "Filesystem closed" error. Environment: Presto 0.237.1 with Kerberized Hive 3.1 / HDFS 3.1.1. It happens in two scenarios.

1. Stack trace:

com.facebook.presto.spi.PrestoException: Filesystem closed
    at com.facebook.presto.hive.GenericHiveRecordCursor.advanceNextPosition(GenericHiveRecordCursor.java:227)
    at com.facebook.presto.hive.HiveRecordCursor.advanceNextPosition(HiveRecordCursor.java:175)
    at com.facebook.presto.$gen.CursorProcessor_20200729_120431_381.process(Unknown Source)
    at com.facebook.presto.operator.ScanFilterAndProjectOperator.processColumnSource(ScanFilterAndProjectOperator.java:248)
    at com.facebook.presto.operator.ScanFilterAndProjectOperator.getOutput(ScanFilterAndProjectOperator.java:240)
    at com.facebook.presto.operator.Driver.processInternal(Driver.java:382)
    at com.facebook.presto.operator.Driver.lambda$processFor$8(Driver.java:284)
    at com.facebook.presto.operator.Driver.tryWithLock(Driver.java:672)
    at com.facebook.presto.operator.Driver.processFor(Driver.java:277)
    at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1077)
    at com.facebook.presto.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:162)
    at com.facebook.presto.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:545)
    at com.facebook.presto.$gen.Presto_0_237_e369a5a____20200717_080617_1.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Filesystem closed
    at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:808)
    at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:860)
    at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:926)
    at java.io.DataInputStream.read(DataInputStream.java:149)
    at org.apache.hadoop.mapreduce.lib.input.UncompressedSplitLineReader.fillBuffer(UncompressedSplitLineReader.java:62)
    at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:238)
    at org.apache.hadoop.util.LineReader.readLine(LineReader.java:193)
    at org.apache.hadoop.mapreduce.lib.input.UncompressedSplitLineReader.readLine(UncompressedSplitLineReader.java:94)
    at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:248)
    at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:48)
    at com.facebook.presto.hive.GenericHiveRecordCursor.advanceNextPosition(GenericHiveRecordCursor.java:209)
    ... 15 more
    Suppressed: java.io.UncheckedIOException: java.io.IOException: Filesystem closed
        at com.facebook.presto.hive.GenericHiveRecordCursor.close(GenericHiveRecordCursor.java:541)
        at com.facebook.presto.hive.HiveUtil.closeWithSuppression(HiveUtil.java:883)
        at com.facebook.presto.hive.GenericHiveRecordCursor.advanceNextPosition(GenericHiveRecordCursor.java:223)
        ... 15 more
    Caused by: java.io.IOException: Filesystem closed
        at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:808)
        at org.apache.hadoop.hdfs.DFSInputStream.close(DFSInputStream.java:702)
        at java.io.FilterInputStream.close(FilterInputStream.java:181)
        at org.apache.hadoop.util.LineReader.close(LineReader.java:166)
        at org.apache.hadoop.mapred.LineRecordReader.close(LineRecordReader.java:284)
        at com.facebook.presto.hive.GenericHiveRecordCursor.close(GenericHiveRecordCursor.java:538)
        ... 17 more

2. Stack trace:

com.facebook.presto.spi.PrestoException: Failed to read ORC file: hdfs://xxxx
    at com.facebook.presto.hive.orc.OrcBatchPageSource.getNextPage(OrcBatchPageSource.java:166)
    at com.facebook.presto.hive.HivePageSource.getNextPage(HivePageSource.java:126)
    at com.facebook.presto.operator.ScanFilterAndProjectOperator.processPageSource(ScanFilterAndProjectOperator.java:272)
    at com.facebook.presto.operator.ScanFilterAndProjectOperator.getOutput(ScanFilterAndProjectOperator.java:237)
    at com.facebook.presto.operator.Driver.processInternal(Driver.java:382)
    at com.facebook.presto.operator.Driver.lambda$processFor$8(Driver.java:284)
    at com.facebook.presto.operator.Driver.tryWithLock(Driver.java:672)
    at com.facebook.presto.operator.Driver.processFor(Driver.java:277)
    at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1077)
    at com.facebook.presto.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:162)
    at com.facebook.presto.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:545)
    at com.facebook.presto.$gen.Presto_0_237_e369a5a____20200717_080617_1.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.UncheckedIOException: java.io.IOException: Filesystem closed
    at com.facebook.presto.hive.orc.OrcBatchPageSource.close(OrcBatchPageSource.java:184)
    at com.facebook.presto.hive.orc.OrcBatchPageSource.getNextPage(OrcBatchPageSource.java:139)
    ... 14 more
Caused by: java.io.IOException: Filesystem closed
    at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:808)
    at org.apache.hadoop.hdfs.DFSInputStream.close(DFSInputStream.java:702)
    at java.io.FilterInputStream.close(FilterInputStream.java:181)
    at com.facebook.presto.hive.orc.HdfsOrcDataSource.close(HdfsOrcDataSource.java:56)
    at com.facebook.presto.orc.CachingOrcDataSource.close(CachingOrcDataSource.java:124)
    at com.google.common.io.Closer.close(Closer.java:214)
    at com.facebook.presto.orc.AbstractOrcRecordReader.close(AbstractOrcRecordReader.java:388)
    at com.facebook.presto.orc.OrcBatchRecordReader.close(OrcBatchRecordReader.java:39)
    at com.facebook.presto.hive.orc.OrcBatchPageSource.close(OrcBatchPageSource.java:181)

However, when I retry, the same SQL runs successfully. My questions: Why does this happen, and how can I prevent it from happening again?

mbasmanova commented 4 years ago

There are some suggestions on Stack Overflow: https://stackoverflow.com/questions/23779186/ioexception-filesystem-closed-exception-when-running-oozie-workflow
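
For context, a commonly reported cause of this message in Hadoop clients is that `FileSystem.get()` returns a cached instance shared by every caller with the same scheme, authority, and user, so one caller closing it breaks the others; the Hadoop property `fs.hdfs.impl.disable.cache=true` is the usual workaround. Whether that is what happens in this Presto setup is not confirmed in this thread. The sketch below only illustrates the mechanism: `ToyFileSystem` and `ToyCache` are hypothetical stand-ins, not Hadoop or Presto classes.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class SharedHandleDemo {
    /** Hypothetical stand-in for a filesystem client; NOT the Hadoop API. */
    static final class ToyFileSystem implements AutoCloseable {
        private boolean open = true;

        String read() throws IOException {
            // Mirrors the kind of "is the client still open" check seen in the traces.
            if (!open) {
                throw new IOException("Filesystem closed");
            }
            return "data";
        }

        @Override
        public void close() {
            open = false;
        }
    }

    /** Hypothetical cache keyed by URI. disableCache imitates the effect of disabling the cache. */
    static final class ToyCache {
        private final Map<String, ToyFileSystem> cache = new HashMap<>();
        private final boolean disableCache;

        ToyCache(boolean disableCache) {
            this.disableCache = disableCache;
        }

        ToyFileSystem get(String uri) {
            if (disableCache) {
                return new ToyFileSystem(); // every caller gets a private instance
            }
            return cache.computeIfAbsent(uri, u -> new ToyFileSystem()); // shared instance
        }
    }

    /** Two callers fetch a handle for the same URI; one closes it, then the other reads. */
    static String readAfterOtherCallerCloses(boolean disableCache) {
        ToyCache cache = new ToyCache(disableCache);
        ToyFileSystem reader = cache.get("hdfs://example");
        ToyFileSystem other = cache.get("hdfs://example");
        other.close();
        try {
            return reader.read();
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(readAfterOtherCallerCloses(false)); // prints "Filesystem closed"
        System.out.println(readAfterOtherCallerCloses(true));  // prints "data"
    }
}
```

The trade-off is that with the cache disabled every lookup creates a new client, which costs more connections and memory, so the setting is worth testing before rolling it out cluster-wide.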