qubole / rubix

Cache File System optimized for columnar formats and object stores
Apache License 2.0

Exception in getCacheStatus #117

Open wishnick opened 6 years ago

wishnick commented 6 years ago

Setup:

Rubix Version: 0.3.1
Presto: 0.172
EMR AMI: 4.9.3

I've installed Rubix into Presto by using a custom build of Presto and overriding the configuration in Presto's HdfsConfigurationUpdater to point to CachingPrestoS3FileSystem, so that it looks like the following:

        config.set("fs.s3.impl", CachingPrestoS3FileSystem.class.getName());
        config.set("fs.s3a.impl", CachingPrestoS3FileSystem.class.getName());
        config.set("fs.s3n.impl", CachingPrestoS3FileSystem.class.getName());

When I run it with only this change, I get the following errors in the Presto logs:

2018-04-17T17:49:01.109Z        INFO    20180417_174858_00063_fngsp.1.0-0-54    com.qubole.rubix.spi.RetryingBookkeeperClient   Error while connecting :
org.apache.thrift.shaded.TApplicationException: getCacheStatus failed: unknown result
        at com.qubole.rubix.spi.BookKeeperService$Client.recv_getCacheStatus(BookKeeperService.java:109)
        at com.qubole.rubix.spi.BookKeeperService$Client.getCacheStatus(BookKeeperService.java:87)
        at com.qubole.rubix.spi.RetryingBookkeeperClient.access$001(RetryingBookkeeperClient.java:30)
        at com.qubole.rubix.spi.RetryingBookkeeperClient$1.call(RetryingBookkeeperClient.java:53)
        at com.qubole.rubix.spi.RetryingBookkeeperClient$1.call(RetryingBookkeeperClient.java:48)
        at com.qubole.rubix.spi.RetryingBookkeeperClient.retryConnection(RetryingBookkeeperClient.java:84)
        at com.qubole.rubix.spi.RetryingBookkeeperClient.getCacheStatus(RetryingBookkeeperClient.java:47)
        at com.qubole.rubix.core.CachingInputStream.setupReadRequestChains(CachingInputStream.java:305)
        at com.qubole.rubix.core.CachingInputStream.readInternal(CachingInputStream.java:231)
        at com.qubole.rubix.core.CachingInputStream.read(CachingInputStream.java:185)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
        at java.io.DataInputStream.read(DataInputStream.java:149)
        at org.apache.hadoop.mapreduce.lib.input.UncompressedSplitLineReader.fillBuffer(UncompressedSplitLineReader.java:62)
        at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216)
        at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
        at org.apache.hadoop.mapreduce.lib.input.UncompressedSplitLineReader.readLine(UncompressedSplitLineReader.java:94)
        at org.apache.hadoop.mapred.LineRecordReader.skipUtfByteOrderMark(LineRecordReader.java:208)
        at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:246)
        at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:48)
        at com.facebook.presto.hive.GenericHiveRecordCursor.advanceNextPosition(GenericHiveRecordCursor.java:203)
        at com.facebook.presto.hive.HiveRecordCursor.advanceNextPosition(HiveRecordCursor.java:179)
        at com.facebook.presto.spi.RecordPageSource.getNextPage(RecordPageSource.java:99)
        at com.facebook.presto.operator.TableScanOperator.getOutput(TableScanOperator.java:256)
        at com.facebook.presto.operator.Driver.processInternal(Driver.java:308)
        at com.facebook.presto.operator.Driver.lambda$processFor$6(Driver.java:239)
        at com.facebook.presto.operator.Driver.tryWithLock(Driver.java:542)
        at com.facebook.presto.operator.Driver.processFor(Driver.java:234)
        at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:623)
        at com.facebook.presto.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:162)
        at com.facebook.presto.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:458)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

2018-04-17T17:49:01.110Z        INFO    20180417_174858_00062_fngsp.1.0-0-60    com.qubole.rubix.core.CachingInputStream        Could not get cache status from server org.apache.thrift.shaded.TException
        at com.qubole.rubix.spi.RetryingBookkeeperClient.retryConnection(RetryingBookkeeperClient.java:95)
        at com.qubole.rubix.spi.RetryingBookkeeperClient.getCacheStatus(RetryingBookkeeperClient.java:47)
        at com.qubole.rubix.core.CachingInputStream.setupReadRequestChains(CachingInputStream.java:305)
        at com.qubole.rubix.core.CachingInputStream.readInternal(CachingInputStream.java:231)
        at com.qubole.rubix.core.CachingInputStream.read(CachingInputStream.java:185)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
        at java.io.DataInputStream.read(DataInputStream.java:149)
        at org.apache.hadoop.mapreduce.lib.input.UncompressedSplitLineReader.fillBuffer(UncompressedSplitLineReader.java:62)
        at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216)
        at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
        at org.apache.hadoop.mapreduce.lib.input.UncompressedSplitLineReader.readLine(UncompressedSplitLineReader.java:94)
        at org.apache.hadoop.mapred.LineRecordReader.skipUtfByteOrderMark(LineRecordReader.java:208)
        at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:246)
        at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:48)
        at com.facebook.presto.hive.GenericHiveRecordCursor.advanceNextPosition(GenericHiveRecordCursor.java:203)
        at com.facebook.presto.hive.HiveRecordCursor.advanceNextPosition(HiveRecordCursor.java:179)
        at com.facebook.presto.spi.RecordPageSource.getNextPage(RecordPageSource.java:99)
        at com.facebook.presto.operator.TableScanOperator.getOutput(TableScanOperator.java:256)
        at com.facebook.presto.operator.Driver.processInternal(Driver.java:308)
        at com.facebook.presto.operator.Driver.lambda$processFor$6(Driver.java:239)
        at com.facebook.presto.operator.Driver.tryWithLock(Driver.java:542)
        at com.facebook.presto.operator.Driver.processFor(Driver.java:234)
        at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:623)
        at com.facebook.presto.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:162)
        at com.facebook.presto.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:458)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

I am able to make this go away by changing the following lines in PrestoClusterManager:

This becomes:

return ImmutableList.of(InetAddress.getLocalHost().getHostAddress());

I also added the following after it:

if (!hosts.contains(InetAddress.getLocalHost().getHostAddress())) {
  hosts.add(InetAddress.getLocalHost().getHostAddress());
}
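
The second change can be sketched in isolation as follows. This is only a minimal illustration, assuming `hosts` is a mutable list of worker addresses; the class `LocalHostInclusion` and the helper `withLocalHost` are hypothetical names and are not taken from PrestoClusterManager.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical standalone class; not part of Rubix.
final class LocalHostInclusion {
    // Returns a copy of hosts that is guaranteed to contain localAddress.
    // The input list is left unmodified.
    static List<String> withLocalHost(List<String> hosts, String localAddress) {
        List<String> result = new ArrayList<>(hosts);
        if (!result.contains(localAddress)) {
            result.add(localAddress);
        }
        return result;
    }

    // Resolves the local address, falling back to loopback if the
    // machine's hostname cannot be resolved.
    static String localAddress() {
        try {
            return InetAddress.getLocalHost().getHostAddress();
        } catch (UnknownHostException e) {
            return InetAddress.getLoopbackAddress().getHostAddress();
        }
    }

    public static void main(String[] args) {
        // Starting from an empty worker list, the local node is still included.
        List<String> hosts = withLocalHost(new ArrayList<>(), localAddress());
        System.out.println(hosts);
    }
}
```

With this in place the host list is never empty and always contains the node the code runs on, which is the effect of the edit described above.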

abhishekdas99 commented 6 years ago

@wishnick Is it possible to share the logs of the bookkeeper daemon? They will include the actual cause of the exception. The Presto client log only says that the exception is happening in getCacheStatus.