qubole / rubix

Cache File System optimized for columnar formats and object stores
Apache License 2.0
183 stars 74 forks source link

Inconsistent state in cache - Not reading data from cache #94

Open abhishekdas99 opened 6 years ago

abhishekdas99 commented 6 years ago

Imagine file1 in the local disk already partially(or completely cached)

ls -lrt File1* -rw-rw-rw- 1 yarn yarn 13631488 Feb 8 20:03 File1 -rw-r--r-- 1 yarn yarn 12 Feb 8 20:03 File1_mdfile

2018-02-08T20:04:30.046Z        DEBUG   20180208_200429_00310_rztrq.11.2-36-93  com.qubole.rubix.core.CachingInputStream         FileSize of remotePath : File1 is : 94607314
2018-02-08T20:04:30.087Z        DEBUG   20180208_200429_00310_rztrq.11.2-36-93  com.qubole.rubix.core.CachingInputStream         FileSize of remotePath : s3a://blah-bhal-path/File1.snappy.parquet is : 94607314
2018-02-08T20:04:30.091Z        DEBUG   20180208_200429_00310_rztrq.11.2-36-93  com.qubole.rubix.core.CachingInputStream        Seek request, currentPos: 0 currentBlock: 0
2018-02-08T20:04:30.091Z        DEBUG   20180208_200429_00310_rztrq.11.2-36-93  com.qubole.rubix.core.CachingInputStream        Seek to 94607306, setting block location 90
2018-02-08T20:04:30.091Z        DEBUG   20180208_200429_00310_rztrq.11.2-36-93  com.qubole.rubix.core.CachingInputStream        Got Read, currentPos: 94607306 currentBlock: 90 bufferOffset: 0 length: 1048576
2018-02-08T20:04:30.091Z        DEBUG   20180208_200429_00310_rztrq.11.2-36-93  com.qubole.rubix.core.CachingInputStream         Iterating from startBlock : 90 to End block : 92
2018-02-08T20:04:30.091Z        DEBUG   20180208_200429_00310_rztrq.11.2-36-93  com.qubole.rubix.core.CachingInputStream        Sending cached block 90 to cachedReadRequestChain
2018-02-08T20:04:30.091Z        DEBUG   20180208_200429_00310_rztrq.11.2-36-93  com.qubole.rubix.core.ReadRequestChain  Request to add ReadRequest: [94371840, 94607314, 94607306, 94607314, 0]
2018-02-08T20:04:30.091Z        DEBUG   20180208_200429_00310_rztrq.11.2-36-93  com.qubole.rubix.core.ReadRequestChain  Added ReadRequest: [94371840, 94607314, 94607306, 94607314, 0]
2018-02-08T20:04:30.092Z        DEBUG   20180208_200429_00310_rztrq.11.2-36-93  com.qubole.rubix.core.CachingInputStream        Reached EOF, returning
2018-02-08T20:04:30.092Z        DEBUG   20180208_200429_00310_rztrq.11.2-36-93  com.qubole.rubix.core.CachingInputStream        Executing Chains
2018-02-08T20:04:30.092Z        DEBUG   20180208_200429_00310_rztrq.11.2-36-93  com.qubole.rubix.core.CachedReadRequestChain    Processing readrequest 94607306-94607314, length 8
2018-02-08T20:04:30.092Z        DEBUG   20180208_200429_00310_rztrq.11.2-36-93  com.qubole.rubix.core.CachedReadRequestChain    CachedFileRead copied data [94607306 - 94607306] at buffer offset 0
2018-02-08T20:04:30.092Z        INFO    20180208_200429_00310_rztrq.11.2-36-93  com.qubole.rubix.core.CachedReadRequestChain    Read 0 bytes from cached file
2018-02-08T20:04:30.092Z        DEBUG   20180208_200429_00310_rztrq.11.2-36-93  com.qubole.rubix.core.CachingInputStream        Read 0 bytes
2018-02-08T20:04:30.092Z        DEBUG   20180208_200429_00310_rztrq.11.2-36-93  com.qubole.rubix.core.CachingInputStream        Got Read, currentPos: 94607306 currentBlock: 90 bufferOffset: 0 length: 1048576
2018-02-08T20:04:30.092Z        DEBUG   20180208_200429_00310_rztrq.11.2-36-93  com.qubole.rubix.core.CachingInputStream         Iterating from startBlock : 90 to End block : 92
2018-02-08T20:04:30.093Z        DEBUG   20180208_200429_00310_rztrq.11.2-36-93  com.qubole.rubix.core.CachingInputStream        Sending cached block 90 to cachedReadRequestChain
2018-02-08T20:04:30.093Z        DEBUG   20180208_200429_00310_rztrq.11.2-36-93  com.qubole.rubix.core.ReadRequestChain  Request to add ReadRequest: [94371840, 94607314, 94607306, 94607314, 0]
2018-02-08T20:04:30.093Z        DEBUG   20180208_200429_00310_rztrq.11.2-36-93  com.qubole.rubix.core.ReadRequestChain  Added ReadRequest: [94371840, 94607314, 94607306, 94607314, 0]
2018-02-08T20:04:30.093Z        DEBUG   20180208_200429_00310_rztrq.11.2-36-93  com.qubole.rubix.core.CachingInputStream        Reached EOF, returning
2018-02-08T20:04:30.093Z        DEBUG   20180208_200429_00310_rztrq.11.2-36-93  com.qubole.rubix.core.CachingInputStream        Executing Chains
2018-02-08T20:04:30.093Z        DEBUG   20180208_200429_00310_rztrq.11.2-36-93  com.qubole.rubix.core.CachedReadRequestChain    Processing readrequest 94607306-94607314, length 8
2018-02-08T20:04:30.093Z        DEBUG   20180208_200429_00310_rztrq.11.2-36-93  com.qubole.rubix.core.CachedReadRequestChain    CachedFileRead copied data [94607306 - 94607306] at buffer offset 0
2018-02-08T20:04:30.093Z        INFO    20180208_200429_00310_rztrq.11.2-36-93  com.qubole.rubix.core.CachedReadRequestChain    Read 0 bytes from cached file
2018-02-08T20:04:30.094Z        DEBUG   20180208_200429_00310_rztrq.11.2-36-93  com.qubole.rubix.core.CachingInputStream        Read 0 bytes
2018-02-08T20:04:30.094Z        DEBUG   20180208_200429_00310_rztrq.11.2-36-93  com.qubole.rubix.core.CachingInputStream        Got Read, currentPos: 94607306 currentBlock: 90 bufferOffset: 0 length: 1048576
2018-02-08T20:04:30.094Z        DEBUG   20180208_200429_00310_rztrq.11.2-36-93  com.qubole.rubix.core.CachingInputStream         Iterating from startBlock : 90 to End block : 92
2018-02-08T20:04:30.094Z        DEBUG   20180208_200429_00310_rztrq.11.2-36-93  com.qubole.rubix.core.CachingInputStream        Sending cached block 90 to cachedReadRequestChain
2018-02-08T20:04:30.094Z        DEBUG   20180208_200429_00310_rztrq.11.2-36-93  com.qubole.rubix.core.ReadRequestChain  Request to add ReadRequest: [94371840, 94607314, 94607306, 94607314, 0]
2018-02-08T20:04:30.094Z        DEBUG   20180208_200429_00310_rztrq.11.2-36-93  com.qubole.rubix.core.ReadRequestChain  Added ReadRequest: [94371840, 94607314, 94607306, 94607314, 0]
2018-02-08T20:04:30.094Z        DEBUG   20180208_200429_00310_rztrq.11.2-36-93  com.qubole.rubix.core.CachingInputStream        Reached EOF, returning
2018-02-08T20:04:30.094Z        DEBUG   20180208_200429_00310_rztrq.11.2-36-93  com.qubole.rubix.core.CachingInputStream        Executing Chains
2018-02-08T20:04:30.095Z        DEBUG   20180208_200429_00310_rztrq.11.2-36-93  com.qubole.rubix.core.CachedReadRequestChain    Processing readrequest 94607306-94607314, length 8
2018-02-08T20:04:30.095Z        DEBUG   20180208_200429_00310_rztrq.11.2-36-93  com.qubole.rubix.core.CachedReadRequestChain    CachedFileRead copied data [94607306 - 94607306] at buffer offset 0
2018-02-08T20:04:30.095Z        INFO    20180208_200429_00310_rztrq.11.2-36-93  com.qubole.rubix.core.CachedReadRequestChain    Read 0 bytes from cached file
2018-02-08T20:04:30.095Z        DEBUG   20180208_200429_00310_rztrq.11.2-36-93  com.qubole.rubix.core.CachingInputStream        Read 0 bytes
2018-02-08T20:04:30.095Z        DEBUG   20180208_200429_00310_rztrq.11.2-36-93  com.qubole.rubix.core.CachingInputStream        Got Read, currentPos: 94607306 currentBlock: 90 bufferOffset: 0 length: 1048576
2018-02-08T20:04:30.095Z        DEBUG   20180208_200429_00310_rztrq.11.2-36-93  com.qubole.rubix.core.CachingInputStream         Iterating from startBlock : 90 to End block : 92
2018-02-08T20:04:30.095Z        DEBUG   20180208_200429_00310_rztrq.11.2-36-93  com.qubole.rubix.core.CachingInputStream        Sending cached block 90 to cachedReadRequestChain
2018-02-08T20:04:30.095Z        DEBUG   20180208_200429_00310_rztrq.11.2-36-93  com.qubole.rubix.core.ReadRequestChain  Request to add ReadRequest: [94371840, 94607314, 94607306, 94607314, 0]
2018-02-08T20:04:30.096Z        DEBUG   20180208_200429_00310_rztrq.11.2-36-93  com.qubole.rubix.core.ReadRequestChain  Added ReadRequest: [94371840, 94607314, 94607306, 94607314, 0]
2018-02-08T20:04:30.096Z        DEBUG   20180208_200429_00310_rztrq.11.2-36-93  com.qubole.rubix.core.CachingInputStream        Reached EOF, returning
2018-02-08T20:04:30.096Z        DEBUG   20180208_200429_00310_rztrq.11.2-36-93  com.qubole.rubix.core.CachingInputStream        Executing Chains
2018-02-08T20:04:30.096Z        DEBUG   20180208_200429_00310_rztrq.11.2-36-93  com.qubole.rubix.core.CachedReadRequestChain    Processing readrequest 94607306-94607314, length 8
2018-02-08T20:04:30.096Z        DEBUG   20180208_200429_00310_rztrq.11.2-36-93  com.qubole.rubix.core.CachedReadRequestChain    CachedFileRead copied data [94607306 - 94607306] at buffer offset 0
2018-02-08T20:04:30.096Z        INFO    20180208_200429_00310_rztrq.11.2-36-93  com.qubole.rubix.core.CachedReadRequestChain    Read 0 bytes from cached file
2018-02-08T20:04:30.096Z        DEBUG   20180208_200429_00310_rztrq.11.2-36-93  com.qubole.rubix.core.CachingInputStream        Read 0 bytes

As we can bookkeeper assumes the data is cached (from 94607306 to 94607314) tries to read from the cache but there is nothing in the file. So it will fill nothing in the input buffer.

In the bookeeper log we are getting this:

18/02/08 22:23:44,854 WARN pool-3-thread-10537 bookkeeper.BookKeeper: Could not cache data: org.apache.thrift.shaded.TException: java.io.IOException: No such file or directory
        at com.qubole.rubix.bookkeeper.BookKeeper.getCacheStatus(BookKeeper.java:154)
        at com.qubole.rubix.bookkeeper.BookKeeper.readData(BookKeeper.java:275)
        at com.qubole.rubix.spi.BookKeeperService$Processor$readData.getResult(BookKeeperService.java:456)
        at com.qubole.rubix.spi.BookKeeperService$Processor$readData.getResult(BookKeeperService.java:441)
        at org.apache.thrift.shaded.ProcessFunction.process(ProcessFunction.java:39)
        at org.apache.thrift.shaded.TBaseProcessor.process(TBaseProcessor.java:39)
        at org.apache.thrift.shaded.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: No such file or directory
        at java.io.UnixFileSystem.createFileExclusively(Native Method)
        at java.io.File.createNewFile(File.java:1006)
        at com.qubole.rubix.bookkeeper.FileMetadata.refreshBitmap(FileMetadata.java:93)
        at com.qubole.rubix.bookkeeper.FileMetadata.isBlockCached(FileMetadata.java:112)
        at com.qubole.rubix.bookkeeper.BookKeeper.getCacheStatus(BookKeeper.java:138)
        ... 9 more
shubhamtagra commented 6 years ago

Isnt the problem more about finding why we wrote nothing but still set the cached status to true for these blocks? Do we have logs from RemoteReadRequestChain caching this data and updating cache status?