simlaudato / asterixdb

Automatically exported from code.google.com/p/asterixdb

Log analysis after crash may not handle logs with truncated or corrupt records correctly #902

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
In LogReader.readPage(), we attempt to read pages even if they are underfull:

        try {
            fileChannel.position(readLSN % logFileSize);
            size = fileChannel.read(readBuffer);
        } catch (IOException e) {
            throw new ACIDException(e);
        }
        readBuffer.position(0);
        readBuffer.limit(size);
        bufferBeginLSN = readLSN;

In LogReader.next(), which reads the records out of the page, there is this 
check:

        if (readBuffer.position() == readBuffer.limit() || !logRecord.readLogRecord(readBuffer)) {
            readNextPage();
            if (!logRecord.readLogRecord(readBuffer)) {
                throw new IllegalStateException();
            }
        }

LogRecord.readLogRecord(readBuffer) can return false either when the record 
checksum is wrong or when the reader attempts to read past its buffer. If the 
end of the file has been truncated, and the last record is under-full but not 
corrupt, the first read fails, the retry after readNextPage() presumably fails 
as well, and an IllegalStateException is thrown, so analysis fails. 

The proposed fix for this is to simply declare analysis "done" once an 
under-full log page is encountered, rather than trying to read into it. 
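
A minimal sketch of what that check could look like, not the actual patch. The 
class PageReadSketch and the method readFullPage are hypothetical names, and 
UncheckedIOException stands in for ACIDException so the snippet compiles on its 
own. It also treats a read() result of -1 (end of file) as an under-full page 
instead of passing it to limit():

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;

final class PageReadSketch {
    /**
     * Hypothetical helper: tries to fill readBuffer with one whole page.
     * Returns true only if the page was completely filled. An under-full
     * page (or EOF) returns false so the caller can declare analysis done.
     */
    static boolean readFullPage(ReadableByteChannel channel, ByteBuffer readBuffer) {
        readBuffer.clear();
        try {
            // read() may return fewer bytes than requested, and -1 at EOF.
            while (readBuffer.hasRemaining()) {
                if (channel.read(readBuffer) < 0) {
                    break;
                }
            }
        } catch (IOException e) {
            // The code quoted in this issue wraps this in ACIDException.
            throw new UncheckedIOException(e);
        }
        boolean full = !readBuffer.hasRemaining();
        readBuffer.flip(); // position = 0, limit = number of bytes read
        return full;
    }
}
```

A caller would position the FileChannel at readLSN % logFileSize first, as 
readPage() does today, and stop the analysis when readFullPage returns false.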

Original issue reported on code.google.com by ima...@uci.edu on 17 Jun 2015 at 10:54

GoogleCodeExporter commented 9 years ago
After more discussions, the plan is as follows:

1) Whenever a corrupt log record, or a truncated log record at the end of the 
log tail is encountered, we should declare this as "End of Log" rather than 
throwing an IllegalStateException. 

2) Handle a corrupt log record at the end of the log (last page) the same as 
one in the rest of the file, but if the corruption isn't at the tail, log this 
(as a message to the user). Corruption at the tail (e.g. after a crash) is less 
suspect than corruption in the body of the log. 
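
The plan above could be sketched roughly as follows. This is an illustration 
under stated assumptions, not AsterixDB code: the class EndOfLogSketch, the toy 
record layout (int length, payload, long CRC32), and the midLogCorruptionSeen 
flag are all made up for the example, and the whole log is held in one 
ByteBuffer. Note that a corrupt length field in the middle of the log would be 
classified as truncation here; a real implementation would need to tell the two 
apart.

```java
import java.nio.ByteBuffer;
import java.util.logging.Logger;
import java.util.zip.CRC32;

final class EndOfLogSketch {
    private static final Logger LOGGER = Logger.getLogger(EndOfLogSketch.class.getName());

    private final ByteBuffer log;
    boolean midLogCorruptionSeen = false; // exposed so callers/tests can inspect it

    EndOfLogSketch(ByteBuffer log) {
        this.log = log;
    }

    static long checksum(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        return crc.getValue();
    }

    /** Appends one record in the toy layout: int length, payload, long CRC32. */
    static void append(ByteBuffer out, byte[] payload) {
        out.putInt(payload.length).put(payload).putLong(checksum(payload));
    }

    /** Returns the next payload, or null once "End of Log" is reached. */
    byte[] next() {
        if (!log.hasRemaining()) {
            return null; // clean end of log
        }
        if (log.remaining() < Integer.BYTES) {
            return endOfLog(); // truncated length field at the tail
        }
        int length = log.getInt();
        if (length < 0 || (long) length + Long.BYTES > log.remaining()) {
            return endOfLog(); // record runs past the data we have
        }
        byte[] payload = new byte[length];
        log.get(payload);
        if (log.getLong() != checksum(payload)) {
            if (log.hasRemaining()) {
                // Corruption with more data after it is more suspect: tell the user.
                midLogCorruptionSeen = true;
                LOGGER.warning("Corrupt log record before the tail; treating it as end of log");
            }
            return endOfLog();
        }
        return payload;
    }

    /** Declares end of log instead of throwing IllegalStateException. */
    private byte[] endOfLog() {
        log.position(log.limit()); // later calls to next() also return null
        return null;
    }
}
```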

Original comment by ima...@uci.edu on 18 Jun 2015 at 10:28