iryndin / jdbf

Java utility to read/write DBF files

Dead loop in JDK1.8 when DBF file size is over Integer.MAX_VALUE(2^31 - 1) #55

Open mygodG opened 2 years ago

mygodG commented 2 years ago

Hello,

There is a dead-loop problem in JDK 1.8 and newer versions when jdbf loads a DBF file whose size is over Integer.MAX_VALUE (2^31 - 1).

Problem Description

The dead loop occurs in DbfMetadataUtils.readFields(). In this method, jdbf reads the DBF header fields in a while loop; the main logic is (a simplified sketch follows the list):

  1. Read 32 bytes of DBF header field data.
  2. Get the number of bytes that can still be read from the input stream by calling InputStream.available() and save it in the local variable oldAvailable.
  3. Read 1 byte and check whether it is 0x0D (break out of the while loop if so).
  4. Reset the read position to the position at which InputStream.mark() was last called. Here, mark() was called before readFields() in DbfReader.readMetadata(), which means the position is reset to the beginning of the input stream.
  5. Skip the bytes already read in this loop; the skip size is (InputStream.available() - oldAvailable).
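
For reference, here is a simplified reconstruction of that loop based on the steps above; it is only a sketch of the described logic, not the exact jdbf source (field parsing and error handling are omitted):

// Simplified sketch of the described logic, not the exact jdbf source
while (true) {
    byte[] fieldBytes = new byte[32];
    inputStream.read(fieldBytes);                // step 1: read one 32-byte field descriptor (return value ignored here)

    long oldAvailable = inputStream.available(); // step 2: remember available() before probing the next byte
    int terminator = inputStream.read();         // step 3: probe the next byte
    if (terminator == 0x0D) {                    // header terminator: done with the field descriptors
        break;
    } else {
        inputStream.reset();                     // step 4: back to the mark() set in DbfReader.readMetadata()
        inputStream.skip(inputStream.available() - oldAvailable); // step 5: re-skip everything read so far
    }
}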

The problem is in the InputStream.available() method, which is overridden by BufferedInputStream. Check the JDK 1.8 source code:

public synchronized int available() throws IOException {
    int n = count - pos;
    int avail = getInIfOpen().available();
    return n > (Integer.MAX_VALUE - avail)
                ? Integer.MAX_VALUE
                : n + avail;
}

Obviously, when the DBF file size is over Integer.MAX_VALUE, available() always returns Integer.MAX_VALUE.

So the calculation in step 5 is actually (Integer.MAX_VALUE - Integer.MAX_VALUE) = 0, which means the input stream always skips 0 bytes, reads the first 32 bytes of the stream repeatedly, and ends up in a dead loop.
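
A minimal, self-contained sketch of this behavior (not jdbf code; HugeStream is a made-up stand-in whose available() reports Integer.MAX_VALUE, modeling the observation above for a file larger than 2 GB):

import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;

public class AvailableCapDemo {
    // Stand-in for a > 2 GB source: available() is already capped at Integer.MAX_VALUE
    static class HugeStream extends InputStream {
        @Override public int read() { return 0; }
        @Override public int available() { return Integer.MAX_VALUE; }
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new BufferedInputStream(new HugeStream());
        in.mark(1024);

        long oldAvailable = in.available(); // Integer.MAX_VALUE
        in.read();                          // consume one byte, like the terminator probe
        in.reset();

        // Same calculation as step 5: Integer.MAX_VALUE - Integer.MAX_VALUE = 0
        System.out.println(in.available() - oldAvailable); // prints 0, so the loop never advances
    }
}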

Solution

To fix the problem, we should calculate the number of bytes to skip in another way. Notice that there is a local variable headerLength that records the number of bytes read in the while loop; it seems that inputStream.skip(headerLength) would be the solution.

However, readHeader() is called before readFields() in DbfReader.readMetadata(). Checking readHeader(), we can see that it reads 32 bytes of the input stream, so the number of bytes to skip should include those 32 bytes.

Finally, the fixed code in readFields() is:

public static void readFields(DbfMetadata metadata, InputStream inputStream) throws IOException {
    ...
    while (true) {
        ...

        // long oldAvailable = inputStream.available();  // not needed anymore
        int terminator = inputStream.read();
        if (terminator == -1) {
            throw new IOException("The file is corrupted or is not a dbf file");
        } else if (terminator == JdbfUtils.HEADER_TERMINATOR) {
            break;
        } else {
            inputStream.reset();
            inputStream.skip(headerLength + JdbfUtils.FIELD_RECORD_LENGTH); // JdbfUtils.FIELD_RECORD_LENGTH is defined in JdbfUtils; its value is 32
        }
    }
    ...
}

What's more

I found that jdbf works well before JDK 1.8; take JDK 1.6 for example. The reason there is no dead-loop problem in JDK 1.6 is that the BufferedInputStream.available() implementation in JDK 1.6 is different:

public synchronized int available() throws IOException {
    return getInIfOpen().available() + (count - pos);
}

There is no Integer.MAX_VALUE cap in JDK 1.6. Although the return type is still int and the sum can overflow, this does not affect the follow-up processing: step 5 only uses the difference between two available() values, so any overflow cancels out in the subtraction and the skip size is still correct.
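
A small arithmetic illustration of that point (plain Java, not JDK internals; the 3 GB figure is just an example):

public class OverflowCancelsDemo {
    public static void main(String[] args) {
        long remainingAtMark = 3_000_000_000L; // e.g. a 3 GB DBF file, larger than Integer.MAX_VALUE
        long consumed = 33;                    // 32 field-descriptor bytes + 1 terminator probe

        // JDK 1.6 style: plain int sums that may overflow, but both values overflow consistently
        int availAfterReset = (int) remainingAtMark;
        int oldAvailable = (int) (remainingAtMark - consumed);
        System.out.println(availAfterReset - oldAvailable);        // prints 33: the difference survives the overflow

        // JDK 1.8 style: both values are clamped to Integer.MAX_VALUE
        System.out.println(Integer.MAX_VALUE - Integer.MAX_VALUE); // prints 0: the skip size collapses to zero
    }
}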