iryndin / jdbf

Java utility to read/write DBF files

Better diagnostic of corrupted files #27

Closed jferard closed 7 years ago

jferard commented 8 years ago

There are some problems with corrupted files. Two examples:

1) Empty file
a) Replace test/resources/data1/gds_im.dbf with an empty file of the same name.
b) Run mvn test.

The program throws a NullPointerException:

java.lang.NullPointerException
        at net.iryndin.jdbf.util.DbfMetadataUtils.parseHeaderUpdateDate(DbfMetadataUtils.java:67)
        at net.iryndin.jdbf.util.DbfMetadataUtils.fillHeaderFields(DbfMetadataUtils.java:56)
        at net.iryndin.jdbf.reader.DbfReader.readHeader(DbfReader.java:59)
        at net.iryndin.jdbf.reader.DbfReader.readMetadata(DbfReader.java:45)
        at net.iryndin.jdbf.reader.DbfReader.<init>(DbfReader.java:35)

The cause is that the return value of dbfInputStream.read(bytes) in DbfReader.java (line 57) is not checked. I suggest throwing an IOException if that call returns fewer than the 16 expected bytes.

2) Small file
a) Replace test/resources/data1/gds_im.dbf with a file of the same name containing the byte 0x02 repeated 16 times.
b) Run mvn test.

The program runs in an infinite loop. The reason is almost the same as above: in DbfMetadataUtils.readFields, line 83, the return value of inputStream.read(fieldBytes) is not tested. It should return JdbfUtils.FIELD_RECORD_LENGTH, but the stream is already exhausted, so no field bytes are read. At line 91, inputStream.read() then returns -1, which is different from JdbfUtils.HEADER_TERMINATOR, so the loop never breaks.
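To make the failure mode concrete, here is a minimal sketch of a field-descriptor loop that stops on end of stream instead of spinning. It is not the code from DbfMetadataUtils: the class name FieldLoopSketch, the method countFields, and the constant values (32-byte descriptors, 0x0D terminator, as in the usual DBF layout) are assumptions made for illustration.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

class FieldLoopSketch {
    // Assumed values for illustration; the real constants live in JdbfUtils.
    static final int FIELD_RECORD_LENGTH = 32;
    static final int HEADER_TERMINATOR = 0x0D;

    /** Counts field descriptors up to the terminator byte; fails fast on end of stream. */
    static int countFields(InputStream in) throws IOException {
        byte[] fieldBytes = new byte[FIELD_RECORD_LENGTH];
        int count = 0;
        while (true) {
            // The first byte is either the terminator or the start of a descriptor.
            int first = in.read();
            if (first == -1) {
                throw new EOFException("End of stream before the header terminator");
            }
            if (first == HEADER_TERMINATOR) {
                return count;
            }
            fieldBytes[0] = (byte) first;
            // Read the rest of the descriptor, tolerating short reads.
            int off = 1;
            while (off < FIELD_RECORD_LENGTH) {
                int n = in.read(fieldBytes, off, FIELD_RECORD_LENGTH - off);
                if (n == -1) {
                    throw new EOFException("Truncated field descriptor");
                }
                off += n;
            }
            count++;
        }
    }
}
```

With this shape, both the empty file and the 16-byte file end in an EOFException (a subclass of IOException) rather than a NullPointerException or an endless loop.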

This is not purely theoretical: I ran into these bugs with empty or corrupted files.

iryndin commented 7 years ago

@jferard Could you please provide pull requests for this?

jferard commented 7 years ago

@iryndin this commit contains the fix for this issue, but I still have to remove the JRE 1.6 adaptation and some IDE auto-formatting.

I will provide an appropriate pull request soon.

jferard commented 7 years ago

Hello.

I can't reopen the issue, but there is still a problem. I recently reread the Javadoc for InputStream (https://docs.oracle.com/javase/7/docs/api/java/io/InputStream.html#read(byte[])) carefully: one can't assume that the stream has ended (end of file) just because read(b) returns fewer than b.length bytes. A read may return as soon as some bytes are available, and end of stream is only signaled by a return value of -1. Therefore, the following test is not correct:

if (dbfInputStream.read(bytes) != HEADER_HALF_SIZE)
    throw new IOException("The file is corrupted or is not a dbf file");

And should be replaced by:

if (readFully(dbfInputStream, bytes) != HEADER_HALF_SIZE)
    throw new IOException("The file is corrupted or is not a dbf file");

Here, the readFully method returns fewer than bytes.length bytes if end of file occurred during the read operation, and bytes.length otherwise. I'll provide a pull request as soon as possible.
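For reference, a helper with the contract described above could look like the following minimal sketch. The class name ReadFullySketch is hypothetical, and this is not necessarily the code that will end up in the pull request.

```java
import java.io.IOException;
import java.io.InputStream;

class ReadFullySketch {
    /**
     * Reads until the buffer is full or the stream ends.
     * Returns the number of bytes stored, which is less than
     * buffer.length only if end of stream was reached first.
     */
    static int readFully(InputStream in, byte[] buffer) throws IOException {
        int total = 0;
        while (total < buffer.length) {
            // read may deliver fewer bytes than requested; -1 means end of stream.
            int n = in.read(buffer, total, buffer.length - total);
            if (n == -1) {
                break;
            }
            total += n;
        }
        return total;
    }
}
```

The loop keeps asking for the remaining bytes, so a stream that delivers data in small pieces is no longer mistaken for a corrupted file. If an exception is preferred over a return value, java.io.DataInputStream.readFully(byte[]) from the standard library throws an EOFException when the stream ends early.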