tmontaigu / dbase-rs

Rust library to read & write dBase files.
MIT License
29 stars 30 forks source link

Decimal places while writing numeric values #18

Closed hblyp closed 3 years ago

hblyp commented 3 years ago

Hi! First of all I'm very grateful for this lib and also for the shapefile-rs lib!

I have an issue with the resulting DBF file when writing a shapefile containing point data and numeric field values. I was able to open the file with QGIS but then I noticed that all the attributes were missing.

First issue I found is the "Data record size" byte which for some reason when written by QGIS contains the sum of the field value lengths + 1. And without that +1 QGIS cannot parse the dbf file at all and displays no attributes. I have no clue how to fix this and I'm not sure if this is even an issue with this dbase-rs.

Second issue is with the decimal places. When a Numeric field is written and the number of valid decimal places of the number is less then specified by the FieldName, number of missing_decimals is calculated as:

let missing_decimals = field_info.num_decimal_places - (bytes_written - dot_pos as u64) as u8;

Since the dot_pos is the index of '.' in the number written to the buffer, shouldn't it be dot_pos + 1? I was able to get -1 missing decimal places which were then casted into u8 resulting in 255 zeros written to the dbf file after some numbers.

Another issue with the decimal places appears when I write a record with a whole number after a decimal number. When I tried to write a field containing 1 after some decimal number, the value I got was 1000000. I think that the problem is again with the dot_pos that returns an index of '.' that should not be found when writing an integer or should be 1 as for 1. but the index is the same as the previous decimal number. I suspect that only the first byte of the buffer is overwritten by the 1 and the rest stays there and is later used to determine the dot_pos.

tmontaigu commented 3 years ago

Hello,

First issue I found is the "Data record size" byte which for some reason when written by QGIS contains the sum of the field value lengths + 1. And without that +1 QGIS cannot parse the dbf file at all and displays no attributes. I have no clue how to fix this and I'm not sure if this is even an issue with this dbase-rs.

This missing byte is probably the 'deletion flag' field, which is a special 1 byte long field normally its the first field of a record. However I don't think that when writing a new dbase from scratch we actually ensures that this implicit field is written, I also wasn't sure if that field was mandatory when I first wrote this library, but it seems like it is.

Second issue is with the decimal places. When a Numeric field is written and the number of valid decimal places of the number is less then specified by the FieldName, number of missing_decimals is calculated as:

let missing_decimals = field_info.num_decimal_places - (bytes_written - dot_pos as u64) as u8;

Since the dot_pos is the index of '.' in the number written to the buffer, shouldn't it be dot_pos + 1? I was able to get -1 missing decimal places which were then casted into u8 resulting in 255 zeros written to the dbf file after some numbers.

You are very probably correct on that

I think that the problem is again with the dot_pos that returns an index of '.' that should not be found when writing an integer or should be 1 as for 1. but the index is the same as the previous decimal number. I suspect that only the first byte of the buffer is overwritten by the 1 and the rest stays there and is later used to determine the dot_pos.

I quicly looked at the code and indeed there is something that looks fishy:

To find the '.' we do:

 let mut maybe_dot_pos = self.buffer.get_ref().iter().position(|b| *b == b'.');

Which will search for the dot in the whole buffer, so yeah, it will find the dot of the previously written numeric if the current one does not have a dot, and one of the previous number did.

To be correct, this line should be:

 let mut maybe_dot_pos = self.buffer.get_ref()[..self.buffer.position()].iter().position(|b| *b == b'.');
hblyp commented 3 years ago

Hi, thanks for the quick reply. I'll submit a PR to fix the decimal places, however I'm not sure what to do with the deletion flag. But I will look into it.

tmontaigu commented 3 years ago

I can take care of looking into properly adding the deletion flag

tmontaigu commented 3 years ago

For the first issue, its almost certainly the deletion flag thing.

All records are preceded by this deletion flag, a byte to indicate if the record is to be considered deleted or not. This 'field' is implicit, it is not part of the field descriptor array written in the file. So I don't really think it should be counted in the 'data record size' however i'm not sure about that since the 'specifiation' available is not very clear on that.

hblyp commented 3 years ago

The QGIS dbase file I was comparing the result of dbase-rs to had actually num of records + 1 byte in the 'data record size' but I thought QGIS just wrote the records 1 byte larger. The deletion flag might be the answer to why the fields were 1 byte larger but I still have no clue why I cannot open the file in QGIS until I manually add that 1 byte record size to the header.