tmontaigu / dbase-rs

Rust library to read & write dBase files.
MIT License

Size of record doesn't include delete flag #74

Closed casperhart closed 12 months ago

casperhart commented 1 year ago

Switching from Reader to File gives different behaviour when calculating record offsets, so a file that's read correctly with Reader is read incorrectly with File. For example, reading with Reader gives:

+---------------+-------------+----------------------------------------+
| isbn          | catalog     | title                                  |
+---------------+-------------+----------------------------------------+
| 021075100426  |             | JANES ADDICTION                        |
| 9325583035043 | 5046759602  | IF I SHOULD FALL FROM GRACE WITH GOD   |
| 602527303246  |             | BLACK SABBATH (2010 REMASTERED) (DIGI) |
| 743211835626  | Z48990      | TRINITY SESSION                        |
| 731458695424  |             | DANZIG 2 - LUCIFUGE                    |
| 720642472521  | GEFD24725   | HELL FREEZES OVER                      |
| 078221864626  |             | BREATHLESS                             |
| 008811099725  | 069905      | THROWING COPPER                        |
| 743211913928  | 74321191392 | PARIS                                  |
| 078636648026  |             | INFAMOUS                               |
+---------------+-------------+----------------------------------------+

But then switching to File I get:

+----------------+-----------------+--------------------------------------+
| isbn           | catalog         | title                                |
+----------------+-----------------+--------------------------------------+
| 021075100426   |                 | JANES ADDICTION                      |
| 325583035043 5 | 046759602     I | F I SHOULD FALL FROM GRACE WITH GOD  |
| 2527303246     | BL              | ACK SABBATH (2010 REMASTERED) (DIGI) |
| 211835626  Z48 | 990         TRI | NITY SESSION                         |
| 58695424       | DANZ            | IG 2 - LUCIFUGE                      |
| 2472521  GEFD2 | 4725      HELL  | FREEZES OVER                         |
| 864626         | BREATH          | LESS                                 |
| 99725  069905  | THROWIN         | G COPPER                             |
| 3928  74321191 | 392    PARIS    | MCLAREN                              |
| 026            | INFAMOUS        | MOBB DEEP                            |
+----------------+-----------------+--------------------------------------+

This is because Reader calculates the record size as follows (excluding the deletion flag), which is correct in all cases:

let record_size: usize = self
    .fields_info
    .iter()
    .map(|i| i.field_length as usize)
    .sum();

Whereas File gets the record size from the header and then adds the deletion flag size when calculating the record offset. But the record size in the header already includes the deletion flag size, so I get an off-by-one issue with my FoxPro 2.x file with memo.
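To illustrate the off-by-one (a minimal sketch; the field widths below are hypothetical and not taken from the file in this issue): if the header's size_of_record already equals 1 + the sum of the field widths, then re-adding DELETION_FLAG_SIZE when computing offsets shifts record n by n bytes, which matches the increasingly skewed rows in the second table.

```rust
const DELETION_FLAG_SIZE: usize = 1;

// Correct offset of record n: the header value already includes the flag.
fn correct_offset(n: usize, header_record_size: usize) -> usize {
    n * header_record_size
}

// Buggy offset: the flag size is added a second time.
fn buggy_offset(n: usize, header_record_size: usize) -> usize {
    n * (header_record_size + DELETION_FLAG_SIZE)
}

fn main() {
    // Hypothetical field widths for illustration only.
    let field_lengths = [13usize, 11, 40];
    let fields_total: usize = field_lengths.iter().sum(); // 64
    let header_record_size = DELETION_FLAG_SIZE + fields_total; // 65, per the dbf2002 spec

    assert_eq!(correct_offset(1, header_record_size), 65);
    assert_eq!(buggy_offset(1, header_record_size), 66); // second record read 1 byte late
    // The drift grows by one byte per record.
    let n = 10;
    assert_eq!(
        buggy_offset(n, header_record_size) - correct_offset(n, header_record_size),
        n
    );
}
```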

This doc suggests the record size in the header includes the deletion flag size, so I believe my file is in spec: https://www.dbf2002.com/dbf-file-format.html. But I don't know whether this holds for all dBase files or only some variants like FoxPro. Is there a different spec that suggests otherwise?

The fix for this is very easy: just remove + DELETION_FLAG_SIZE from here: https://github.com/tmontaigu/dbase-rs/blob/ddb11cbae4086f99cf10fbf55b2885fc2adfcc84/src/header.rs#L416, but I don't want to break others' code.

tmontaigu commented 1 year ago

This implicit-but-not-so-implicit deletion flag has caused me some trouble in the past.

For example, the reader's computation does not include the deletion flag size, but that's OK because it reads the flag and skips the rest of the record if it's marked deleted.
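That flag-then-skip flow can be sketched as follows (a simplified illustration, not dbase-rs's actual internals; in dBase files the flag byte is '*' for deleted records and ' ' for active ones):

```rust
use std::io::{Cursor, Read, Seek, SeekFrom};

const DELETED: u8 = b'*'; // standard dBase "deleted" marker; active records use b' '

// Read one flag byte per record, skipping the field bytes of deleted records.
// `record_size` here excludes the flag, like the Reader's computation.
fn read_active_records(data: &[u8], record_size: usize) -> Vec<String> {
    let mut cur = Cursor::new(data);
    let mut kept = Vec::new();
    while (cur.position() as usize) < data.len() {
        let mut flag = [0u8; 1];
        cur.read_exact(&mut flag).unwrap();
        if flag[0] == DELETED {
            // Deleted: skip the field bytes without parsing them.
            cur.seek(SeekFrom::Current(record_size as i64)).unwrap();
        } else {
            let mut buf = vec![0u8; record_size];
            cur.read_exact(&mut buf).unwrap();
            kept.push(String::from_utf8(buf).unwrap());
        }
    }
    kept
}

fn main() {
    // Two 3-byte records: the first active (' '), the second deleted ('*').
    assert_eq!(read_active_records(b" abc*def", 3), vec!["abc"]);
}
```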

https://github.com/tmontaigu/dbase-rs/pull/75 Should be the correct fix for the dbase::File

tmontaigu commented 12 months ago

While trying to finish #77, I found that even the newly introduced stations.dbf does not include the deletion flag size in its header's size_of_record.

So probably recomputing the record_size is safer than trusting the header.

casperhart commented 12 months ago

Are you sure? The new one has size-of-record bytes = F9 03 = 1017, which looks correct (4 fields @ 254 bytes each + deletion flag). The old one was 1016.
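For reference, the decode works out as follows (size_of_record lives at header bytes 10-11 as a little-endian u16; the helper name here is illustrative):

```rust
// Decode size_of_record from the two header bytes (little-endian u16).
fn record_size(size_of_record_bytes: [u8; 2]) -> u16 {
    u16::from_le_bytes(size_of_record_bytes)
}

fn main() {
    // F9 03 read little-endian is 0x03F9 = 1017.
    assert_eq!(record_size([0xF9, 0x03]), 1017);
    // 4 fields of 254 bytes each, plus the 1-byte deletion flag.
    assert_eq!(4 * 254 + 1, 1017);
}
```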

tmontaigu commented 12 months ago

Ah yes, you are right, I got confused; it's stations_with_deleted.dbf that is incorrect.