yakra / tmtools

Tools to aid in development of the TravelMapping project

DBFtrim: non-printable / control characters #5

Closed (yakra closed this issue 6 years ago)

yakra commented 6 years ago
yakra@BiggaTomato:~/proj/tmtools/DBFtrim$ ./a.out ~/gis/data/ar/ROADS_ACF/ROADS_ACF.yOrig.dbf test.dbf

/home/yakra/gis/data/ar/ROADS_ACF/ROADS_ACF.yOrig.dbf opened.
DBF Filesize:   1121274838 (sanity check pass)
Number Records: 0x6b2f2 439026
Header Length:  0x981   2433
Record Length:  0x9fa   2554
First char: 0x3 3
Final char: 0x1a    26
75 fields.
Scanning DBF file...
439026/439026
FieldName   Type    Length  Max Data

OBJECTID    N   10  6   104735
pl_add_f    C   10  7   1901096
pl_add_t    C   10  8   14071400
pr_add_f    C   10  6   5780

pr_add_t    C   10  8   13121301
...
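As a cross-check on the "sanity check pass" line: assuming the standard xBase layout (a 32-byte file header, one 32-byte descriptor per field, a 0x0D header terminator, fixed-length records, and a single trailing 0x1A byte, which matches the Final char shown), the reported numbers add up exactly: 32 + 32*75 + 1 = 2433, and 2433 + 439026*2554 + 1 = 1121274838. Below is a minimal sketch of that check. The function names are hypothetical, and DBFtrim's own check may be computed differently.

```cpp
#include <cassert>
#include <cstdint>

// Expected header length under the assumed dBase III-style layout:
// 32-byte file header + one 32-byte descriptor per field + 0x0D terminator.
constexpr std::uint64_t expectedHeaderLength(std::uint64_t numFields) {
    return 32 + 32 * numFields + 1;
}

// Expected file size: header + fixed-length records + one 0x1A EOF byte.
constexpr std::uint64_t expectedFileSize(std::uint64_t headerLength,
                                         std::uint64_t numRecords,
                                         std::uint64_t recordLength) {
    return headerLength + numRecords * recordLength + 1;
}

// Runtime form of the check, for a size obtained from the filesystem.
bool filesizeSanityCheck(std::uint64_t actualSize, std::uint64_t headerLength,
                         std::uint64_t numRecords, std::uint64_t recordLength) {
    return actualSize == expectedFileSize(headerLength, numRecords, recordLength);
}

// The values reported for ROADS_ACF.yOrig.dbf satisfy both formulas.
static_assert(expectedHeaderLength(75) == 0x981, "2433-byte header");
static_assert(expectedFileSize(0x981, 0x6b2f2, 0x9fa) == 1121274838ULL,
              "1121274838-byte file");
```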

Confirmed via hex editor: there's a DOS CRLF (0x0D 0x0A) immediately after the 5780 datum, which is why its length is reported as 6 bytes rather than 4. There are 3 other CRLFs in the pr_add_f field, all following 3-digit values; I did not check the other fields. Such non-printable / control characters carry no information, contribute (slightly) to filesize bloat, and can cause wacky hijinks in the info display: the CRLF printed after 5780 produces the stray blank line in the output above.
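A hex editor works for spot checks; to list every occurrence, something along these lines could scan each field of each record. This is a sketch, not part of DBFtrim: controlByteOffsets is a hypothetical helper, and it assumes the caller has already cut each record into per-field strings using the field lengths from the header.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Return the offsets of control bytes (0x00-0x1F and 0x7F) within one
// fixed-width field slice of a DBF record. Casting to unsigned char keeps
// bytes >= 0x80 (e.g. Latin-1 text) from being mistaken for controls.
std::vector<std::size_t> controlByteOffsets(const std::string& field) {
    std::vector<std::size_t> offsets;
    for (std::size_t i = 0; i < field.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(field[i]);
        if (c < 0x20 || c == 0x7F) offsets.push_back(i);
    }
    return offsets;
}
```

For the 10-byte pr_add_f value "5780\r\n    " this reports offsets 4 and 5 (the CR and LF); for "126000    " it reports nothing.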

The simplest solution, in field.cpp:

line 12: also strip trailing control characters (any byte from 0x01 to 0x20), not just spaces:
    while (fVal[strlen(fVal)-1] == ' ') fVal[strlen(fVal)-1] = 0;
  ->
    while (fVal[strlen(fVal)-1] <= ' ' && fVal[strlen(fVal)-1] > 0) fVal[strlen(fVal)-1] = 0;

lines 29 & 51: count spaces, NULs, and control characters alike as leading padding:
    for (pad = 0; (fVal[pad] == ' ' || fVal[pad] == 0) && pad < len; pad++);
  ->
    for (pad = 0; (fVal[pad] <= ' ') && pad < len; pad++);
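For comparison, here is a sketch of the same two trimming steps as standalone functions (trimTrailing and countLeadingPad are hypothetical names, not the actual field.cpp code). Two details differ from the one-liners above: the length or bound is checked before indexing, so an empty string never reads fVal[-1] and the pad scan never reads fVal[len]; and bytes are compared as unsigned char, so values of 0x80 and above are never counted as padding on platforms where plain char is signed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Strip trailing spaces and control characters (CR, LF, tab, ...) in place.
// n > 0 is tested first, so an empty string is left alone.
void trimTrailing(char* fVal) {
    std::size_t n = std::strlen(fVal);
    while (n > 0 && static_cast<unsigned char>(fVal[n - 1]) <= ' ') {
        fVal[--n] = '\0';
    }
}

// Count leading padding bytes (spaces, NULs, or control characters) in a
// fixed-width field of length len, testing the bound before reading.
std::size_t countLeadingPad(const char* fVal, std::size_t len) {
    std::size_t pad = 0;
    while (pad < len && static_cast<unsigned char>(fVal[pad]) <= ' ') ++pad;
    return pad;
}
```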

This alone would fix only the info display; the output file would still contain 0x0D or 0x0D0A. Since pr_add_f still has legitimate 6-byte values (e.g. 126000), the field is trimmed to 6 bytes, and all 6 bytes of the 5780 entry are copied: { '5', '7', '8', '0', 0x0D, 0x0A }.

Suppose instead that the longest remaining values were 5 bytes. Then 5780 would be written as { '5', '7', '8', '0', 0x0D }. Even so, running DBFtrim on the resulting file would do no harm to the info display: 5780 would simply be interpreted as a 4-byte string, so it would not be stored as MaxVal and would not be displayed.
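Both cases can be sketched in a few lines. This rests on an assumption about how DBFtrim works, not on its actual code: that the new width of a C field is the longest trimmed value in the column, and that each value is written as the first width raw bytes of its left-aligned field. trimmedLength, newWidth, and copyTrimmed are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Length of a value once trailing spaces/control bytes are ignored
// (the display-side fix).
std::size_t trimmedLength(const std::string& raw) {
    std::size_t n = raw.size();
    while (n > 0 && static_cast<unsigned char>(raw[n - 1]) <= ' ') --n;
    return n;
}

// Assumed new field width: the longest trimmed value in the column.
std::size_t newWidth(const std::vector<std::string>& column) {
    std::size_t w = 0;
    for (const std::string& v : column) w = std::max(w, trimmedLength(v));
    return w;
}

// Assumed output: the first `width` raw bytes of the left-aligned
// character field, so any control bytes inside that window survive.
std::string copyTrimmed(const std::string& raw, std::size_t width) {
    return raw.substr(0, width);
}
```

With the column {"126000    ", "5780\r\n    "} the width is 6 and the copied value is "5780\r\n". With a 5-byte maximum such as "12600     " it is "5780\r", whose trimmed length of 4 never becomes the column maximum on a later run.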

Would values ending in 0x0D stored in the file have any negative effects? Probably not. I put together a test file in a hex editor, and it loads the same in LibreOffice...

yakra commented 6 years ago

fixed in f72ee037ab57370bf0934413e9a3925792ef8017