yakra / tmtools

Tools to aid in development of the TravelMapping project

DBFtrim: trim extraneous trailing zeros from type N fields #4

Closed yakra closed 6 years ago

yakra commented 6 years ago

Massachusetts:

Before:

yakra@BiggaTomato:~/proj/chm/DBFtrim$ ./20171106 ~/gis/data/ma/RoadInv2017/Road_Inventory.yCull.dbf mass-after

/home/yakra/gis/data/ma/RoadInv2017/Road_Inventory.yCull.dbf opened.
DBF Filesize:   890794281 (sanity check borderline; may be missing terminal 0x1A)
Number Records: 0x87ab8 555704
Header Length:  0x301   769
Record Length:  0x643   1603
First char: 0x3 3
Final char: 0x30    48
23 fields.
Scanning DBF file...
555704/555704
FieldName   Type    Length  Max Data
OBJECTID    N   10  6   100000
City        C   80  3   148
County      C   80  2   11
MPO     C   80  26  Southeastern Massachusetts
MHS     C   80  1   1
Route_ID    C   80  11  I190-DUP NB
From_Measu  N   24  19  100.339752590000003
To_Measure  N   24  19  100.339752590000003
F_Class     C   80  1   0
F_F_Class   C   80  1   7
St_Name     C   80  53  RAMP-RT 3 SB/LEVERETT CIR TO RTS 93 NB/1 NB CONNECTOR
Fm_St_Name  C   80  51  RAMP-RTS 93 NB/3 SB/BURGIN PKWY SB TO WASHINGTON ST
To_St_Name  C   80  51  RAMP-RTS 93 NB/3 SB/BURGIN PKWY SB TO WASHINGTON ST
Operation   C   80  1   2
Toll_Road   C   80  1   1
AADT        C   80  6   102194
AADT_Year   C   80  4   2015
Route_Syst  C   80  2   SR
Route_Numb  C   80  7   190-DUP
Route_Dire  C   80  3    NB
Rd_Seg_ID   C   80  6   440307
Speed       C   80  2   45
Length      N   24  18  14.137930360000000
Saving trimmed file...
555704/555704
yakra@BiggaTomato:~/proj/chm/DBFtrim$ diff mass-before mass-after 

After:

yakra@BiggaTomato:~/proj/chm/DBFtrim$ ./a.out ~/gis/data/ma/RoadInv2017/Road_Inventory.yCull.dbf mass-after

/home/yakra/gis/data/ma/RoadInv2017/Road_Inventory.yCull.dbf opened.
DBF Filesize:   890794281 (sanity check borderline; may be missing terminal 0x1A)
Number Records: 0x87ab8 555704
Header Length:  0x301   769
Record Length:  0x643   1603
First char: 0x3 3
Final char: 0x30    48
23 fields.
Scanning DBF file...
555704/555704
FieldName   Type    Length  Max Data
OBJECTID    N   10  6   100000
City        C   80  3   148
County      C   80  2   11
MPO     C   80  26  Southeastern Massachusetts
MHS     C   80  1   1
Route_ID    C   80  11  I190-DUP NB
From_Measu  N   24  19  100.339752590000003
To_Measure  N   24  19  100.339752590000003
F_Class     C   80  1   0
F_F_Class   C   80  1   7
St_Name     C   80  53  RAMP-RT 3 SB/LEVERETT CIR TO RTS 93 NB/1 NB CONNECTOR
Fm_St_Name  C   80  51  RAMP-RTS 93 NB/3 SB/BURGIN PKWY SB TO WASHINGTON ST
To_St_Name  C   80  51  RAMP-RTS 93 NB/3 SB/BURGIN PKWY SB TO WASHINGTON ST
Operation   C   80  1   2
Toll_Road   C   80  1   1
AADT        C   80  6   102194
AADT_Year   C   80  4   2015
Route_Syst  C   80  2   SR
Route_Numb  C   80  7   190-DUP
Route_Dire  C   80  3    NB
Rd_Seg_ID   C   80  6   440307
Speed       C   80  2   45
Length      N   24  18  11.329686779999999
Saving trimmed file...
555704/555704
yakra@BiggaTomato:~/proj/chm/DBFtrim$ diff mass-before mass-after 

Note that Before, Length = 14.137930360000000; After, Length = 11.329686779999999.

This is working normally, as expected. There is no diff between the files, as no zeros could be trimmed. Why the new reported datum for Length? Simple:
• Less the trailing zeros, "14.13793036" has a length of 11, which gets saved to len.
• When "11.329686779999999" is first processed, there are no zeros to trim, thus MinEx0 gets set to 0.
• Existing value of len + new MinEx0 = 11.
• strlen("11.329686779999999") = 18; 18 > 11; len & MaxVal get updated accordingly.
• Think of it this way: 14.137930360000000 would be expressed under the new system as 14.13793036, which is clearly a shorter string than 11.329686779999999.
• (Whether or not there were any intermediate steps is immaterial; the same principle still holds.)
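The reasoning above can be illustrated with a minimal Python sketch. This is not DBFtrim's code: the names trim_trailing_zeros and longest_datum are hypothetical, the len / MinEx0 / MaxVal bookkeeping is collapsed into a single "longest trimmed string so far" comparison, and it assumes the 14.13... value is encountered before the 11.32... value. It only handles plain decimal (type N) strings, not exponent notation.

```python
def trim_trailing_zeros(value: str) -> str:
    """Strip zeros after the decimal point, and the point itself if nothing follows."""
    value = value.strip()            # type N data is space-padded in the record
    if "." not in value:
        return value                 # "100000" is an integer; its zeros are significant
    value = value.rstrip("0")
    if value.endswith("."):
        value = value[:-1]           # "1410.00000000" -> "1410"
    return value or "0"              # guard for the odd case of ".000"


def longest_datum(values, trim):
    """Return (length, datum) for the first value that has the greatest length."""
    best_len, best_val = 0, ""
    for raw in values:
        datum = trim_trailing_zeros(raw) if trim else raw.strip()
        if len(datum) > best_len:    # strict '>' keeps the earlier value on a tie
            best_len, best_val = len(datum), datum
    return best_len, best_val


lengths = ["14.137930360000000", "11.329686779999999"]
print(longest_datum(lengths, trim=False))  # (18, '14.137930360000000')
print(longest_datum(lengths, trim=True))   # (18, '11.329686779999999')
```

Without trimming, both strings are 18 characters, so the earlier one stays as the reported datum. With trimming, the first value counts as 11 characters and the second displaces it, while the maximum width (18) and therefore the file contents are unchanged.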

MA: no filesize savings
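On the "sanity check" line in the logs: the figures are consistent with comparing the file size against header length + records * record length, plus one byte for the terminal 0x1A. The sketch below is a guess at that check, not DBFtrim's actual logic; classify_size is a hypothetical name, and the "fail" label is an assumption, since only "pass" and "borderline" appear in the output here.

```python
def classify_size(filesize, n_records, header_len, record_len):
    """Compare the real file size with the size implied by the DBF header."""
    body = header_len + n_records * record_len
    if filesize == body + 1:
        return "pass"          # records followed by the 0x1A end-of-file marker
    if filesize == body:
        return "borderline"    # everything present except the terminal 0x1A
    return "fail"              # assumed label; not seen in the logs


# Values copied from the three logs in this thread.
print(classify_size(890794281, 555704, 769, 1603))   # borderline (MA Road_Inventory)
print(classify_size(56953081, 139247, 1057, 409))    # pass (ME e911rds)
print(classify_size(727398694, 227454, 801, 3198))   # pass (ME medotpubrdss)
```

The MA file matching the size without the extra byte also agrees with its "Final char: 0x30" line, where the Maine files report 0x1a.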

yakra commented 6 years ago

Maine (e911rds, pre-trimmed):

Before:

yakra@BiggaTomato:~/proj/chm/DBFtrim$ ./20171106 ~/gis/data/me/e911rdss/e911rds.dbf ~/gis/data/me/e911rdss/e911rds.yTrimF.dbf

/home/yakra/gis/data/me/e911rdss/e911rds.dbf opened.
DBF Filesize:   56953081 (sanity check pass)
Number Records: 0x21fef 139247
Header Length:  0x421   1057
Record Length:  0x199   409
First char: 0x3 3
Final char: 0x1a    26
32 fields.
Scanning DBF file...
139247/139247
FieldName   Type    Length  Max Data
SOURCE      N   2   2   18
E911        N   1   1   1
PREDIR      C   2   2   NW
STREETNAME  C   31  31  Rocky Brook / Rocky Brook North
SUFFIX      C   4   4   Loop
POSTDIR     C   2   2   NW
RDNAME      C   34  34  Rocky Brook / Rocky Brook North Rd
L_ADD_FROM  N   13  13  1410.00000000
L_ADD_TO    N   13  13  1568.00000000
R_ADD_FROM  N   13  13  1829.00000000
R_ADD_TO    N   13  13  1859.00000000
TOWN        C   32  32  North Yarmouth Academy Grant Twp
READD       N   1   1   0
LCITY       C   35  35  BOWDOIN COLLEGE GRANT EAST TOWNSHIP
RCITY       C   35  35  BOWDOIN COLLEGE GRANT EAST TOWNSHIP
LESN        C   4   4   8888
RESN        C   4   4   9999
LCOUNTY     C   12  12  Androscoggin
RCOUNTY     C   12  12  Androscoggin
LPOSTOFFIC  C   25  25  Northern Piscataquis CNTY
RPOSTOFFIC  C   25  25  Northern Piscataquis CNTY
LZIPCODE    C   5   5   04027
RZIPCODE    C   5   5   04027
PVT     C   1   1   N
AMUPDDAT    D   8   8     <Type D fields unsupported>
AMUPDORG    C   10  10  megis911bw
FMUPDDAT    D   8   8     <Type D fields unsupported>
FMUPDORG    C   10  10  megis911er
LGEOCODE    C   5   5   31230
RGEOCODE    C   5   5   31230
ROUTE_NUM   C   20  20  201/202/11/17/27/100
SHAPE_len   F   18  18  2.86982165901e+002
Saving trimmed file...
139247/139247
yakra@BiggaTomato:~/proj/chm/DBFtrim$ diff ~/gis/data/me/e911rdss/e911rds.dbf ~/gis/data/me/e911rdss/e911rds.yTrimF.dbf

After:

yakra@BiggaTomato:~/proj/chm/DBFtrim$ ./a.out ~/gis/data/me/e911rdss/e911rds.dbf ~/gis/data/me/e911rdss/e911rds.yTrim0.dbf

/home/yakra/gis/data/me/e911rdss/e911rds.dbf opened.
DBF Filesize:   56953081 (sanity check pass)
Number Records: 0x21fef 139247
Header Length:  0x421   1057
Record Length:  0x199   409
First char: 0x3 3
Final char: 0x1a    26
32 fields.
Scanning DBF file...
139247/139247
FieldName   Type    Length  Max Data
SOURCE      N   2   2   18
E911        N   1   1   1
PREDIR      C   2   2   NW
STREETNAME  C   31  31  Rocky Brook / Rocky Brook North
SUFFIX      C   4   4   Loop
POSTDIR     C   2   2   NW
RDNAME      C   34  34  Rocky Brook / Rocky Brook North Rd
L_ADD_FROM  N   13  4   1410
L_ADD_TO    N   13  4   1568
R_ADD_FROM  N   13  4   1829
R_ADD_TO    N   13  4   1859
TOWN        C   32  32  North Yarmouth Academy Grant Twp
READD       N   1   1   0
LCITY       C   35  35  BOWDOIN COLLEGE GRANT EAST TOWNSHIP
RCITY       C   35  35  BOWDOIN COLLEGE GRANT EAST TOWNSHIP
LESN        C   4   4   8888
RESN        C   4   4   9999
LCOUNTY     C   12  12  Androscoggin
RCOUNTY     C   12  12  Androscoggin
LPOSTOFFIC  C   25  25  Northern Piscataquis CNTY
RPOSTOFFIC  C   25  25  Northern Piscataquis CNTY
LZIPCODE    C   5   5   04027
RZIPCODE    C   5   5   04027
PVT     C   1   1   N
AMUPDDAT    D   8   8     <Type D fields unsupported>
AMUPDORG    C   10  10  megis911bw
FMUPDDAT    D   8   8     <Type D fields unsupported>
FMUPDORG    C   10  10  megis911er
LGEOCODE    C   5   5   31230
RGEOCODE    C   5   5   31230
ROUTE_NUM   C   20  20  201/202/11/17/27/100
SHAPE_len   F   18  18  2.86982165901e+002
Saving trimmed file...
139247/139247

Four cases where all zeros, and thus the decimal point itself, were extraneous. Success. 36 bytes * 139247 records = 5 megs saved right there. Not bad. (57.0 -> 51.9 MB)
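The arithmetic behind that figure, as a small sketch. bytes_saved is a hypothetical helper, and the size projection assumes nothing in the file changes other than the widths of these four fields.

```python
def bytes_saved(width_changes, n_records):
    """width_changes: (old_width, new_width) pairs for each narrowed field."""
    per_record = sum(old - new for old, new in width_changes)
    return per_record, per_record * n_records


# L_ADD_FROM, L_ADD_TO, R_ADD_FROM, R_ADD_TO each go from 13 to 4 characters.
per_record, total = bytes_saved([(13, 4)] * 4, 139247)
print(per_record)          # 36
print(total)               # 5012892
print(56953081 - total)    # 51940189
```

That projected size of 51,940,189 bytes lines up with the 57.0 -> 51.9 MB figure.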

yakra commented 6 years ago

Maine (medotpubrdss):

Before:

yakra@BiggaTomato:~/proj/chm/DBFtrim$ ./20171106 ~/gis/data/me/medotpubrdss/2016-04-08/medotpubrdss.yOrig.dbf medotpubrdss.yTrimF.dbf

/home/yakra/gis/data/me/medotpubrdss/2016-04-08/medotpubrdss.yOrig.dbf opened.
DBF Filesize:   727398694 (sanity check pass)
Number Records: 0x3787e 227454
Header Length:  0x321   801
Record Length:  0xc7e   3198
First char: 0x3 3
Final char: 0x1a    26
24 fields.
Scanning DBF file...
227454/227454
FieldName   Type    Length  Max Data
SEGMENT_ID  N   10  7   3032722
ELEMENT_ID  N   10  7   2038036
FED_FUNCTI  C   254 20  Other princ arterial
FED_FC      N   10  1   0
JURISDICTI  C   254 11  Tnwy summer
JURIS_ABBR  C   254 4   TNWY
RTE_SYSTEM  C   254 14  INVENTORY ROAD
PRIM_RTE    C   254 7   0100509
RTE_NO      C   254 7   0100509
RTE_SUFFIX  C   254 1   A
PRIM_BMP    N   19  12  103.82000000
PRIM_EMP    N   19  12  103.84000000
FACT_AADT   N   10  5   13490
ANNUAL_VMT  N   10  7   1611592
SPEED_LIM_  N   10  2   35
SPEED_LIM1  C   254 7   Default
TOWN_NAME   C   254 32  North Yarmouth Academy Grant Twp
TOWN_CODE   C   254 5   01030
COUNTY_NAM  C   254 12  Androscoggin
CNTY        C   254 2   01
SEG_LEN__M  N   19  11  10.17000008
DATE_MOD    D   8   8     <Type D fields unsupported>
PRIORITY    N   5   1   6
Shape_len   F   19  18  2.41662950308e+002
Saving trimmed file...
227454/227454

After:

yakra@BiggaTomato:~/proj/chm/DBFtrim$ ./a.out ~/gis/data/me/medotpubrdss/2016-04-08/medotpubrdss.yOrig.dbf medotpubrdss.yTrim0.dbf

/home/yakra/gis/data/me/medotpubrdss/2016-04-08/medotpubrdss.yOrig.dbf opened.
DBF Filesize:   727398694 (sanity check pass)
Number Records: 0x3787e 227454
Header Length:  0x321   801
Record Length:  0xc7e   3198
First char: 0x3 3
Final char: 0x1a    26
24 fields.
Scanning DBF file...
227454/227454
FieldName   Type    Length  Max Data
SEGMENT_ID  N   10  7   3032722
ELEMENT_ID  N   10  7   2038036
FED_FUNCTI  C   254 20  Other princ arterial
FED_FC      N   10  1   0
JURISDICTI  C   254 11  Tnwy summer
JURIS_ABBR  C   254 4   TNWY
RTE_SYSTEM  C   254 14  INVENTORY ROAD
PRIM_RTE    C   254 7   0100509
RTE_NO      C   254 7   0100509
RTE_SUFFIX  C   254 1   A
PRIM_BMP    N   19  6   103.82
PRIM_EMP    N   19  6   103.84
FACT_AADT   N   10  5   13490
ANNUAL_VMT  N   10  7   1611592
SPEED_LIM_  N   10  2   35
SPEED_LIM1  C   254 7   Default
TOWN_NAME   C   254 32  North Yarmouth Academy Grant Twp
TOWN_CODE   C   254 5   01030
COUNTY_NAM  C   254 12  Androscoggin
CNTY        C   254 2   01
SEG_LEN__M  N   19  11  10.17000008
DATE_MOD    D   8   8     <Type D fields unsupported>
PRIORITY    N   5   1   6
Shape_len   F   19  18  2.41662950308e+002
Saving trimmed file...
227454/227454

Trimming several zeros; leaving a couple sig figs and the decimal point. 2.7 MB saved.

New output format:

FieldName   Type    Length  Max Data
L_ADD_FROM  N   13  4   1410 <- 1410.00000000
L_ADD_TO    N   13  4   1568 <- 1568.00000000
R_ADD_FROM  N   13  4   1829 <- 1829.00000000
R_ADD_TO    N   13  4   1859 <- 1859.00000000
FieldName   Type    Length  Max Data
PRIM_BMP    N   19  6   103.82 <- 103.82000000
PRIM_EMP    N   19  6   103.84 <- 103.84000000

Yippee Skippee

yakra commented 6 years ago

PE: no add'l filesize savings
NH: no add'l filesize savings
MD: no add'l filesize savings
AR: no add'l filesize savings (ROADS_ACF)