yakra / tmtools

Tools to aid in development of the TravelMapping project
0 stars 0 forks source link

corrupt field data in Pennsylvania_Local_Roads.dbf #26

Closed yakra closed 6 years ago

yakra commented 6 years ago

DBFtrim cannot trim Pennsylvania_Local_Roads.dbf. The field data cannot be read correctly. This is not a new problem; it shows up in one form or another since my earliest saved DBFtrim executables.

yakra commented 6 years ago

The easternmost segment of N College Ave in Philadelphia, as reported by QGIS:

LRS_ID: 371357
CTY_CODE: 67
DISTRICT_N: 06
MUN_ID: 2641
MUNICIPAL_: 67301
MUNICIPA_1: PHILADELPHIA
LR_NAME: COLLEGE
LR_TYPE: AV
LR_ID: 148619
SEGMENT_NU: 100
OFFSET_BGN: 2691
OFFSET_END: 2955
SEG_LNGTH_: 264
CUM_OFFSET: 2691
CUM_OFFS_1: 2955
CUM_SEG_LN: 2955
BEG_INT_TE: CAPITOL
BEG_INT__1: ST
BEGIN_ROUT: NULL
BEGIN_LATI: 395830.9000
BEGIN_LONG: 751005.8000
BEGIN_SEGM: 0
BEGIN_STAT: 0
BEGIN_OFFS: 0
END_TERMIN: RIDGE
END_TERM_T: AV
END_ROUTE_: NULL
END_LATITU: 395830.6000
END_LONGIT: 751009.0000
END_SEGMEN: 0
END_STATE_: 0
END_OFFSET: 0
SEGMENT_LE: 0.050000000000000
UNIMPROVED: NULL
GRAVEL_MIL: NULL
SEAL_COATE: NULL
BITUMINOUS: 0.050000000000000
BRICK_MILE: NULL
CONCRETE_M: NULL
EST_AVERAG: NULL
RIGHT_OF_W: NULL
CART_WAY_W: NULL
FEDERAL_AI: 4
DIRECTION_: 2
FEDERAL__1: 0
BRIDGE_COU: NULL
RAIL_ROAD_: NULL
TRAFFIC_CO: 2
TRAFFIC__1: NULL
FUNCTIONAL: 7
FEDERAL__2: NULL
TRUCK_DAIL: NULL
TURNBACK_I: F
FIPS_AREA_: 69076
ROAD_OWNER: NULL
TYPE_OF_AR: NULL
TRAFFIC_PA: NULL
STRUCTURAL: NULL
LIQUID_FUE: NULL
POSTED_BON: NULL
NO_OF_LANE: NULL
SHOULDER_T: NULL
SHOULDER_W: NULL
SPEED_LIMI: NULL
PARKING_LA: NULL
BIFURCATED: NULL
MEDIAN_TYP: NULL
CURBS: NULL
SIDEWALKS: NULL
RTE_TNUM: NULL
ID: 371357
LIQUID_F_1: TRUE
NSFA_IND: FALSE
SURFACE_TY: NULL
ROAD_TYPE: R
NOTE: NULL
LR_RT_NO: ETZM
DG_CREATE_: NULL
DG_CREAT_1: NULL
DG_UPDATE_: 2015-08-26T00:00:00.000Z
DG_UPDAT_1: DOT_GRANTS
GIS_UPDATE: NULL
yakra commented 6 years ago

20180312 info display:

/home/yakra/gis/data/pa/Pennsylvania_Local_Roads/Pennsylvania_Local_Roads.dbf opened.
DBF Filesize:   3106786073 (sanity check borderline; may be missing terminal 0x1A)
Number Records: 0x93e97 605847
Header Length:  0xa61   2657
Record Length:  0x1408  5128
First char: 0x3 3
Final char: 0x20    32
82 fields.
Scanning DBF file...
605847/605847
FieldName   Type    Length  Max Data
LRS_ID      N   10  10  ##10369000
CTY_CODE    C   80  80  7-16T00:00:00.000Z                                                        PennDO
DISTRICT_N  C   80  80  07-16T00:00:00.000Z                                                        PennD
MUN_ID      N   10  10  **********
MUNICIPAL_  C   80  80  D                                                                              *
MUNICIPA_1  C   80  80  OT                                                                         2013-
LR_NAME     C   80  80  E                                                                           FALS
LR_TYPE     C   80  80  SE                                                                           FAL
LR_ID       N   10  10  **********
SEGMENT_NU  N   10  10  **********
OFFSET_BGN  N   10  10  **********
OFFSET_END  N   10  10  **********
SEG_LNGTH_  N   10  10  **********
CUM_OFFSET  N   10  10  **********
CUM_OFFS_1  N   10  10  **********
CUM_SEG_LN  N   10  10  **********
BEG_INT_TE  C   80  80  ALSE                                                                           F
BEG_INT__1  C   80  80  1                                                                              *
BEGIN_ROUT  C   80  80  ALSE                                                                           F
BEGIN_LATI  C   80  80  13T00:00:00.000Z                                                        PDLEBXPL
BEGIN_LONG  C   80  80  -13T00:00:00.000Z                                                        PDLEBXP
BEGIN_SEGM  C   80  80  MMON                                                                           A
BEGIN_STAT  N   10  10  **********
BEGIN_OFFS  N   10  10  **********
END_TERMIN  C   80  80  ALSE                                                                           F
END_TERM_T  C   80  80  -07-16T00:00:00.000Z                                                        Penn
END_ROUTE_  C   80  80  E                                                                           FALS
END_LATITU  C   80  80  SE                                                                           FAL
END_LONGIT  C   80  80  LSE                                                                           FA
END_SEGMEN  N   10  10  **********
END_STATE_  C   80  80  LSE                                                                           FA
END_OFFSET  N   10  10  **********
SEGMENT_LE  N   24  24  #######0.200000000000000
UNIMPROVED  C   80  80  ALSE                                                                           F
GRAVEL_MIL  C   80  80  8-24T00:00:00.000Z                                                        PennDO
SEAL_COATE  C   80  80  T                                                                         2013-0
BITUMINOUS  N   24  24  #######0.000000000000000
BRICK_MILE  C   80  80  LSE                                                                           FA
CONCRETE_M  C   80  80  ALSE                                                                           F
EST_AVERAG  N   10  10  **********
RIGHT_OF_W  N   10  10  **********
CART_WAY_W  N   10  10  **********
FEDERAL_AI  C   80  80  SE                                                                           FAL
DIRECTION_  C   80  80  LSE                                                                           FA
FEDERAL__1  C   80  80  ALSE                                                                           F
BRIDGE_COU  C   80  80  3                                                                              *
RAIL_ROAD_  C   80  80  ALSE                                                                           F
TRAFFIC_CO  C   80  80  8                                                                              *
TRAFFIC__1  C   80  80  ALSE                                                                           F
FUNCTIONAL  N   10  10  **********
FEDERAL__2  C   80  80  ALSE                                                                           F
TRUCK_DAIL  C   80  80  RIVATE                                                                         R
TURNBACK_I  C   80  80  E                                                                           FALS
FIPS_AREA_  C   80  80  *******        60************************************************************WAS
ROAD_OWNER  C   80  80  E                                                                           FALS
TYPE_OF_AR  C   80  80  7                                                                              0
TRAFFIC_PA  C   80  80  1                                                                              *
STRUCTURAL  C   80  80  ALSE                                                                           F
LIQUID_FUE  C   80  80  3                                                                              *
POSTED_BON  C   80  80  ALSE                                                                           F
NO_OF_LANE  C   80  80  9                                                                              *
SHOULDER_T  C   80  80  ********************************************************************************
SHOULDER_W  C   80  80  9                                                                              *
SPEED_LIMI  C   80  80  E                                                                           FALS
PARKING_LA  C   80  80  SE                                                                           FAL
BIFURCATED  C   80  80  LSE                                                                           FA
MEDIAN_TYP  C   80  80  ALSE                                                                           F
CURBS       C   80  80  2                                                                              *
SIDEWALKS   C   80  80  ALSE                                                                           F
RTE_TNUM    C   80  80  9                                                                              *
ID      N   10  10  ##10369000
LIQUID_F_1  C   80  80  7-16T00:00:00.000Z                                                        PennDO
NSFA_IND    C   80  80  07-16T00:00:00.000Z                                                        PennD
SURFACE_TY  C   80  80  -07-16T00:00:00.000Z                                                        Penn
ROAD_TYPE   C   80  80  E                                                                           FALS
NOTE        C   89  89  ***                                                                                ******
LR_RT_NO    C   80  80  LSE                                                                           FA
DG_CREATE_  C   80  80  ALSE                                                                           F
DG_CREAT_1  C   80  80  9                                                                              *
DG_UPDATE_  C   80  80  ALSE                                                                           F
DG_UPDAT_1  C   80  80  9                                                                              *
GIS_UPDATE  C   80  80  LY CROSS                                                                      AV
yakra commented 6 years ago

20180311 info display:

/home/yakra/gis/data/pa/Pennsylvania_Local_Roads/Pennsylvania_Local_Roads.dbf opened.
DBF Filesize:   3106786073 (sanity check borderline; may be missing terminal 0x1A)
Number Records: 0x93e97 605847
Header Length:  0xa61   2657
Record Length:  0x1408  5128
First char: 0x3 3
Final char: 0x20    32
82 fields.
Scanning DBF file...
605847/605847
FieldName   Type    Length  Max Data
LRS_ID      N   10  10  ##10369000
CTY_CODE    C   80  80                                                                        **********
DISTRICT_N  C   80  80                                                                         *********
MUN_ID      N   10  10  **********
MUNICIPAL_  C   80  80                                                                             *****
MUNICIPA_1  C   80  80                                                                              ****
LR_NAME     C   80  80                                                                               ***
LR_TYPE     C   80  80                                                                           *******
LR_ID       N   10  10  **********
SEGMENT_NU  N   10  10  **********
OFFSET_BGN  N   10  10  **********
OFFSET_END  N   10  10  **********
SEG_LNGTH_  N   10  10  **********
CUM_OFFSET  N   10  10  **********
CUM_OFFS_1  N   10  10  **********
CUM_SEG_LN  N   10  10  **********
BEG_INT_TE  C   80  80                                                                         *********
BEG_INT__1  C   80  80                                                                          ********
BEGIN_ROUT  C   80  80                                                                           *******
BEGIN_LATI  C   80  80                                                                            ******
BEGIN_LONG  C   80  80                                                                             *****
BEGIN_SEGM  C   80  80                                                                              ****
BEGIN_STAT  N   10  10  **********
BEGIN_OFFS  N   10  10  **********
END_TERMIN  C   80  80                                                                             *****
END_TERM_T  C   80  80                                                                               100
END_ROUTE_  C   80  80                                                                                10
END_LATITU  C   80  80                                                                                 1
END_LONGIT  C   80  80                                                                         10006196F
END_SEGMEN  N   10  10  **********
END_STATE_  C   80  80                                                                         10006186F
END_OFFSET  N   10  10  **********
SEGMENT_LE  N   24  24  #######0.200000000000000
UNIMPROVED  C   80  80                                                                            100061
GRAVEL_MIL  C   80  80                                                                             10006
SEAL_COATE  C   80  80                                                                              1000
BITUMINOUS  N   24  24  #######0.000000000000000
BRICK_MILE  C   80  80                                                                         10006144F
CONCRETE_M  C   80  80                                                                          10006135
EST_AVERAG  N   10  10  **********
RIGHT_OF_W  N   10  10  **********
CART_WAY_W  N   10  10  **********
FEDERAL_AI  C   80  80                                                                                 1
DIRECTION_  C   80  80                                                                         10006114F
FEDERAL__1  C   80  80                                                                          10006105
BRIDGE_COU  C   80  80                                                                           1000609
RAIL_ROAD_  C   80  80                                                                            100060
TRAFFIC_CO  C   80  80                                                                             10006
TRAFFIC__1  C   80  80                                                                              1000
FUNCTIONAL  N   10  10  **********
FEDERAL__2  C   80  80                                                                              1000
TRUCK_DAIL  C   80  80                                                                               100
TURNBACK_I  C   80  80  E                                                                           FALS
FIPS_AREA_  C   80  80                                                                                 1
ROAD_OWNER  C   80  80                                                                         100061242
TYPE_OF_AR  C   80  80                                                                          10006115
TRAFFIC_PA  C   80  80                                                                           1000610
STRUCTURAL  C   80  80                                                                            100060
LIQUID_FUE  C   80  80                                                                             10006
POSTED_BON  C   80  80                                                                              1000
NO_OF_LANE  C   80  80                                                                               100
SHOULDER_T  C   80  80                                                                                10
SHOULDER_W  C   80  80                                                                                 1
SPEED_LIMI  C   80  80                                                                                01
PARKING_LA  C   80  80                                                                                **
BIFURCATED  C   80  80                                                                              ****
MEDIAN_TYP  C   80  80                                                                               ***
CURBS       C   80  80                                                                                **
SIDEWALKS   C   80  80                                                                                 *
RTE_TNUM    C   80  80                                                                         *********
ID      N   10  10  ##10369000
LIQUID_F_1  C   80  80                                                                           *******
NSFA_IND    C   80  80                                                                            ******
SURFACE_TY  C   80  80                                                                             *****
ROAD_TYPE   C   80  80                                                                              ****
NOTE        C   89  89                                                                                        ***
LR_RT_NO    C   80  80                                                                                **
DG_CREATE_  C   80  80                                                                              ****
DG_CREAT_1  C   80  80                                                                               ***
DG_UPDATE_  C   80  80                                                                                **
DG_UPDAT_1  C   80  80                                                                                 *
GIS_UPDATE  C   80  80                                                                         *********
yakra commented 6 years ago

20180308, 20180307a, 20180306 info display:

/home/yakra/gis/data/pa/Pennsylvania_Local_Roads/Pennsylvania_Local_Roads.dbf opened.
DBF Filesize:   3106786073 (sanity check borderline; may be missing terminal 0x1A)
Number Records: 0x93e97 605847
Header Length:  0xa61   2657
Record Length:  0x1408  5128
First char: 0x3 3
Final char: 0x20    32
82 fields.
Scanning DBF file...
605847/605847
FieldName   Type    Length  Max Data
LRS_ID      N   10  10  10369000
CTY_CODE    C   80  80                                                                        **********
DISTRICT_N  C   80  80                                                                         *********
MUN_ID      N   10  10  **********
MUNICIPAL_  C   80  80                                                                             *****
MUNICIPA_1  C   80  80                                                                              ****
LR_NAME     C   80  80                                                                               ***
LR_TYPE     C   80  80                                                                           *******
LR_ID       N   10  10  **********
SEGMENT_NU  N   10  10  **********
OFFSET_BGN  N   10  10  **********
OFFSET_END  N   10  10  **********
SEG_LNGTH_  N   10  10  **********
CUM_OFFSET  N   10  10  **********
CUM_OFFS_1  N   10  10  **********
CUM_SEG_LN  N   10  10  **********
BEG_INT_TE  C   80  80                                                                         *********
BEG_INT__1  C   80  80                                                                          ********
BEGIN_ROUT  C   80  80                                                                           *******
BEGIN_LATI  C   80  80                                                                            ******
BEGIN_LONG  C   80  80                                                                             *****
BEGIN_SEGM  C   80  80                                                                              ****
BEGIN_STAT  N   10  10  **********
BEGIN_OFFS  N   10  10  **********
END_TERMIN  C   80  80                                                                             *****
END_TERM_T  C   80  80                                                                               100
END_ROUTE_  C   80  80                                                                                10
END_LATITU  C   80  80                                                                                 1
END_LONGIT  C   80  80                                                                         10006196F
END_SEGMEN  N   10  10  **********
END_STATE_  C   80  80                                                                         10006186F
END_OFFSET  N   10  10  **********
SEGMENT_LE  N   24  24  0.200000000000000
UNIMPROVED  C   80  80                                                                            100061
GRAVEL_MIL  C   80  80                                                                             10006
SEAL_COATE  C   80  80                                                                              1000
BITUMINOUS  N   24  24  0.000000000000000
BRICK_MILE  C   80  80                                                                         10006144F
CONCRETE_M  C   80  80                                                                          10006135
EST_AVERAG  N   10  10  **********
RIGHT_OF_W  N   10  10  **********
CART_WAY_W  N   10  10  **********
FEDERAL_AI  C   80  80                                                                                 1
DIRECTION_  C   80  80                                                                         10006114F
FEDERAL__1  C   80  80                                                                          10006105
BRIDGE_COU  C   80  80                                                                           1000609
RAIL_ROAD_  C   80  80                                                                            100060
TRAFFIC_CO  C   80  80                                                                             10006
TRAFFIC__1  C   80  80                                                                              1000
FUNCTIONAL  N   10  10  **********
FEDERAL__2  C   80  80                                                                              1000
TRUCK_DAIL  C   80  80                                                                               100
TURNBACK_I  C   80  80  E                                                                           FALS
FIPS_AREA_  C   80  80                                                                                 1
ROAD_OWNER  C   80  80                                                                         100061242
TYPE_OF_AR  C   80  80                                                                          10006115
TRAFFIC_PA  C   80  80                                                                           1000610
STRUCTURAL  C   80  80                                                                            100060
LIQUID_FUE  C   80  80                                                                             10006
POSTED_BON  C   80  80                                                                              1000
NO_OF_LANE  C   80  80                                                                               100
SHOULDER_T  C   80  80                                                                                10
SHOULDER_W  C   80  80                                                                                 1
SPEED_LIMI  C   80  80                                                                                01
PARKING_LA  C   80  80                                                                                **
BIFURCATED  C   80  80                                                                              ****
MEDIAN_TYP  C   80  80                                                                               ***
CURBS       C   80  80                                                                                **
SIDEWALKS   C   80  80                                                                                 *
RTE_TNUM    C   80  80                                                                         *********
ID      N   10  10  10369000
LIQUID_F_1  C   80  80                                                                           *******
NSFA_IND    C   80  80                                                                            ******
SURFACE_TY  C   80  80                                                                             *****
ROAD_TYPE   C   80  80                                                                              ****
NOTE        C   89  89                                                                                        ***
LR_RT_NO    C   80  80                                                                                **
DG_CREATE_  C   80  80                                                                              ****
DG_CREAT_1  C   80  80                                                                               ***
DG_UPDATE_  C   80  80                                                                                **
DG_UPDAT_1  C   80  80                                                                                 *
GIS_UPDATE  C   80  80                                                                         *********

20180307a does not have a corresponding .cpp file or GitHub commit. Where exactly in the development history is it? 20180306 has a .cpp file; no GitHub commit.

yakra commented 6 years ago

20180307 info display:

/home/yakra/gis/data/pa/Pennsylvania_Local_Roads/Pennsylvania_Local_Roads.dbf opened.
DBF Filesize:   3106786073 (sanity check borderline; may be missing terminal 0x1A)
Number Records: 0x93e97 605847
Header Length:  0xa61   2657
Record Length:  0x1408  5128
First char: 0x3 3
Final char: 0x20    32
82 fields.
Scanning DBF file...
605847/605847
FieldName   Type    Length  Max Data
LRS_ID      N   10  10  10369000
CTY_CODE    C   80  80                                                                        **********
DISTRICT_N  C   80  80  
MUN_ID      N   10  10  **********
MUNICIPAL_  C   80  80                                                                             *****
MUNICIPA_1  C   80  80  
LR_NAME     C   80  80  
LR_TYPE     C   80  80                                                                           *******
LR_ID       N   10  10  **********
SEGMENT_NU  N   10  10  **********
OFFSET_BGN  N   10  10  **********
OFFSET_END  N   10  10  **********
SEG_LNGTH_  N   10  10  **********
CUM_OFFSET  N   10  10  **********
CUM_OFFS_1  N   10  10  **********
CUM_SEG_LN  N   10  10  **********
BEG_INT_TE  C   80  80                                                                         *********
BEG_INT__1  C   80  80  
BEGIN_ROUT  C   80  80  
BEGIN_LATI  C   80  80  
BEGIN_LONG  C   80  80  
BEGIN_SEGM  C   80  80  
BEGIN_STAT  N   10  10  **********
BEGIN_OFFS  N   10  10  **********
END_TERMIN  C   80  80  
END_TERM_T  C   80  80                                                                               100
END_ROUTE_  C   80  80                                                                                10
END_LATITU  C   80  80                                                                                 1
END_LONGIT  C   80  80  
END_SEGMEN  N   10  10  **********
END_STATE_  C   80  80  
END_OFFSET  N   10  10  **********
SEGMENT_LE  N   24  24  0.200000000000000
UNIMPROVED  C   80  80                                                                            100061
GRAVEL_MIL  C   80  80                                                                             10006
SEAL_COATE  C   80  80                                                                              1000
BITUMINOUS  N   24  24  0.000000000000000
BRICK_MILE  C   80  80  
CONCRETE_M  C   80  80  
EST_AVERAG  N   10  10  **********
RIGHT_OF_W  N   10  10  **********
CART_WAY_W  N   10  10  **********
FEDERAL_AI  C   80  80                                                                                 1
DIRECTION_  C   80  80                                                                         10006114F
FEDERAL__1  C   80  80                                                                          10006105
BRIDGE_COU  C   80  80                                                                           1000609
RAIL_ROAD_  C   80  80  
TRAFFIC_CO  C   80  80                                                                             10006
TRAFFIC__1  C   80  80                                                                              1000
FUNCTIONAL  N   10  10  **********
FEDERAL__2  C   80  80  
TRUCK_DAIL  C   80  80  
TURNBACK_I  C   80  80  E
FIPS_AREA_  C   80  80  
ROAD_OWNER  C   80  80                                                                         100061242
TYPE_OF_AR  C   80  80  
TRAFFIC_PA  C   80  80  
STRUCTURAL  C   80  80                                                                            100060
LIQUID_FUE  C   80  80  
POSTED_BON  C   80  80  
NO_OF_LANE  C   80  80                                                                               100
SHOULDER_T  C   80  80  
SHOULDER_W  C   80  80  
SPEED_LIMI  C   80  80  
PARKING_LA  C   80  80                                                                                **
BIFURCATED  C   80  80                                                                              ****
MEDIAN_TYP  C   80  80  
CURBS       C   80  80                                                                                **
SIDEWALKS   C   80  80                                                                                 *
RTE_TNUM    C   80  80  
ID      N   10  10  10369000
LIQUID_F_1  C   80  80                                                                           *******
NSFA_IND    C   80  80  
SURFACE_TY  C   80  80                                                                             *****
ROAD_TYPE   C   80  80  
NOTE        C   89  89  
LR_RT_NO    C   80  80  
DG_CREATE_  C   80  80                                                                              ****
DG_CREAT_1  C   80  80                                                                               ***
DG_UPDATE_  C   80  80                                                                                **
DG_UPDAT_1  C   80  80                                                                                 *
GIS_UPDATE  C   80  80  
Saving trimmed file...

20180307 does not have a corresponding .cpp file or GitHub commit. Where exactly in the development history is it?

yakra commented 6 years ago

20180306a info display:

/home/yakra/gis/data/pa/Pennsylvania_Local_Roads/Pennsylvania_Local_Roads.dbf opened.
DBF Filesize:   3106786073 (sanity check borderline; may be missing terminal 0x1A)
Number Records: 0x93e97 605847
Header Length:  0xa61   2657
Record Length:  0x1408  5128
First char: 0x3 3
Final char: 0x20    32
82 fields.
Scanning DBF file...
605847/605847
FieldName   Type    Length  Max Data
LRS_ID      N   10  10  10369000
CTY_CODE    C   80  80                                                                        **********
DISTRICT_N  C   80  80  
MUN_ID      N   10  10  **********
MUNICIPAL_  C   80  80                                                                             *****
MUNICIPA_1  C   80  80  
LR_NAME     C   80  80  
LR_TYPE     C   80  80                                                                           *******
LR_ID       N   10  10  **********
SEGMENT_NU  N   10  10  **********
OFFSET_BGN  N   10  10  **********
OFFSET_END  N   10  10  **********
SEG_LNGTH_  N   10  10  **********
CUM_OFFSET  N   10  10  **********
CUM_OFFS_1  N   10  10  **********
CUM_SEG_LN  N   10  10  **********
BEG_INT_TE  C   80  80                                                                         *********
BEG_INT__1  C   80  80  
BEGIN_ROUT  C   80  80  
BEGIN_LATI  C   80  80  
BEGIN_LONG  C   80  80  
BEGIN_SEGM  C   80  80  
BEGIN_STAT  N   10  10  **********
BEGIN_OFFS  N   10  10  **********
END_TERMIN  C   80  80  
END_TERM_T  C   80  80                                                                               100
END_ROUTE_  C   80  80                                                                                10
END_LATITU  C   80  80                                                                                 1
END_LONGIT  C   80  80  
END_SEGMEN  N   10  10  **********
END_STATE_  C   80  80  
END_OFFSET  N   10  10  **********
SEGMENT_LE  N   24  24  0.200000000000000
UNIMPROVED  C   80  80                                                                            100061
GRAVEL_MIL  C   80  80                                                                             10006
SEAL_COATE  C   80  80                                                                              1000
BITUMINOUS  N   24  24  0.000000000000000
BRICK_MILE  C   80  80  
CONCRETE_M  C   80  80  
EST_AVERAG  N   10  10  **********
RIGHT_OF_W  N   10  10  **********
CART_WAY_W  N   10  10  **********
FEDERAL_AI  C   80  80                                                                                 1
DIRECTION_  C   80  80                                                                         10006114F
FEDERAL__1  C   80  80                                                                          10006105
BRIDGE_COU  C   80  80                                                                           1000609
RAIL_ROAD_  C   80  80  
TRAFFIC_CO  C   80  80                                                                             10006
TRAFFIC__1  C   80  80                                                                              1000
FUNCTIONAL  N   10  10  **********
FEDERAL__2  C   80  80  
TRUCK_DAIL  C   80  80  
TURNBACK_I  C   80  80  E
FIPS_AREA_  C   80  80  
ROAD_OWNER  C   80  80                                                                         100061242
TYPE_OF_AR  C   80  80  
TRAFFIC_PA  C   80  80  
STRUCTURAL  C   80  80                                                                            100060
LIQUID_FUE  C   80  80  
POSTED_BON  C   80  80  
NO_OF_LANE  C   80  80                                                                               100
SHOULDER_T  C   80  80  
SHOULDER_W  C   80  80  
SPEED_LIMI  C   80  80  
PARKING_LA  C   80  80                                                                                **
BIFURCATED  C   80  80                                                                              ****
MEDIAN_TYP  C   80  80  
CURBS       C   80  80                                                                                **
SIDEWALKS   C   80  80  
RTE_TNUM    C   80  80  
ID      N   10  10  10369000
LIQUID_F_1  C   80  80                                                                           *******
NSFA_IND    C   80  80  
SURFACE_TY  C   80  80                                                                             *****
ROAD_TYPE   C   80  80  
NOTE        C   89  89  
LR_RT_NO    C   80  80  
DG_CREATE_  C   80  80                                                                              ****
DG_CREAT_1  C   80  80                                                                               ***
DG_UPDATE_  C   80  80                                                                                **
DG_UPDAT_1  C   80  80  
GIS_UPDATE  C   80  80  

20180306a does not have a corresponding .cpp file or GitHub commit. Where exactly in the development history is it?

yakra commented 6 years ago

20180306: same as 20180308 & 20180307a https://github.com/yakra/tmtools/issues/26#issuecomment-372532199 20180306 has a .cpp file; no GitHub commit.

yakra commented 6 years ago

20171106 info display:

/home/yakra/gis/data/pa/Pennsylvania_Local_Roads/Pennsylvania_Local_Roads.dbf opened.
DBF Filesize:   3106786073 (sanity check borderline; may be missing terminal 0x1A)
Number Records: 0x93e97 605847
Header Length:  0xa61   2657
Record Length:  0x1408  5128
First char: 0x3 3
Final char: 0x20    32
82 fields.
Scanning DBF file...
605847/605847
FieldName   Type    Length  Max Data
LRS_ID      N   10  10  ********* 
CTY_CODE    C   80  80                                                                        **********
DISTRICT_N  C   80  80                                                                         *********
MUN_ID      N   10  10  **********
MUNICIPAL_  C   80  80                                                                             *****
MUNICIPA_1  C   80  80                                                                              ****
LR_NAME     C   80  80                                                                               ***
LR_TYPE     C   80  80                                                                           *******
LR_ID       N   10  10  **********
SEGMENT_NU  N   10  10  **********
OFFSET_BGN  N   10  10  **********
OFFSET_END  N   10  10  **********
SEG_LNGTH_  N   10  10  **********
CUM_OFFSET  N   10  10  **********
CUM_OFFS_1  N   10  10  **********
CUM_SEG_LN  N   10  10  **********
BEG_INT_TE  C   80  80                                                                         *********
BEG_INT__1  C   80  80                                                                          ********
BEGIN_ROUT  C   80  80                                                                           *******
BEGIN_LATI  C   80  80                                                                            ******
BEGIN_LONG  C   80  80                                                                             *****
BEGIN_SEGM  C   80  80                                                                              ****
BEGIN_STAT  N   10  10  **********
BEGIN_OFFS  N   10  10  **********
END_TERMIN  C   80  80                                                                             *****
END_TERM_T  C   80  80                                                                               100
END_ROUTE_  C   80  80                                                                                10
END_LATITU  C   80  80                                                                                 1
END_LONGIT  C   80  80                                                                         10006196F
END_SEGMEN  N   10  10  **********
END_STATE_  C   80  80                                                                         10006186F
END_OFFSET  N   10  10  **********
SEGMENT_LE  N   24  24  *********       0.190000
UNIMPROVED  C   80  80                                                                            100061
GRAVEL_MIL  C   80  80                                                                             10006
SEAL_COATE  C   80  80                                                                              1000
BITUMINOUS  N   24  24  ************************
BRICK_MILE  C   80  80                                                                         10006144F
CONCRETE_M  C   80  80                                                                          10006135
EST_AVERAG  N   10  10  **********
RIGHT_OF_W  N   10  10  **********
CART_WAY_W  N   10  10  **********
FEDERAL_AI  C   80  80                                                                                 1
DIRECTION_  C   80  80                                                                         10006114F
FEDERAL__1  C   80  80                                                                          10006105
BRIDGE_COU  C   80  80                                                                           1000609
RAIL_ROAD_  C   80  80                                                                            100060
TRAFFIC_CO  C   80  80                                                                             10006
TRAFFIC__1  C   80  80                                                                              1000
FUNCTIONAL  N   10  10  **********
FEDERAL__2  C   80  80                                                                              1000
TRUCK_DAIL  C   80  80                                                                               100
TURNBACK_I  C   80  80  E                                                                           FALS
FIPS_AREA_  C   80  80                                                                                 1
ROAD_OWNER  C   80  80                                                                         100061242
TYPE_OF_AR  C   80  80                                                                          10006115
TRAFFIC_PA  C   80  80                                                                           1000610
STRUCTURAL  C   80  80                                                                            100060
LIQUID_FUE  C   80  80                                                                             10006
POSTED_BON  C   80  80                                                                              1000
NO_OF_LANE  C   80  80                                                                               100
SHOULDER_T  C   80  80                                                                                10
SHOULDER_W  C   80  80                                                                                 1
SPEED_LIMI  C   80  80                                                                                01
PARKING_LA  C   80  80                                                                                **
BIFURCATED  C   80  80                                                                              ****
MEDIAN_TYP  C   80  80                                                                               ***
CURBS       C   80  80                                                                                **
SIDEWALKS   C   80  80                                                                                 *
RTE_TNUM    C   80  80                                                                         *********
ID      N   10  10  **********
LIQUID_F_1  C   80  80                                                                           *******
NSFA_IND    C   80  80                                                                            ******
SURFACE_TY  C   80  80                                                                             *****
ROAD_TYPE   C   80  80                                                                              ****
NOTE        C   89  89                                                                                        ***
LR_RT_NO    C   80  80                                                                                **
DG_CREATE_  C   80  80                                                                              ****
DG_CREAT_1  C   80  80                                                                               ***
DG_UPDATE_  C   80  80                                                                                **
DG_UPDAT_1  C   80  80                                                                                 *
GIS_UPDATE  C   80  80                                                                         *********
yakra commented 6 years ago

20171105 info display:

/home/yakra/gis/data/pa/Pennsylvania_Local_Roads/Pennsylvania_Local_Roads.dbf opened.
DBF Filesize:   3106786073 (sanity check borderline; may be missing terminal 0x1A)
Number Records: 0x93e97 605847
Header Length:  0xa61   2657
Record Length:  0x1408  5128
First char: 0x3 3
Final char: 0x20    32
82 fields.
Scanning DBF file...
605847/605847
FieldName   Type    Length  Max Data
LRS_ID      N   10  10    <Type N fields unsupported>
CTY_CODE    C   80  80                                                                        **********
DISTRICT_N  C   80  80                                                                         *********
MUN_ID      N   10  10    <Type N fields unsupported>
MUNICIPAL_  C   80  80                                                                             *****
MUNICIPA_1  C   80  80                                                                              ****
LR_NAME     C   80  80                                                                               ***
LR_TYPE     C   80  80                                                                           *******
LR_ID       N   10  10    <Type N fields unsupported>
SEGMENT_NU  N   10  10    <Type N fields unsupported>
OFFSET_BGN  N   10  10    <Type N fields unsupported>
OFFSET_END  N   10  10    <Type N fields unsupported>
SEG_LNGTH_  N   10  10    <Type N fields unsupported>
CUM_OFFSET  N   10  10    <Type N fields unsupported>
CUM_OFFS_1  N   10  10    <Type N fields unsupported>
CUM_SEG_LN  N   10  10    <Type N fields unsupported>
BEG_INT_TE  C   80  80                                                                         *********
BEG_INT__1  C   80  80                                                                          ********
BEGIN_ROUT  C   80  80                                                                           *******
BEGIN_LATI  C   80  80                                                                            ******
BEGIN_LONG  C   80  80                                                                             *****
BEGIN_SEGM  C   80  80                                                                              ****
BEGIN_STAT  N   10  10    <Type N fields unsupported>
BEGIN_OFFS  N   10  10    <Type N fields unsupported>
END_TERMIN  C   80  80                                                                             *****
END_TERM_T  C   80  80                                                                               100
END_ROUTE_  C   80  80                                                                                10
END_LATITU  C   80  80                                                                                 1
END_LONGIT  C   80  80                                                                         10006196F
END_SEGMEN  N   10  10    <Type N fields unsupported>
END_STATE_  C   80  80                                                                         10006186F
END_OFFSET  N   10  10    <Type N fields unsupported>
SEGMENT_LE  N   24  24    <Type N fields unsupported>
UNIMPROVED  C   80  80                                                                            100061
GRAVEL_MIL  C   80  80                                                                             10006
SEAL_COATE  C   80  80                                                                              1000
BITUMINOUS  N   24  24    <Type N fields unsupported>
BRICK_MILE  C   80  80                                                                         10006144F
CONCRETE_M  C   80  80                                                                          10006135
EST_AVERAG  N   10  10    <Type N fields unsupported>
RIGHT_OF_W  N   10  10    <Type N fields unsupported>
CART_WAY_W  N   10  10    <Type N fields unsupported>
FEDERAL_AI  C   80  80                                                                                 1
DIRECTION_  C   80  80                                                                         10006114F
FEDERAL__1  C   80  80                                                                          10006105
BRIDGE_COU  C   80  80                                                                           1000609
RAIL_ROAD_  C   80  80                                                                            100060
TRAFFIC_CO  C   80  80                                                                             10006
TRAFFIC__1  C   80  80                                                                              1000
FUNCTIONAL  N   10  10    <Type N fields unsupported>
FEDERAL__2  C   80  80                                                                              1000
TRUCK_DAIL  C   80  80                                                                               100
TURNBACK_I  C   80  80  E                                                                           FALS
FIPS_AREA_  C   80  80                                                                                 1
ROAD_OWNER  C   80  80                                                                         100061242
TYPE_OF_AR  C   80  80                                                                          10006115
TRAFFIC_PA  C   80  80                                                                           1000610
STRUCTURAL  C   80  80                                                                            100060
LIQUID_FUE  C   80  80                                                                             10006
POSTED_BON  C   80  80                                                                              1000
NO_OF_LANE  C   80  80                                                                               100
SHOULDER_T  C   80  80                                                                                10
SHOULDER_W  C   80  80                                                                                 1
SPEED_LIMI  C   80  80                                                                                01
PARKING_LA  C   80  80                                                                                **
BIFURCATED  C   80  80                                                                              ****
MEDIAN_TYP  C   80  80                                                                               ***
CURBS       C   80  80                                                                                **
SIDEWALKS   C   80  80                                                                                 *
RTE_TNUM    C   80  80                                                                         *********
ID      N   10  10    <Type N fields unsupported>
LIQUID_F_1  C   80  80                                                                           *******
NSFA_IND    C   80  80                                                                            ******
SURFACE_TY  C   80  80                                                                             *****
ROAD_TYPE   C   80  80                                                                              ****
NOTE        C   89  89                                                                                        ***
LR_RT_NO    C   80  80                                                                                **
DG_CREATE_  C   80  80                                                                              ****
DG_CREAT_1  C   80  80                                                                               ***
DG_UPDATE_  C   80  80                                                                                **
DG_UPDAT_1  C   80  80                                                                                 *
GIS_UPDATE  C   80  80                                                                         *********

20171026 is the same thing, except without the

Scanning DBF file...
605847/605847
yakra commented 6 years ago

The lack of results & funky Info Display are not due to a problem with DBFtrim, but are the result of garbage data in the DBF file itself.

I'm going to dig a bit deeper...

yakra commented 6 years ago

Record 1: The easternmost segment of Race Track Rd in Berwick, as reported by QGIS:

LRS_ID: 1
CTY_CODE: 01
DISTRICT_N: 08
MUN_ID: 3
MUNICIPAL_: 01201
MUNICIPA_1: BERWICK
LR_NAME: RACE TRACK
LR_TYPE: RD
LR_ID: 6
SEGMENT_NU: 30
OFFSET_BGN: 9029
OFFSET_END: 11880
SEG_LNGTH_: 2851
CUM_OFFSET: 9029
CUM_OFFS_1: 11880
CUM_SEG_LN: 11880
BEG_INT_TE: KINEMAN
BEG_INT__1: RD
BEGIN_ROUT: 506
BEGIN_LATI: 0.0000
BEGIN_LONG: 0.0000
BEGIN_SEGM: 0
BEGIN_STAT: 0
BEGIN_OFFS: 0
END_TERMIN: SR 0194
END_TERM_T: SR
END_ROUTE_: NULL
END_LATITU: 0.0000
END_LONGIT: 0.0000
END_SEGMEN: 0
END_STATE_: 0
END_OFFSET: 0
SEGMENT_LE: 0.540000000000000
UNIMPROVED: NULL
GRAVEL_MIL: NULL
SEAL_COATE: NULL
BITUMINOUS: 0.540000000000000
BRICK_MILE: NULL
CONCRETE_M: NULL
EST_AVERAG: 200
RIGHT_OF_W: 33
CART_WAY_W: 23
FEDERAL_AI: 3
DIRECTION_: 2
FEDERAL__1: 0
BRIDGE_COU: NULL
RAIL_ROAD_: NULL
TRAFFIC_CO: 2                                  LRS_ID
TRAFFIC__1: NULL
FUNCTIONAL: NULL
FEDERAL__2: AL_
TRUCK_DAIL: NULL
TURNBACK_I: _NU
FIPS_AREA_: NULL
ROAD_OWNER: S_1
TYPE_OF_AR: NULL
TRAFFIC_PA: ATI
STRUCTURAL: NULL
LIQUID_FUE: MIN
POSTED_BON: NULL
NO_OF_LANE: MEN
SHOULDER_T: NULL
SHOULDER_W: MIL
SPEED_LIMI: NULL
PARKING_LA: RAG
BIFURCATED: NULL
MEDIAN_TYP: __1
CURBS: NULL
SIDEWALKS: NAL
RTE_TNUM: NULL
ID: 0
LIQUID_F_1: NULL
NSFA_IND: TRUCTURAL
SURFACE_TY: NULL
ROAD_TYPE: HOULDER_W
NOTE: NULL
LR_RT_NO: URBS
DG_CREATE_: NULL
DG_CREAT_1: SFA_IND
DG_UPDATE_: NULL
DG_UPDAT_1: G_CREATE_
GIS_UPDATE: NULL

It's easier to see in a hex editor, but what we see here is, from bytes 5151 to 7216, and again from 7305 to 7785, the field data is a copy of the field descriptor array. The null 0s cause QGIS to truncate these data to null or partial values. The bad data appears to continue into Record 2, at least as far as byte 10253: still a copy of record 1 from earlier in the file. I'm going to write a program to measure the length of copied data, and see if there's a constant offset.

yakra commented 6 years ago

What gets copied...

32 - 2097 copied to 5151 - 7216 (5119 bytes ahead) Garbage begins partway thru TRAFFIC_CO in Record 1. TRAFFIC__1 thru ROAD_TYPE are complete garbage. All a copy of the beginning of the field descriptor array.

7216 - 7305 OK These 89 bytes correspond to NOTE in Record 1.

2177 - 4693 copied to 7305 - 9821 (5128 bytes ahead) 5128 = 1 record length. LR_RT_NO thru GIS_UPDATE are garbage in Record 1, a copy of a later segment of the field descriptor array. Record 2 begins as a copy of Record 1, offset ahead by one record length. Garbage continues until 3 B short of the end of EST_AVERAG.

Worth noting? The 2nd copied block begins 89 B (length of NOTE) after the 1st ends. The 2nd source block begins 80 B (common Type C field length) after the 1st ends. It's almost like NOTE was blanked out, then lengthened by 9 bytes. (Needed more space?) That could help explain why the offset of each copied block is 9 B > the previous...

(4526 <-> 4684) - 5151 copied to (9663 <-> 9821) - 10288 (5137 bytes ahead) I first noticed data getting copied at byte 9830 (beginnging of actual data, after whitespace, in RIGHT_OF_W), which gets a copy of data from Record 1. However, I suspect that if the data were not corrupted, EST_AVERAG might have a value of 200, which is not there. For it to be overwritten by whitespace, the copy would have to start at least 9 B earlier, @ 9821. Garbage ends midway thru TRAFFIC_CO; seems to be regular data thereafter.

yakra commented 6 years ago

http://www.dot7.state.pa.us/BPR_pdf_files/Maps/Type5//01201.pdf

yakra commented 6 years ago
  • Make a quick-n-dirty program to see if there are any other scraps copied from the field descriptor array farther ahead in the file.

Crap. There's more. Up in the 400,000s...

yakra commented 6 years ago

Oh good. Records 425391 thru 426138 (numbering from 0) all contain data from the field descriptor array. Can't view in GHex because it starts misbehaving @ offsets > 2 GB. Okteta sez, Support to load files larger than 2 GiB has not yet been implemented. This means writing a utility to query a record number & output its field info.

  • I'm confident I can fill in the missing data in Records 1 & 2. Probably more easily done by hand in a hex editor than by writing a program to patch the file. Unless...

I've reconstructed the data for the first two fields. I can still edit that in. For the other 749 records? Fuhgeddaboutit! That'll take a program to blank that stuff out.

yakra commented 6 years ago

This means writing a utility to query a record number & output its field info.

Done.

Records 425391 thru 426138 (numbering from 0) all contain data from the field descriptor array.

The 19-ish records I checked appear to be entirely Zeros. Having 10 zeros in one of the type N fields is enough to match part of the field descriptor array. This means this should be trimmable, and I won't have to write a program to blank it out. So now, I just need to fix the first 2 records manually.

yakra commented 6 years ago
  • If I can successfully patch the data and trim the file, what will be the MaxVal for NOTE?

HA! It's still

***                                                                                ******

There's still garbage data. Info Display is identical to when Records 1 & 2 were hosed, as pasted upthread. https://github.com/yakra/tmtools/issues/26#issuecomment-372530617

This means...

yakra commented 6 years ago

There's still garbage data.

Or rather, just non-standard / poorly-formatted data?

  • Another new program to identify & log which records have fields where vlen == oDBF.fArr[i].len

Record 418263 was among those IDed as having 19 fields where the data took up the full field width. These 19 fields were all type N, containing all asterisks.

Better than wiping records, I can target these fields and wipe them to all spaces.

yakra commented 6 years ago

target these fields and wipe them to all spaces

I've identified > 4.2 million fields filled with entirely one character. Based on the first 2^20 examples in the log file that LibreOffice can read, the majority are filled with asterisks. Some are filled with '0':

record  field       int char
----------------------------------------
419790  END_OFFSET  48  ('0')
419800  END_SEGMEN  48  ('0')
420308  EST_AVERAG  48  ('0')
420385  BEGIN_OFFS  48  ('0')
420386  BEGIN_STAT  48  ('0')
420898  END_OFFSET  48  ('0')
421578  CUM_SEG_LN  48  ('0')
423476  ID      48  ('0')
424485  LRS_ID      48  ('0')
425153  ID      48  ('0')

ID is Type C; the rest are Type N. Worthwhile for Type N/F fields, to change to a single right-justified '0' rather than wipe completely? Or, 0.0? Any potential for wacky hijinks upon trimming? Mumble grumble...

yakra commented 6 years ago

Any chars other than '' & '0' that LibreOffice can't see? Maybe keep a vector of CharsFound*...

yakra commented 6 years ago

Worthwhile for Type N/F fields, to change to a single right-justified '0' rather than wipe completely? Or, 0.0? Any potential for wacky hijinks upon trimming? Mumble grumble...

Any solution requires potentially coping with both ints & decimal values.

28 must be resolved before I can proceed.

yakra commented 6 years ago

Any chars other than '' & '0' that LibreOffice can't see? Maybe keep a vector of CharsFound*...

Only '*' & '0' are used. Also made a vector of fields affected: everything but DG_CREAT_1 and NOTE

yakra commented 6 years ago

DBFwipe

on /home/yakra/gis/data/pa/Pennsylvania_Local_Roads/Pennsylvania_Local_Roads.yPatch2.dbf

Wiped fields filled entirely with one character. Fields with all '0's get a single right-justified '0'. Including type C fields. Not the end of the world.

Some fields are still completely filled with some character combination: 6 records have 7 fields filled. 9 records have 6 fields filled. 18 records have 5 fields filled. 34 records have 4 fields filled. 78 records have 3 fields filled. 185 records have 2 fields filled. 663 records have 1 field filled. 993 records in total.

yakra commented 6 years ago

There's still garbage data.

Or rather, just non-standard / poorly-formatted data?

Garbage data. Checking out records with 4+ fields filled, all of them show evidence of what should be a single datum split between multiple fields, misaligned data, etc., much like the bugs that existed in the first 2 records. 3 fields: checked every 2nd record; data similarly b0rked. 2 fields: checked every 5th record; data similarly b0rked.

1 field: planned on checking every 20th record. The first up are thus 4451 & 10096. Apart from some numeric fields being padded with '*'s in addition to ' 's, this looks like good data. Nothing misaligned. 418916 thru 425286, examining every 20th record, look like the same old garbage. 484619 has perfectly good data, notable only for the NOTE field using all 89 allotted bytes.

Will dig deeper...

Another idea: A program to identify type N/F fields containing characters other than "+-1234567890.eE\0 "

yakra commented 6 years ago

1 field

968 - 12426: OK, albeit with '*' padding in numeric fields. 418782 - 418884: Solid Garbage. 418916 - 425286: Every 20th record examined: Garbage. 425287 - 425386: Solid Garbage. 426139 - 596824: OK. NOTE field filled.

2 fields: 418863 - 425327 is within garbage range of single-field records. Begin/End within solid ranges. 3 fields: 418854 - 425175 is within garbage range of single-field records. Begin within solid range. 4 fields: 419216 - 425075 is within garbage range of single-field records. 5 fields: 419704 - 425347 is within garbage range of single-field records. End within solid range. 6 fields: 419794 - 425174 is within garbage range of single-field records. 7 fields: 419495 - 422701 is within garbage range of single-field records.

Finding full Garbage range...

418771 thru 418775: OK 418776 thru 418780: Garbage 425381 thru 425390: Garbage 425391 thru 426138, Null zeros. (Only multiples of 100 or 10 checked, except @ ends of range)

Full Garbage range:

418776 thru 425390. That's 6615 records. 1.09% of the total.

yakra commented 6 years ago

The 2 Gig mark...

2^31 - 1 = 2147483647. Byte # 2147483647 is in record 418775. • Garbage begins @ record 418776. That's just too much of a coincidence to be a coincidence. • OTOH, the data doesn't become mangled instantly. Byte 2147483647 is the 2nd byte of the TURNBACK_I field, and data in the remainder of the record looks fine. Also, data looks fine later in the file. • Shrug!

yakra commented 6 years ago

ToDo

yakra commented 6 years ago

DBFzap

  • Target specific records & wipe them clean, all spaces.

Records 418776 thru 425390 wiped clean. For the first time, I have a trimmable file, yPatch3. (3.1 GB -> 530.7 MB, with still plenty left to sort out.)

Info Display:

FieldName   Type    Length  Max Data
LRS_ID      N   10  8   10000001
CTY_CODE    C   80  11  0         2
DISTRICT_N  C   80  11  1         1
MUN_ID      N   10  4   1521
MUNICIPAL_  C   80  14  0         2108
MUNICIPA_1  C   80  24  NEW PHILADELPHIA BOROUGH
LR_NAME     C   80  34  SLIPPERY ROCK ENVIRONMENTAL CENTER
LR_TYPE     C   80  11  D         R
LR_ID       N   10  8   10341182
SEGMENT_NU  N   10  6   147196
OFFSET_BGN  N   10  5   13728
OFFSET_END  N   10  5   11880
SEG_LNGTH_  N   10  5   11141
CUM_OFFSET  N   10  5   13728
CUM_OFFS_1  N   10  5   11880
CUM_SEG_LN  N   10  5   11880
BEG_INT_TE  C   80  34  SLIPPERY ROCK ENVIRONMENTAL CENTER
BEG_INT__1  C   80  11  M         L
BEGIN_ROUT  C   80  3   506
BEGIN_LATI  C   80  15  0.         0000
BEGIN_LONG  C   80  15  0         .0000
BEGIN_SEGM  C   80  4   4014
BEGIN_STAT  N   10  8   ********
BEGIN_OFFS  N   10  10  ******** 0
END_TERMIN  C   80  34  SLIPPERY ROCK ENVIRONMENTAL CENTER
END_TERM_T  C   80  11  R         D
END_ROUTE_  C   80  12  7         60
END_LATITU  C   80  15  0         .0000
END_LONGIT  C   80  15  0.000         0
END_SEGMEN  N   10  4   ****
END_STATE_  C   80  9   *********
END_OFFSET  N   10  4   1923
SEGMENT_LE  N   24  24  ##########0.760          <- 0.760         000
UNIMPROVED  C   80  19  0.07000000000000001
GRAVEL_MIL  C   80  19  0.07000000000000001
SEAL_COATE  C   80  19  0.07000000000000001
BITUMINOUS  N   24  24  *********         ******
BRICK_MILE  C   80  13  *********0.09
CONCRETE_M  C   80  4   0.05
EST_AVERAG  N   10  9   *********
RIGHT_OF_W  N   10  10  ******* 50
CART_WAY_W  N   10  10  *       24
FEDERAL_AI  C   80  10  *********3
DIRECTION_  C   80  1   2
FEDERAL__1  C   80  1   0
BRIDGE_COU  C   80  2   14
RAIL_ROAD_  C   80  2   10
TRAFFIC_CO  C   80  1   2
TRAFFIC__1  C   80  13  201         5
FUNCTIONAL  N   10  1   7
FEDERAL__2  C   80  4   0050
TRUCK_DAIL  C   80  3   100
TURNBACK_I  C   80  5   36784
FIPS_AREA_  C   80  14  69         697
ROAD_OWNER  C   80  3   TWP
TYPE_OF_AR  C   80  2   -1
TRAFFIC_PA  C   80  1   0
STRUCTURAL  C   80  1   4
LIQUID_FUE  C   80  0   
POSTED_BON  C   80  1   N
NO_OF_LANE  C   80  1   0
SHOULDER_T  C   80  1   N
SHOULDER_W  C   80  1   0
SPEED_LIMI  C   80  2   25
PARKING_LA  C   80  2   18
BIFURCATED  C   80  0   
MEDIAN_TYP  C   80  2   UP
CURBS       C   80  1   N
SIDEWALKS   C   80  1   N
RTE_TNUM    C   80  3   505
ID      N   10  8   10000001
LIQUID_F_1  C   80  13  TR         UE
NSFA_IND    C   80  14  F         ALSE
SURFACE_TY  C   80  2   40
ROAD_TYPE   C   80  1   R
NOTE        C   89  89  PAP-CHERRY SPRING, DENTON HILL, LYMAN RUN, OLEBULL, PATTERSON, PROUDY PLACE, SIZERVILE SP
LR_RT_NO    C   80  13  AAT         9
DG_CREATE_  C   80  24  2013-09-09T00:00:00.000Z
DG_CREAT_1  C   80  25  PDALLEGXPLORE1/LOCALROADS
DG_UPDATE_  C   80  33  2015-0         8-26T00:00:00.000Z
DG_UPDAT_1  C   80  25  PDJOHNSXPLORE1/LOCALROADS
GIS_UPDATE  C   80  33  2017-04-07         T09:31:35.000Z

Note how in 20 of these fields, what would otherwise be a single datum is split up by exactly 9 spaces. Most peculiar. There's that 9 again... Note the 0.760 000 in SEGMENT_LE, which plays the GIGO game with the zero-trimmer.

yakra commented 6 years ago

25 fields are so affected, across 144 records. BEG_INT_TE, DG_UPDAT_1, END_TERMIN, LR_NAME, MUNICIPA_1 can't be seen via the Info Display above. 144 instances total; only one field per record.

ToDo

  • Identify type N/F fields containing characters other than "+-1234567890.eE\0 "

This can wait...

yakra commented 6 years ago
  • Query 1 record after, make sure nothing is b0rked; look up in QGIS

Local road names, and intersecting road names @ segment ends appear to be Just Plain Wrong a lot of the time. But whatever, I'll leave it be. There's really not much I can do for that, and my only real objective here has been to trim the file anyway. Record 9861, I cannot see in QGIS. Kinda like AR915... Maybe it's a zero-pointer; follow up in SHP file...

Aside from that, everything looks OK, for data that's just plain janky...

yakra commented 6 years ago

pa9

  • Fix value within the char array & write to pa9.csv, before attempting to save to disk

Done. Saved to disk. Looking good. Trimming yPatch4: 3.1 GB -> 453.2 MB. Vice yPatch3, that's 128 bytes/record smaller.

FieldName   Type    Length  Max Data
LRS_ID      N   10  8   10000001
CTY_CODE    C   80  7   1008903
DISTRICT_N  C   80  2   08
MUN_ID      N   10  4   1521
MUNICIPAL_  C   80  9   158839210
MUNICIPA_1  C   80  24  NEW PHILADELPHIA BOROUGH
LR_NAME     C   80  34  SLIPPERY ROCK ENVIRONMENTAL CENTER
LR_TYPE     C   80  2   RD
LR_ID       N   10  8   10341182
SEGMENT_NU  N   10  6   147196
OFFSET_BGN  N   10  5   13728
OFFSET_END  N   10  5   11880
SEG_LNGTH_  N   10  5   11141
CUM_OFFSET  N   10  5   13728
CUM_OFFS_1  N   10  5   11880
CUM_SEG_LN  N   10  5   11880
BEG_INT_TE  C   80  34  SLIPPERY ROCK ENVIRONMENTAL CENTER
BEG_INT__1  C   80  2   RD
BEGIN_ROUT  C   80  3   506
BEGIN_LATI  C   80  11  400149.7000
BEGIN_LONG  C   80  11  750713.7000
BEGIN_SEGM  C   80  4   4014
BEGIN_STAT  N   10  8   ********
BEGIN_OFFS  N   10  10  ******** 0
END_TERMIN  C   80  34  SLIPPERY ROCK ENVIRONMENTAL CENTER
END_TERM_T  C   80  2   SR
END_ROUTE_  C   80  3   506
END_LATITU  C   80  11  400152.5000
END_LONGIT  C   80  11  750713.1000
END_SEGMEN  N   10  4   ****
END_STATE_  C   80  9   *********
END_OFFSET  N   10  4   1923
SEGMENT_LE  N   24  24  *********     0000000000
UNIMPROVED  C   80  19  0.07000000000000001
GRAVEL_MIL  C   80  19  0.07000000000000001
SEAL_COATE  C   80  19  0.07000000000000001
BITUMINOUS  N   24  24  ####################0.05 <- 0.050000000000000
BRICK_MILE  C   80  13  *********0.09
CONCRETE_M  C   80  4   0.05
EST_AVERAG  N   10  9   *********
RIGHT_OF_W  N   10  10  ******* 50
CART_WAY_W  N   10  10  *       24
FEDERAL_AI  C   80  10  *********3
DIRECTION_  C   80  1   2
FEDERAL__1  C   80  1   0
BRIDGE_COU  C   80  2   14
RAIL_ROAD_  C   80  2   10
TRAFFIC_CO  C   80  1   2
TRAFFIC__1  C   80  4   2012
FUNCTIONAL  N   10  1   7
FEDERAL__2  C   80  4   0050
TRUCK_DAIL  C   80  3   100
TURNBACK_I  C   80  5   36784
FIPS_AREA_  C   80  5   36784
ROAD_OWNER  C   80  3   TWP
TYPE_OF_AR  C   80  2   -1
TRAFFIC_PA  C   80  1   0
STRUCTURAL  C   80  1   4
LIQUID_FUE  C   80  0   
POSTED_BON  C   80  1   N
NO_OF_LANE  C   80  1   0
SHOULDER_T  C   80  1   N
SHOULDER_W  C   80  1   0
SPEED_LIMI  C   80  2   25
PARKING_LA  C   80  2   18
BIFURCATED  C   80  0   
MEDIAN_TYP  C   80  2   UP
CURBS       C   80  1   N
SIDEWALKS   C   80  1   N
RTE_TNUM    C   80  3   505
ID      N   10  8   10000001
LIQUID_F_1  C   80  9   11064TRUE
NSFA_IND    C   80  5   FALSE
SURFACE_TY  C   80  2   40
ROAD_TYPE   C   80  1   R
NOTE        C   89  89  PAP-CHERRY SPRING, DENTON HILL, LYMAN RUN, OLEBULL, PATTERSON, PROUDY PLACE, SIZERVILE SP
LR_RT_NO    C   80  4   AA0K
DG_CREATE_  C   80  24  2013-09-09T00:00:00.000Z
DG_CREAT_1  C   80  25  PDALLEGXPLORE1/LOCALROADS
DG_UPDATE_  C   80  24  2015-08-26T00:00:00.000Z
DG_UPDAT_1  C   80  25  PDJOHNSXPLORE1/LOCALROADS
GIS_UPDATE  C   80  24  2016-07-06T11:23:01.000Z
yakra commented 6 years ago

Note the asterisks in numeric fields the Info Display above. At least BEGIN_STAT, BEGIN_OFFS, END_SEGMEN, SEGMENT_LE, EST_AVERAG, RIGHT_OF_W, CART_WAY_W are affected. This is probably also why BITUMINOUS needs so many integer digits. (Confirm via DBFmine.)

Next up, identify:

Also note the three Character fields with asterisks as well. It's not appropriate to trim them carte blanche as in a Numeric field, but there's still some potential for savings; it's worth looking into. Have a look in GHex after the next trim.

yakra commented 6 years ago

DBFmine BITUMINOUS

ToDo:

yakra commented 6 years ago

BadNumeric

Check out 140000000000000 datum.

Replaced with 0.07, from SEGMENT_LE.

Identify & scrub type N fields containing characters other than "-1234567890.\0 ".

Done. yPatch5: 453.2 -> 432.0 MB. Vice yPatch4, that's 35 bytes/record smaller.

FieldName   Type    Length  Max Data
LRS_ID      N   8   8   10000001
CTY_CODE    C   7   7   1008903
DISTRICT_N  C   2   2   08
MUN_ID      N   4   4   1521
MUNICIPAL_  C   9   9   158839210
MUNICIPA_1  C   24  24  NEW PHILADELPHIA BOROUGH
LR_NAME     C   34  34  SLIPPERY ROCK ENVIRONMENTAL CENTER
LR_TYPE     C   2   2   RD
LR_ID       N   8   8   10341182
SEGMENT_NU  N   6   6   147196
OFFSET_BGN  N   5   5   13728
OFFSET_END  N   5   5   11880
SEG_LNGTH_  N   5   5   11141
CUM_OFFSET  N   5   5   13728
CUM_OFFS_1  N   5   5   11880
CUM_SEG_LN  N   5   5   11880
BEG_INT_TE  C   34  34  SLIPPERY ROCK ENVIRONMENTAL CENTER
BEG_INT__1  C   2   2   RD
BEGIN_ROUT  C   3   3   506
BEGIN_LATI  C   11  11  400149.7000
BEGIN_LONG  C   11  11  750713.7000
BEGIN_SEGM  C   4   4   4014
BEGIN_STAT  N   8   4   2038
BEGIN_OFFS  N   10  4   1923
END_TERMIN  C   34  34  SLIPPERY ROCK ENVIRONMENTAL CENTER
END_TERM_T  C   2   2   SR
END_ROUTE_  C   3   3   506
END_LATITU  C   11  11  400152.5000
END_LONGIT  C   11  11  750713.1000
END_SEGMEN  N   4   3   260
END_STATE_  C   9   9   *********
END_OFFSET  N   4   4   1923
SEGMENT_LE  N   24  24  ######12.609999999999999 <- 12.609999999999999
UNIMPROVED  C   19  19  0.07000000000000001
GRAVEL_MIL  C   19  19  0.07000000000000001
SEAL_COATE  C   19  19  0.07000000000000001
BITUMINOUS  N   24  16  ###########0.067 <- 0.067000000000000
BRICK_MILE  C   13  13  *********0.09
CONCRETE_M  C   4   4   0.05
EST_AVERAG  N   9   5   10600
RIGHT_OF_W  N   10  5   15000
CART_WAY_W  N   10  3   524
FEDERAL_AI  C   10  10  *********3
DIRECTION_  C   1   1   2
FEDERAL__1  C   1   1   0
BRIDGE_COU  C   2   2   14
RAIL_ROAD_  C   2   2   10
TRAFFIC_CO  C   1   1   2
TRAFFIC__1  C   4   4   2012
FUNCTIONAL  N   1   1   7
FEDERAL__2  C   4   4   0050
TRUCK_DAIL  C   3   3   100
TURNBACK_I  C   5   5   36784
FIPS_AREA_  C   5   5   36784
ROAD_OWNER  C   3   3   TWP
TYPE_OF_AR  C   2   2   -1
TRAFFIC_PA  C   1   1   0
STRUCTURAL  C   1   1   4
LIQUID_FUE  C   0   0   
POSTED_BON  C   1   1   N
NO_OF_LANE  C   1   1   0
SHOULDER_T  C   1   1   N
SHOULDER_W  C   1   1   0
SPEED_LIMI  C   2   2   25
PARKING_LA  C   2   2   18
BIFURCATED  C   0   0   
MEDIAN_TYP  C   2   2   UP
CURBS       C   1   1   N
SIDEWALKS   C   1   1   N
RTE_TNUM    C   3   3   505
ID      N   8   8   10000001
LIQUID_F_1  C   9   9   11064TRUE
NSFA_IND    C   5   5   FALSE
SURFACE_TY  C   2   2   40
ROAD_TYPE   C   1   1   R
NOTE        C   89  89  PAP-CHERRY SPRING, DENTON HILL, LYMAN RUN, OLEBULL, PATTERSON, PROUDY PLACE, SIZERVILE SP
LR_RT_NO    C   4   4   AA0K
DG_CREATE_  C   24  24  2013-09-09T00:00:00.000Z
DG_CREAT_1  C   25  25  PDALLEGXPLORE1/LOCALROADS
DG_UPDATE_  C   24  24  2015-08-26T00:00:00.000Z
DG_UPDAT_1  C   25  25  PDJOHNSXPLORE1/LOCALROADS
GIS_UPDATE  C   24  24  2016-07-06T11:23:01.000Z

ToDo:

yakra commented 6 years ago

PApatchC

See how many type C fields contain ''. This only affects MaxVal* for END_STATE_, BRICK_MILE, and FEDERAL_AI, so that's all I'll focus on.

Replaced '*' with ' ' in these fields. yPatch6: 432.0 -> 422.3 MB. Vice yPatch5, that's 16 bytes/record smaller.

Found some numeric values with seven intra-datum spaces. The thing to do is strchr(fVal, ' ') after passing fVal through fTrim.

559 numeric values with spaces within them. 549 of them are in SEGMENT_LE, with an extra leading '0' before whitespace. • Due diligence sanity check: mine the records' corresponding SEG_LNGTH_ (try CUM_SEG_LN too) values and divide by 5280.

The remaining 10:

Record  Field       Datum
1837    SEGMENT_NU  13 20   <-- Change to 20
2978    OFFSET_BGN  1 0 <-- Change to 0
2979    OFFSET_BGN  5 0 <-- Change to 0
2982    SEG_LNGTH_  1 422   <-- Change to 422
6391    SEGMENT_NU  3  30   <-- Change to 30
9816    CUM_OFFSET  26 0    <-- Mark it a 0. GIGO.
9817    CUM_OFFSET  10 0    <-- Mark it a 0. GIGO.
12084   SEGMENT_NU  63 20   <-- Change to 20
13225   OFFSET_BGN  2 0 <-- Change to 0
13229   SEG_LNGTH_  3 898   <-- Change to 898
yakra commented 6 years ago

SEGMENT_LE vs SEG_LNGTH_ & CUM_SEG_LN

559 numeric values with spaces within them. 549 of them are in SEGMENT_LE, with an extra leading '0' before whitespace. • Due diligence sanity check: mine the records' corresponding SEGLNGTH (try CUM_SEG_LN too) values and divide by 5280.

For 538 values , if I trim out the leading '0' from SEGMENT_LE & round to 3 decimal places, then divide SEG_LNGTH by 5280 & round to 3 decimal places, the data match.

Of the remaining 11, 10 have only zero values after the whitespace. Record 3216 has 0.1. The 0.18 calculated via both SEG_LNGTH_ & CUM_SEG_LN match.

Of the remaining 10, 2 more have matching data in SEG_LNGTH_ & CUM_SEG_LN. These check out based on data in the OFFSET fields.

The remaining 8 have nonzero OFFSET_BGN values, so the value in SEG_LNGTH_ makes sense.

ToDo:

yakra commented 6 years ago

yPatch7 -> yTrimV7: 422.3 -> 416.2 MB. Vice yPatch6, that's 10 bytes/record smaller. SEGMENT_LE: 24 -> 14 bytes

yakra commented 6 years ago

Almost there... The remaining ToDo list items are just optimizations to make to DBFtrim itself. In the meantime, I'll just leave this here: a list of other programs I've written over the past 9 days to help in getting all this sorted out...

pennsy
ScanMax
DBFquery
DBFwipe
DBFzap
pa9
BadNumeric
PApatchC
NumSpace
SEGMENT_LE-out
SEGMENT_LE-in
yakra commented 6 years ago

yPatch7 -> yTrimV8A: 411.4 MB. Vice yTrimV7, that's 8 bytes/record smaller. SEGMENT_LE: 14 -> 10 bytes BITUMINOUS: 16 -> 12 bytes MaxIntD only used for decimal values, not integers

yPatch7 -> yTrimV8B: 405.3 MB. Vice yTrimV8A, that's 10 bytes/record smaller. SEGMENT_LE: 10 -> 6 bytes BITUMINOUS: 12 -> 6 bytes Trim extraneous leading zeros

30