yakra / tmtools

Tools to aid in development of the TravelMapping project

DBFtrim: left-justified numeric fields #7

Closed yakra closed 6 years ago

yakra commented 6 years ago

txdot-2015-roadways_48113.dbf contains left-justified numeric fields, e.g. OBJECTID. Numeric fields should be trimmed at both the left and right ends. Character fields probably should be too.

• investigate opportunities to simplify field::GetMax
• convergence of Type C & Type N routines
• less use of strlen
• trimmed space L & trimmed space R (store in reserved bytes? No, that would need to be on a per-record basis)
• extra credit -- bash together in hex editor: L, R, & Ctr justified, Type C & Type N
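The two-sided trim described above can be sketched as follows. This is only an illustration and not DBFtrim's actual code: `trim_field` is a hypothetical name, and the sketch assumes padding is always the ASCII space character. Because the same routine handles any justification, it could serve both Type C and Type N fields.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper (not taken from DBFtrim): returns the part of a
// fixed-width field value that remains after dropping padding spaces
// from both the left and the right end.
std::string_view trim_field(std::string_view raw) {
    const std::size_t first = raw.find_first_not_of(' ');
    if (first == std::string_view::npos) {
        return {};  // the value is blank: nothing but padding
    }
    const std::size_t last = raw.find_last_not_of(' ');
    return raw.substr(first, last - first + 1);
}
```

Under these assumptions, a left-justified numeric value such as `"12   "` and a right-justified one such as `"   12"` both trim to `"12"`.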

If done right, right-justified fields can be converted to left-justified fields, thus becoming GISplungeable with no changes to GISplunge. <--Nevermind; GISplunge has numeric field support now.

yakra commented 6 years ago

tackle this concurrently with #13

yakra commented 6 years ago

Pennsylvania_Local_Roads.dbf (3.1 GB) exhibits the mirror image of the usual convention. Its...
• Type C fields are right-justified.
• Type N fields are left-justified.
Thus it cannot be trimmed at all, and the output file is identical to the input file.

yakra commented 6 years ago

• investigate opportunities to simplify field::GetMax
• convergence of Type C & Type N routines

#20 & #21 are steps toward this goal. Still a work in progress.

yakra commented 6 years ago

• less use of strlen

31a573532109ebbc6138e3eb551cf2d4fd4ff33a reduces the time spent gathering field info by 49-57%, based on tests of the MA road inventory & AR centerline files. There's still room for improvement if I lose *PtLoc.
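As a rough illustration of gathering field info without strlen: DBF field widths are declared in the file header, so a scan can work on length-bounded views instead of null-terminated copies. The sketch below is not the code from the commit above. `max_trimmed_width` and its parameters are hypothetical, and it assumes all records sit in one contiguous buffer, with `field_offset` counted from the start of each record.

```cpp
#include <algorithm>
#include <cstddef>
#include <string_view>

// Hypothetical sketch (not DBFtrim's field::GetMax): finds the widest
// trimmed value of one field across a contiguous block of fixed-length
// records. No strlen calls are needed, because every length is known.
std::size_t max_trimmed_width(std::string_view records,
                              std::size_t record_size,
                              std::size_t field_offset,
                              std::size_t field_length) {
    if (record_size == 0) {
        return 0;  // guard against a loop that would never advance
    }
    std::size_t widest = 0;
    // Stop early once some value already fills the whole field.
    for (std::size_t pos = field_offset;
         pos + field_length <= records.size() && widest < field_length;
         pos += record_size) {
        const std::string_view value = records.substr(pos, field_length);
        const std::size_t first = value.find_first_not_of(' ');
        if (first == std::string_view::npos) {
            continue;  // blank value contributes no width
        }
        const std::size_t last = value.find_last_not_of(' ');
        widest = std::max(widest, last - first + 1);
    }
    return widest;
}
```

The early exit reflects one possible design choice: once any value occupies the full declared width, the field cannot be narrowed, so the remaining records need not be inspected.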

yakra commented 6 years ago

~/gis/data/tx/txdot-roads_48113/txdot-2015-roadways_48113.dbf: the only DIFFs are in OBJECTID, RTE_UNIQ_I, Shape_Leng (Numeric becomes right-justified), and ST_NM (leading space removed from "5TH", "15TH", "2ND", & "LANCASTER HUTCHINS").
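The DIFFs above are consistent with each trimmed value being re-padded to the usual convention: Type N right-justified, Type C left-justified. A minimal sketch of that step, assuming space padding and a target width at least as large as the trimmed value, might look like this (`pad_field` is a hypothetical name, not a DBFtrim function):

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper (not taken from DBFtrim): writes a trimmed value
// back into a field of the given width. Type 'N' is right-justified;
// every other type is treated like Type 'C' and left-justified.
std::string pad_field(std::string_view trimmed, std::size_t width, char type) {
    if (trimmed.size() >= width) {
        // Assumption: the caller passes a width that fits every value,
        // so this cut is only a safety net.
        return std::string(trimmed.substr(0, width));
    }
    const std::string padding(width - trimmed.size(), ' ');
    std::string out;
    if (type == 'N') {
        out = padding;
        out.append(trimmed);
    } else {
        out.assign(trimmed);
        out.append(padding);
    }
    return out;
}
```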

yakra commented 6 years ago

Pennsylvania_Local_Roads.dbf (3.1 GB) exhibits the mirror image of the usual convention. Its...
• Type C fields are right-justified.
• Type N fields are left-justified.
Thus it cannot be trimmed at all, and the output file is identical to the input file.

I looked too quickly; the bit about the justification is incorrect. This file cannot be trimmed because the field data isn't being read correctly at all. This isn't new; it has shown up in one form or another since my earliest saved DBFtrim executables. By all other indications, the updates to trim whitespace from both L & R have been successful. I'm going to commit the changes and track the Pennsylvania issue separately in #26.