yakra / DataProcessing

Data Processing Scripts and Programs for Travel Mapping Project

refactoring compute_stats_r & route_integrity #236

Closed yakra closed 1 year ago

yakra commented 1 year ago

compute_stats_r

is where it is simply because it was split off from compute_stats_t, and related items were left grouped together. Plus, what eventually became RteIntThread didn't exist yet. The datachecks were added in because it was a handy place to put them: a multi-threaded job iterating thru systems & routes, after .list files were processed.

The original plan was to have what became ABBREV_NO_CITY do stuff based on whether a .list name was in use. So, ABBREV_AS_BANNER was added, with the idea that when it eventually came time to implement ABBREV_NO_CITY, I'd just slap an else after the whole shebang and go to town. By the time it was implemented, I'd decided against checking .list names in use, because there's no way to automatically know whether the error is due to transposed data as opposed to a missing city or extraneous abbrev.

TL;DR, this doesn't have to be where it is, after .list processing. It can be refactored into HighwaySystem::route_integrity. With each thread now doing other stuff the majority of the time, that could mean less competition for the regional mutexes, and a win for parallelism. (OTOH, doing more stuff at once could mean increased cache misses and thus decreased performance.)
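To illustrate the mutex point, here is a minimal sketch of the general pattern, not the project's actual code. `Region`, `Route` and `process_routes` are hypothetical stand-ins, and the `checked` flag is a placeholder for whatever route-local work (datachecks, integrity checks) a thread does without holding a lock. The more time each thread spends in that lock-free part, the less often two threads should want the same region's mutex at once.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-ins; these are NOT the project's real classes.
struct Region {
    std::mutex mtx;        // guards mileage
    double mileage = 0.0;
};

struct Route {
    Region* region;
    double length;
    bool checked;          // set by the lock-free, route-local work
    Route(Region* r, double len) : region(r), length(len), checked(false) {}
};

// Each worker claims the next unprocessed route, does its route-local
// work without any regional lock, then holds the region's mutex only
// for the brief shared update.
void process_routes(std::vector<Route>& routes, unsigned num_threads) {
    assert(num_threads > 0);
    std::mutex idx_mtx;    // guards next
    std::size_t next = 0;

    auto worker = [&]() {
        for (;;) {
            std::size_t i;
            {
                std::lock_guard<std::mutex> lock(idx_mtx);
                if (next == routes.size()) return;
                i = next++;
            }
            Route& r = routes[i];
            r.checked = true;  // placeholder for route-local work, no lock held

            std::lock_guard<std::mutex> lock(r.region->mtx);
            r.region->mileage += r.length;
        }
    };

    std::vector<std::thread> threads;
    for (unsigned t = 0; t < num_threads; ++t) threads.emplace_back(worker);
    for (std::thread& th : threads) th.join();
}
```

Whether this nets out as a win would still have to be measured, for the cache-miss reason given above.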


route_integrity

is where it is because the original switch to efficient-but-destructive one-time case-smashing on AltLabels required it to happen after NMP processing, in order to preserve case in nearmisspoints.log & nmpfps.log and allow FP entries to match without big changes to nmpfps.log. AltLabels have since been removed from these logs entirely, meaning we can now smash case anywhere. The errorcheck for Routes without a ConnectedRoute was lumped in to take advantage of the existing iteration through systems & routes.
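For readers unfamiliar with the term, one-time case-smashing can be sketched roughly as below. This is an assumption-laden illustration rather than the project's implementation: `Waypoint`, `smash_alt_labels` and `has_alt_label` are hypothetical names. The stored alt labels are uppercased in place exactly once, which destroys their original case but means later lookups only have to convert the query, not every stored label on every comparison.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical stand-in for a waypoint; not the project's real class.
struct Waypoint {
    std::string label;                    // primary label, case preserved
    std::vector<std::string> alt_labels;  // uppercased in place, once
};

// Uppercase a string in place.
void upper_in_place(std::string& s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::toupper(c));
    });
}

// Destructive one-time pass: the original case of the alt labels is lost.
void smash_alt_labels(Waypoint& w) {
    for (std::string& a : w.alt_labels) upper_in_place(a);
}

// After smashing, a lookup uppercases only the query and then does
// plain string comparisons against the stored labels.
bool has_alt_label(const Waypoint& w, std::string query) {
    upper_in_place(query);
    return std::find(w.alt_labels.begin(), w.alt_labels.end(), query)
           != w.alt_labels.end();
}
```

Because the pass is destructive, anything that needs the original case has to run before it, which is the ordering constraint described above.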


Low priority

yakra commented 1 year ago

Solving the region.php rankings bug while maintaining good performance means reunifying CompStatsRThread + CompStatsTThread & iterating by region. Nothing else iterates by region, so the new combined CompStatsThread stays; there's nothing else to refactor it into.

I'll close this issue and soon replace it with another one that reflects that reality.

yakra commented 1 year ago

OTOH, the LABEL_SELFREF datacheck requires that all .wpts are read and that all colocated points are detected. It has to happen after ReadWptThread. The only other options are NMPSearchThread & NMPMergedThread, which just don't make sense from an organizational standpoint. I think I'll leave well enough alone.