3+_intersection: induction variables; suffix efficiency

yakra / DataProcessing

Data Processing Scripts and Programs for Travel Mapping Project

0 stars 0 forks source link

induction variables

In siteupdate.py, the enumerate function was used in 2 places:

create label hashes and check for duplicates
- ~Changes, commited locally but not yet pushed, for HIDDEN_JUNCTION w/devel data, will modify this to use induction variables.~ Nope. Decided against that.
canonical_waypoint_name for 3+_intersection cases
- [ ] Cleaning up ap_coloc[check] and ap_coloc[other] indexing can improve readability. Or not. Tons of new code, making ap_coloc indexing a much smaller piece of the pie. Keeping it in place can make the commit diffs more readable.

suffix efficiency

https://github.com/yakra/DataProcessing/blob/03a36cab6f56c4f060bc41ce4739707a03969a15/siteupdate/cplusplus/classes/Waypoint/canonical_waypoint_name/3plus_intersection.cpp#L28-L34

Unnecessary comparison ~on top of unnecessary scanning~. This ~whole~ thing's still a little Pythonic. :) We do want to scan, because a suffix may come after a slashed designation. https://github.com/TravelMapping/DataProcessing/issues/418#issuecomment-2009579696 The bullet points below don't really apply anymore. Just look at the code in the next post.

[ ] Just check for _ at the appropriate place: if (check->label[other_no_abbrev.size()] == '_') suffixes[index] = check->label.data()+other_no_abbrev.size();
[ ] The || bit can become: else if (!strncmp(check->label.data()+other_no_abbrev.size(), other->route->abbrev.data(), other->route->abbrev.size()) && check->label[other_no_abbrev.size()+other->route->abbrev.size()] == '_') suffixes[index] = check->label.data()+other_no_abbrev.size()+other->route->abbrev.size();
[ ] This is all very wordy though. Improve readability by keeping/reintroducing the const char* suffix variable.
- [ ] This time, initialize it to the position within the label.
- [ ] Dereference to check whether _.
- [ ] If the first part of the conditional before the || doesn't produce results, just advance it: suffix += other->route->abbrev.size();
- [ ] Keep the whole conditional as an || instead of an else if, and save on instruction cache & program size by using one single suffixes[index] = suffix;

comment

[ ] I've had to figure this out at least twice, so add a comment here (or equivalent):

// It's tempting to break here as we have a match for this check.
// But we need to keep going to ensure suffixes are populated.

// approach: check if each label contains some route number // in the list of colocated routes, and if so, create a label // slashing together all of the route names, and save any _ // suffixes to put in and reduce the chance of conflicting names

auto suffixes = new Suffix[ap_coloc.size()]; for (unsigned int check = 0; check < ap_coloc.size(); check++) { match = 0; for (unsigned int other = 0; other < ap_coloc.size(); other++) { if (other == check) continue; Route* &r = ap_coloc[other]->route; std::string other_no_abbrev = r->name_no_abbrev(); const char* c; // "cursor" points to after a designation match in label, if found // first, look for a match at the beginning if ( !strncmp(ap_coloc[check]->label.data(),other_no_abbrev.data(),other_no_abbrev.size()) ) c = ap_coloc[check]->label.data()+other_no_abbrev.size(); // if unsuccessful, try after a slash else if (c = strchr(label.data, '/')) { // "slash designation" to look for is either a // single letter followed by a number, like M25 const char* sl_d = other_no_abbrev.data(); // or otherwise just start at the numeric part if (!isalpha(other_no_abbrev[0]) || !isdigit(other_no_abbrev[1])) do { if (isdigit(*sl_d)) break; sl_d++; } while (*sl_d); if (!*sl_d || strncmp(++c, sl_d, other_no_abbrev.data()+other_no_abbrev.size()-sl_d)) continue; } else continue; // Skip past abbrev if present if ( !strncmp(c,r->abbrev.data(),r->abbrev.size()) ) c += r->abbrev.size(); // Have we reached the end of the designation? // Avoid false matches, like NY50/67 for NY5 if (!strchr("(/_", *c)) continue; match = 1; // Search for a suffix. It may be after a secondary // slashed designation, so scan ahead using strchr. if (c = strchr(c, '_')) suffixes[other].set(c); // It's tempting to break here as we have a match for this check. // But we need to keep going to ensure suffixes are populated. } if (!match) break; } if (match) { std::string newname = ap_coloc[0]->route->list_entry_name() + suffixes[0](); for (unsigned int index = 1; index < ap_coloc.size(); index++) if (ap_coloc[index]->route != ap_coloc[index-1]->route) newname += '/' + ap_coloc[index]->route->list_entry_name() + suffixes[index](); g->namelog("3+_intersection: " + name + " -> " + newname); delete[] suffixes; return newname; } delete[] suffixes;

yakra / DataProcessing