// split line into fields
char* c = strchr(line, ' ');
if (c){ do *c = 0; while (*++c == ' ');
while (char*d = strchr(c, ' '))
{ do *d = 0; while (*++d == ' ');
alt_labels.emplace_back(c);
c = d;
}
}
label = line;
This doesn't push to alt_labels or update c until the following field is found.
With nothing following the URL, c is left pointing to its beginning, and it can be parsed in place in the char* buffer.
That's the simple part.
For the SINGLE_FIELD_LINE checks, alt_labels.empty() -> !c
URL parsing will then need a rewrite to work with C strings instead of using the std::string methods.
Since the MALFORMED_LAT & MALFORMED_LON will have to change, it makes sense to do this at the same time as #239.
valid_line also gets caught up in this net. Do this at the same time as Route's TMArray<Waypoint> conversion, when the ctor will need to throw.
This could yield a decent speedup, especially under FreeBSD where we experience a minor memory bottleneck.
(OTOH, much of this bottleneck may already disappear due to the TMArray conversion itself, which replaces on average 12.77 Waypoint allocations per route (and 5 vector reallocations) with one TMArray allocation, not to mention corresponding improvements the TMArray<HighwaySegment> conversion.)
Side topic:
Search & destroy strcspn.
The remaining cases can be similarly converted to strchr or strpbrk as appropriate. The difference may be small, but directly manipulating a pointer uses fewer ops than array indexing.
Prefer strchr when there's only 1 delimiter, to avoid the overhead of looping thru 1-char strings.
For each case, there may be other nearby minor improvements to make (e.g. in TravelerList ctor) as well.
Open up a new issue when this one is closed.
Or not.
strpbrk is out. (For most cases?)
It returns null when no chars are found, so we can't dereference.
Maybe I could make it work, but it's probably not worth the trouble. It may end up being less efficient in the end anyway.
strchr still viable?
Beware cases where I rely on arriving at a valid pointer to the terminator, and a null pointer won't work.
Asides:
The difference may be small, but directly manipulating a pointer uses fewer ops than array indexing.
yields no diff to the binary. The compiler optimizes it away.
Further attempting to optimize that block to
for (char*d; *c; c=d)
{ d = c+strcspn(c, "\n\r");
do *d=0; while (*++d == '\n' || *d == '\r');
lines.emplace_back(c);
}
in an effort to get rid of the 1st "unnecessary" check for a newline fails -- d starts out pointing to either a newline or the null terminator, zeroing out the latter of which is harmless as it's already 0.
Problem is, the ++d moves it past the terminator, and if there's a newline, the while loop continues, zeroing out stuff it shouldn't. Wacky hijinks ensue, such as siteupdate: malloc.c:3729: _int_malloc: Assertion `(unsigned long) (size) >= (unsigned long) (nb)' failed.
468 took from from 2 unnecessary allocations, copies & constructions of the URL string to 1.
We can do better, and get rid of that remaining unnecessary copy.
Instead of https://github.com/yakra/DataProcessing/blob/c51fd20e1f0b896d072bef7ba7740a8e260f8f2b/siteupdate/cplusplus/classes/Waypoint/Waypoint.cpp#L21-L30
Something like this ought to work:
This doesn't push to
alt_labels
or updatec
until the following field is found. With nothing following the URL,c
is left pointing to its beginning, and it can be parsed in place in thechar*
buffer. That's the simple part.alt_labels.empty()
->!c
std::string
methods.valid_line
also gets caught up in this net. Do this at the same time asRoute
'sTMArray<Waypoint>
conversion, when the ctor will need tothrow
.This could yield a decent speedup, especially under FreeBSD where we experience a minor memory bottleneck. (OTOH, much of this bottleneck may already disappear due to the TMArray conversion itself, which replaces on average 12.77 Waypoint allocations per route (and 5 vector reallocations) with one TMArray allocation, not to mention corresponding improvements the
TMArray<HighwaySegment>
conversion.)Side topic: Search & destroy
strcspn
. The remaining cases can be similarly converted to strchr or strpbrk as appropriate. The difference may be small, but directly manipulating a pointer uses fewer ops than array indexing. Preferstrchr
when there's only 1 delimiter, to avoid the overhead of looping thru 1-char strings. For each case, there may be other nearby minor improvements to make (e.g. in TravelerList ctor) as well. Open up a new issue when this one is closed.Or not.
strpbrk
is out. (For most cases?) It returns null when no chars are found, so we can't dereference. Maybe I could make it work, but it's probably not worth the trouble. It may end up being less efficient in the end anyway.strchr
still viable? Beware cases where I rely on arriving at a valid pointer to the terminator, and a null pointer won't work.Asides:
In theory, sure. In practice? Changing https://github.com/yakra/DataProcessing/blob/80f27243bf26579c53c6c7a215a5aeb22347317a/siteupdate/cplusplus/classes/Route/read_wpt.cpp#L43-L46
to
yields no diff to the binary. The compiler optimizes it away.
in an effort to get rid of the 1st "unnecessary" check for a newline fails --
d
starts out pointing to either a newline or the null terminator, zeroing out the latter of which is harmless as it's already 0. Problem is, the++d
moves it past the terminator, and if there's a newline, thewhile
loop continues, zeroing out stuff it shouldn't. Wacky hijinks ensue, such assiteupdate: malloc.c:3729: _int_malloc: Assertion `(unsigned long) (size) >= (unsigned long) (nb)' failed.