yakra / DataProcessing

Data Processing Scripts and Programs for Travel Mapping Project

I/O vs CPU bottlenecks #114

Closed yakra closed 3 years ago

yakra commented 4 years ago

Writing near-miss point merged wpt files may be the purest example of a disk I/O bottleneck. It's pretty lightweight, consisting mainly of output streams, some sprintf and a couple ifs.

On lab2, it takes all of 0.8s. Reading waypoints for all routes, OTOH, processes about the same amount of data and takes 10.8s, even though disk read benchmarks on lab2 are way faster than disk write benchmarks.

This all suggests that when Reading waypoints for all routes, the bottleneck is CPU and/or memory based. Running just 4 threads on BiggaTomato, CPU use hovers around 65%. If we know disk I/O isn't our bottleneck, the idle remainder suggests that threads are stalled waiting on mutexes, so the mutexes are to blame.

Disk I/O

Building lab3 can help test out disk I/O bottleneck effects. RAID, xor solid state. Hypotheses:

Reading waypoints for all routes

Mutexes:

yakra commented 4 years ago

strtok_mtx: For processing individual .wpt lines, we're only interested in the space character as a delimiter. strchr should be slightly more efficient than strcspn.

strcspn: check for \n before \r
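A minimal sketch of the line-splitting side of this, under some assumptions: that the whole .wpt file has already been read into one NUL-terminated buffer, and that "check for \n before \r" refers to the order of characters in strcspn's reject set. `split_lines` is a hypothetical name, not something from the siteupdate code.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: split a NUL-terminated buffer into lines in place.
// Line terminators are overwritten with NULs; empty lines are skipped.
std::vector<char*> split_lines(char* buf)
{   std::vector<char*> lines;
    char* c = buf;
    while (*c)
    {   // reject set lists \n first, then \r (assumed reading of the note above)
        size_t len = strcspn(c, "\n\r");
        char* end = c + len;
        // handles \n, \r\n and bare \r, plus any run of blank lines
        while (*end == '\n' || *end == '\r') *end++ = 0;
        if (len) lines.push_back(c);
        c = end;
    }
    return lines;
}
```

Each returned pointer is then an ordinary C string that can be scanned for space delimiters without caring about the line ending.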

yakra commented 4 years ago

strchr is the tiniest little bit less efficient than strcspn -- because it returns a null pointer if nothing is found, rather than the string length, I have to check whether the pointer is valid before dereferencing it to look for spaces. This is just enough to be a deal-breaker.
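To make the extra check concrete, here is a rough side-by-side of the two approaches for getting the length of the first space-delimited field. The function names are made up for illustration. Both return the same value; the strchr version just needs the additional branch for the not-found case.

```cpp
#include <cstddef>
#include <cstring>

// strcspn returns the string length when no space is found,
// so no special case is needed.
size_t field_len_strcspn(const char* s)
{   return strcspn(s, " ");
}

// strchr returns a null pointer when no space is found,
// so the result has to be checked before it can be used.
size_t field_len_strchr(const char* s)
{   const char* p = strchr(s, ' ');
    return p ? size_t(p - s) : strlen(s);
}
```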

Let's try this out for size:

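// Like strcspn, but for a single delimiter character: returns the index of
// the first occurrence of chr in str, or the string length if chr is not found.
inline size_t strchrspn(const char *str, char chr)
{   size_t c = 0;
    while (str[c] != chr && str[c]) c++;
    return c;
}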
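A sketch of how strchrspn might be used to break one .wpt line into its space-delimited fields. `split_fields` is a hypothetical helper and the sample line in the usage note is invented; the real code may well avoid std::string copies altogether.

```cpp
#include <cstddef>
#include <string>
#include <vector>

inline size_t strchrspn(const char *str, char chr)
{   size_t c = 0;
    while (str[c] != chr && str[c]) c++;
    return c;
}

// Hypothetical helper: split one line into its space-delimited fields.
std::vector<std::string> split_fields(const char *line)
{   std::vector<std::string> fields;
    while (*line)
    {   size_t len = strchrspn(line, ' ');
        if (len) fields.emplace_back(line, len);  // skip empty fields from repeated spaces
        line += len;
        if (*line) line++;                        // step over the delimiter
    }
    return fields;
}
```

For a made-up line such as `NY5 +OldNY5 http://www.openstreetmap.org/?lat=42.000000&lon=-73.000000`, this yields three fields: the primary label, one alternate label and the URL.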

yakra commented 4 years ago

testing / benchmarking:
HighwayData @ 27a625459516a8f5c53f73a24ce51416ab991677
UserData @ 3ec63996d33e499027a3d4c8f47abc0d3ce7659f