ostafen / smart

String Matching Algorithms Research Tool
https://smart-tool.github.io/smart/
GNU General Public License v3.0

Forking for isolation and better stats #58

Open nishihatapalmer opened 1 year ago

nishihatapalmer commented 1 year ago

There are two reasons to consider forking smart to run algo benchmarks with fork():

1) Algorithm process isolation.
2) Better stats.

Process isolation Since each algo would run in its own process, bugs in one algorithm cannot affect the operation of another or corrupt other smart data.

In addition, since each algorithm would start from a fresh copy of the smart environment, no algorithm would benefit from a cache primed by a previous one. At the moment, the first algorithm that runs probably incurs more cache misses than subsequent ones, since they all look at the same pattern data and text to search.

Better stats If you run identical benchmarks serially, you will see a fair bit of variation in benchmarking times. The only difference is the process they run in (and whatever else the OS is doing at the time, I guess). In a perfect world, the timings would be almost identical for identical benchmark runs on the same machine and OS.

I have seen other benchmark frameworks (e.g. jmh) run more than one set of benchmarks across several forked processes. Taking benchmark stats across several forks (while slower) allows those process-related variations to be averaged out, and gives a better comparison of algorithm performance.

Considerations

1) Speed.
2) Parent-child communication.
3) CPU pinning.

Speed Obviously, benchmarking each algorithm in a different process will slow things down. Initializing each child process with a full copy of smart (including the loaded search texts) will take a little time. And if you choose to run more than one fork, you will slow things down further, as you're repeating the same benchmarks several times.

However, old smart would create a new process for each combination of algorithm, pattern size, and run. That was definitely slower, but here we would only be creating a new process per algorithm (typically fewer than 10) and pattern size (typically around 10). The runs would all happen inside the child process, and the run count is by far the bigger number (default 500). So this should still be faster than old smart.

Parent-child communication To collect all the stats for all algorithms, we would need some way to communicate between the parent and child processes. The obvious techniques are:

1) Append to a file.
2) Use pipes.

If we append to a file, each benchmark simply appends its results to the same results file, sequentially. When all benchmarks have finished, the parent process would need to load that file back in if it needed to further transform or analyze the recorded measurements and stats, e.g. to identify the fastest overall algorithm.

It's also pretty easy to have the child communicate with the parent over a pipe. The child can just pipe its results back, and the parent can parse and record them in a single data structure before analysis and output.

CPU pinning Obviously, if we fork the smart process, we'd need to move CPU pinning to after the fork, so that the process doing the benchmarking is the one that is pinned. That should be fairly straightforward.

nishihatapalmer commented 1 year ago

Probably worth trying to measure the process variation that currently exists, then comparing it with averaging over a few forks, using proof-of-concept code just hacked together to work. No huge PRs on their way!