This implements almost exactly the same approach as #3, but adds a little more error checking. In benchmarks, it is about 5% slower than @agrison's version, but still roughly a 4x improvement over master (which was itself a 4x improvement over the stock implementation).