I have added a pure-software, un-striped matrix filler that uses no SSE intrinsics, along with tests that compare its results against the SSE2 results.
I've also modified the SSE2 matrix filler to be less lazy: it now fully computes the E and F matrices (allowing for back-to-back gaps), running as many lazy F loops as are required to get the correct matrix values, instead of only as many as are needed for the best alignment.
This should fix #8.
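
For readers unfamiliar with the approach, here is a minimal sketch of what an un-striped, intrinsic-free fill of all three matrices can look like. It is written in Python for brevity and is not the code in this PR: the function name `fill_matrices`, the scoring defaults, the convention that `gap_open` is the full cost of the first gapped base, and the clamping of E and F at zero (to mimic unsigned saturating arithmetic) are all assumptions made for illustration.

```python
def fill_matrices(ref, read, match=1, mismatch=4, gap_open=6, gap_extend=1):
    """Fill local-alignment H, E and F matrices cell by cell (no striping).

    Hypothetical reference filler: H is the best score ending at (i, j),
    E the best score ending in a gap that consumes a ref base, and
    F the best score ending in a gap that consumes a read base.
    """
    rows, cols = len(read) + 1, len(ref) + 1
    H = [[0] * cols for _ in range(rows)]
    E = [[0] * cols for _ in range(rows)]
    F = [[0] * cols for _ in range(rows)]

    for i in range(1, rows):
        for j in range(1, cols):
            # Open a gap from H or extend an existing one. Because H may
            # itself end in the other kind of gap, back-to-back gaps are
            # permitted. Clamping at 0 is an assumption (see lead-in).
            E[i][j] = max(0, H[i][j - 1] - gap_open, E[i][j - 1] - gap_extend)
            F[i][j] = max(0, H[i - 1][j] - gap_open, F[i - 1][j] - gap_extend)

            score = match if read[i - 1] == ref[j - 1] else -mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + score, E[i][j], F[i][j])

    return H, E, F
```

Because every cell of E and F is computed directly from its neighbours, the output does not depend on how many correction passes a striped implementation runs, which is what makes a filler of this kind usable as a baseline for cell-by-cell comparison.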