While trying out some ideas, I rewrote major parts of carryFused. To me, this is more readable.
1) unweightAndCarry did not apply weights
2) carryFinal did apply weights - that's inconsistent
3) SUB2 code moved so that prime95 "shift counts" could be implemented more easily
4) Weights code separated from carry code
Performance is unchanged: it is neither faster nor slower (ROCm 3.3).
BTW, the idea I was trying did not pan out.