unzvfu / cuda-fixnum

Extended-precision modular arithmetic library that targets CUDA.
MIT License
34 stars 8 forks

Use only unsigned values for RHS of compile-time div and mod #55

Open unzvfu opened 4 years ago

unzvfu commented 4 years ago

Even when the RHS is known at compile time, div and mod are slower with a signed RHS than with an unsigned one. The likely cause is the need to handle the sign of the dividend: signed division rounds toward zero, so a plain mask or shift is not enough (double-check this is actually the reason). See supporting evidence here, which shows that, for example, signed modulo 4 takes about five instructions while unsigned modulo 4 takes only one.
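To make the difference concrete, here is a minimal sketch in plain C++ (the same semantics should apply in CUDA device code). The function names `mod4_signed` and `mod4_unsigned` are hypothetical and not part of the library; the instruction counts quoted above are not reproduced here, only the semantic difference that forces the compiler to emit extra code in the signed case.

```cpp
#include <cassert>

// Signed modulo: the result takes the sign of the dividend (e.g. -5 % 4 == -1),
// so the compiler cannot lower this to a single AND with 3.
constexpr int mod4_signed(int x) { return x % 4; }

// Unsigned modulo: for a power-of-two divisor this is exactly x & 3u.
constexpr unsigned mod4_unsigned(unsigned x) { return x % 4u; }

// For non-negative inputs the two agree.
static_assert(mod4_signed(7) == 3, "signed mod of a non-negative value");
static_assert(mod4_unsigned(7u) == 3u, "unsigned mod of the same value");

// For negative inputs the signed result is not the low two bits.
static_assert(mod4_signed(-5) == -1, "signed mod keeps the dividend's sign");

// The unsigned version is equivalent to a mask for every input.
static_assert(mod4_unsigned(13u) == (13u & 3u), "unsigned mod 4 is a mask");
```

If a value is known to be non-negative, both functions return the same number, which is what makes switching to the unsigned form safe in those cases.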

Note that this goes against the usual wisdom of using signed arithmetic where possible, which avoids having the compiler generate code for (unused) wrap-around in unsigned arithmetic. There will probably be several cases where these two considerations conflict.
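One possible way to reconcile the two, sketched below under the assumption that the value is known to be non-negative at the point of use, is to keep the variable signed and convert to unsigned only for the div or mod itself. The helper `sum_every_fourth` is hypothetical and only illustrates the pattern; whether this is the best choice at any given site in the library would still need to be checked against the generated code.

```cpp
// Hypothetical helper: sums the elements whose index is a multiple of 4.
// The loop index stays signed; only the modulo uses an unsigned copy.
inline int sum_every_fourth(const int* data, int n) {
    int total = 0;
    for (int i = 0; i < n; ++i) {
        // i is non-negative inside the loop, so the conversion preserves
        // its value and the unsigned modulo gives the same answer.
        if (static_cast<unsigned>(i) % 4u == 0u) {
            total += data[i];
        }
    }
    return total;
}
```

The cast documents the non-negativity assumption at the exact place it matters, rather than changing the type of the index everywhere.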

Need to go through every use of signed and unsigned and ensure that the most efficient choice is made each time.