unzvfu / cuda-fixnum

Extended-precision modular arithmetic library that targets CUDA.
MIT License

Understand why CLNW sliding-window is faster than k-ary in the tests #25

Open unzvfu opened 4 years ago

unzvfu commented 4 years ago

From https://github.com/data61/cuda-fixnum/issues/43:

In the test suite, modexp (CLNW) appears to be faster than multi_modexp (k-ary), at least in the 128- and 256-byte range. This is unexpected, since CLNW branches on the bit pattern of the exponent whereas k-ary does not.

Work out what's going on. Replace k-ary if necessary.
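
One possible starting point is to compare how many modular multiplications each method performs for the same exponent, since a lower operation count could outweigh the cost of branching. The following is a minimal host-side sketch in Python, not the library's CUDA code. All names (MulCounter, modexp_kary, clnw_windows, modexp_clnw) are hypothetical, and the window width of 5 and the modulus are arbitrary choices for illustration.

```python
import random


class MulCounter:
    """Hypothetical helper: modular multiply that counts how often it is called."""

    def __init__(self, mod):
        self.mod = mod
        self.count = 0

    def mul(self, a, b):
        self.count += 1
        return (a * b) % self.mod


def modexp_kary(base, exp, k, ops):
    """Fixed-window (k-ary) exponentiation: one table multiply per k-bit digit."""
    table = [1 % ops.mod, base % ops.mod]
    for _ in range(2, 1 << k):
        table.append(ops.mul(table[-1], table[1]))
    ndigits = max(1, -(-exp.bit_length() // k))  # ceiling division
    digits = [(exp >> (k * i)) & ((1 << k) - 1) for i in range(ndigits)]
    result = table[digits[-1]]
    for digit in reversed(digits[:-1]):
        for _ in range(k):
            result = ops.mul(result, result)
        # Multiply unconditionally, even when digit == 0 (no data-dependent branch).
        result = ops.mul(result, table[digit])
    return result


def clnw_windows(exp, d):
    """Scan exp from the least significant bit and split it into (value, length)
    windows: runs of zeros, and windows of up to d bits that start at a set bit."""
    windows = []
    i, n = 0, exp.bit_length()
    while i < n:
        if (exp >> i) & 1 == 0:
            j = i
            while j < n and (exp >> j) & 1 == 0:
                j += 1
            windows.append((0, j - i))
            i = j
        else:
            length = min(d, n - i)
            windows.append(((exp >> i) & ((1 << length) - 1), length))
            i += length
    return windows  # least significant window first


def modexp_clnw(base, exp, d, ops):
    """Sliding-window exponentiation over the windows above, using odd powers only."""
    if exp == 0:
        return 1 % ops.mod
    table = {1: base % ops.mod}
    if d > 1:
        square = ops.mul(table[1], table[1])
        for v in range(3, 1 << d, 2):
            table[v] = ops.mul(table[v - 2], square)
    result = None
    for value, length in reversed(clnw_windows(exp, d)):
        if result is None:
            result = table[value]  # the top window always contains the leading 1
            continue
        for _ in range(length):
            result = ops.mul(result, result)
        if value:  # data-dependent branch: zero windows skip the multiply
            result = ops.mul(result, table[value])
    return result


# Illustration only: a random 1024-bit (128-byte) exponent and an arbitrary odd modulus.
random.seed(0)
mod = (1 << 1024) - 105
exp = random.getrandbits(1024) | (1 << 1023)
kary_ops, clnw_ops = MulCounter(mod), MulCounter(mod)
assert modexp_kary(3, exp, 5, kary_ops) == pow(3, exp, mod)
assert modexp_clnw(3, exp, 5, clnw_ops) == pow(3, exp, mod)
print("k-ary multiplications:", kary_ops.count)
print("CLNW multiplications: ", clnw_ops.count)
```

This count ignores everything specific to the GPU, so it can only suggest a cause rather than confirm one. If the exponent is the same for every thread in a warp (an assumption about how modexp is invoked in the tests), the branches in the sliding-window loop would be uniform across the warp and would not cause divergence, which would leave only the difference in multiplication count.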