This PR replaces the basic field multiplier with the Montgomery modular multiplier. As a result, the Field class now assumes Montgomery form as the default representation of elements.
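For reviewers less familiar with Montgomery form, here is a minimal sketch of the arithmetic. It is an illustrative big-integer Python model, not the CUDA code in this PR. It assumes an odd modulus `p` and `R = 2**k > p`, and the class and method names (`Montgomery`, `redc`, `to_mont`, `from_mont`, `mul`) are hypothetical. An element `a` is stored as `a*R mod p`. Each product then needs only one REDC step (multiply, mask, shift, conditional subtract) instead of a full modular reduction, which is why it pays to keep elements in Montgomery form by default and convert only at the boundaries.

```python
class Montgomery:
    """Hypothetical model of Montgomery arithmetic modulo an odd p, with R = 2**k > p."""

    def __init__(self, p: int):
        assert p % 2 == 1, "Montgomery arithmetic needs an odd modulus"
        self.p = p
        self.k = p.bit_length()          # smallest k with 2**k > p
        self.R = 1 << self.k
        self.mask = self.R - 1           # t & mask == t mod R
        # -p^{-1} mod R, so that t + m*p is divisible by R in redc
        self.p_inv_neg = (-pow(p, -1, self.R)) % self.R
        self.R2 = (self.R * self.R) % p  # R^2 mod p, used to enter Montgomery form

    def redc(self, t: int) -> int:
        """Return t * R^{-1} mod p, assuming 0 <= t < p*R."""
        m = ((t & self.mask) * self.p_inv_neg) & self.mask
        u = (t + m * self.p) >> self.k   # exact division by R; u < 2p
        return u - self.p if u >= self.p else u

    def to_mont(self, a: int) -> int:
        """Map a to a*R mod p."""
        return self.redc((a % self.p) * self.R2)

    def from_mont(self, a_m: int) -> int:
        """Map a*R mod p back to a."""
        return self.redc(a_m)

    def mul(self, a_m: int, b_m: int) -> int:
        """(aR)(bR)R^{-1} = abR mod p: the product stays in Montgomery form."""
        return self.redc(a_m * b_m)


ctx = Montgomery(97)
a_m, b_m = ctx.to_mont(12), ctx.to_mont(34)
print(ctx.from_mont(ctx.mul(a_m, b_m)))  # prints 20, since 12*34 mod 97 == 20
```

Note that constants used in multiplications must also be stored in Montgomery form (`c*R mod p`). Otherwise `mul` returns `a*c*R^{-1}` rather than `a*c`.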
The PR is still missing the mul_const optimization, which is critical for the EC adder. Even with that optimization, MSM performance on CUDA is 5% worse than with the previous multiplier. Can we find the cause?
cuda-backend-branch: hadar/gpu_mont_mult