llvm: more attempts at optimal field addition with pure LLVM IR

This PR contains the changes, built on the code generator from #456, that triggered the following LLVM issues:

https://github.com/llvm/llvm-project/issues/103841
https://github.com/llvm/llvm-project/issues/103855
https://github.com/llvm/llvm-project/issues/103946

For bigints of 254/255 bits, the pure LLVM IR has better worst-case codegen than the builtin llvm.usub.with.overflow: 1~2 extra instructions instead of 33% extra instructions (see https://github.com/mratsim/constantine/issues/357#issuecomment-2288608867). For 256 bits, however, it degrades to 66% extra instructions.
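For readers unfamiliar with the operation being tuned, here is a minimal Python sketch of limb-based field addition: add, subtract the modulus, then select. It is background only, not the IR emitted by this PR. The helper names (`to_limbs`, `add_limbs`, `sub_limbs`, `field_add`) and the 4x64-bit limb layout are assumptions made for illustration. The sketch also shows one structural difference between the sizes: with a 254/255-bit modulus the sum always fits in 4 limbs, whereas with a 256-bit modulus the addition can carry out and that carry has to feed into the final selection. Whether this accounts for the codegen gap reported above is not established here.

```python
MASK64 = (1 << 64) - 1  # one 64-bit limb

def to_limbs(n, count=4):
    """Split an integer into `count` little-endian 64-bit limbs."""
    return [(n >> (64 * i)) & MASK64 for i in range(count)]

def from_limbs(limbs):
    """Recombine little-endian 64-bit limbs into an integer."""
    return sum(limb << (64 * i) for i, limb in enumerate(limbs))

def add_limbs(a, b):
    """Limb-wise addition with carry propagation; returns (limbs, carry_out)."""
    out, carry = [], 0
    for x, y in zip(a, b):
        t = x + y + carry
        out.append(t & MASK64)
        carry = t >> 64
    return out, carry

def sub_limbs(a, b):
    """Limb-wise subtraction with borrow propagation; returns (limbs, borrow_out)."""
    out, borrow = [], 0
    for x, y in zip(a, b):
        t = x - y - borrow
        out.append(t & MASK64)
        borrow = 1 if t < 0 else 0
    return out, borrow

def field_add(a, b, p):
    """(a + b) mod p for a, b < p, all given as limb lists of equal length."""
    s, carry = add_limbs(a, b)
    d, borrow = sub_limbs(s, p)
    # Keep the raw sum only if it is already below p:
    # the addition did not carry out AND subtracting p borrowed.
    # Real constant-time code would use a branchless select here.
    keep_sum = carry == 0 and borrow == 1
    return s if keep_sum else d

# 255-bit modulus: a + b < 2p < 2^256, so the addition never carries out.
p255 = 2**255 - 19
a, b = p255 - 1, p255 - 2
r255 = from_limbs(field_add(to_limbs(a), to_limbs(b), to_limbs(p255)))

# 256-bit modulus: a + b can exceed 2^256, so carry_out participates in the select.
p256 = 2**256 - 2**32 - 977
c = p256 - 1
r256 = from_limbs(field_add(to_limbs(c), to_limbs(c), to_limbs(p256)))
```

In both cases the result equals `(a + b) % p`; the only difference is whether `carry` can ever be nonzero.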

llvm: more attempts at optimal field addition with pure LLVM IR #457