This PR contains the code generator changes from #456 that triggered issues:
for Bigints of size 254/255, the LLVM IR has better worst-case codegen than the builtin llvm.usub.with.overflow (1 to 2 extra instructions instead of 33% extra instructions, see https://github.com/mratsim/constantine/issues/357#issuecomment-2288608867), but for size 256 it degrades to 66% extra instructions.