research: update LLVM x86 compiler and JIT

This updates the research on adding an x86 backend to our LLVM bindings following #402.

Furthermore, this tests whether it's possible to create pure assembly .S files without Nim, so that:

It's easier to integrate in C/Rust/Go, as we remove the Nim toolchain.
It's possible to generate all code in the same LLVM module to ensure inter-procedural optimization, constant inlining and deduplication.
We don't depend on GCC codegen quality, which is poor for big integers/cryptography.

Unfortunately there is a showstopper:

We cannot use inline assembly, and therefore ADOX/ADCX, when printing assembly.

And a few annoying parts:

If we fix the showstopper, we need to be able to clear the EFLAGS (carry, overflow, sign, zero, ...). This is usually done with xor, but the compiler will optimize it away.
We need to ensure the compiler does not reorder instructions in between add-with-carry operations or accidentally clear the carry flag, for example when we add with 0 for the final carries.
We need to be able to rotate buffers.
Despite enabling bmi2, LLVM does not seem to generate the mulx instruction.
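
To make the carry-chain concern concrete, here is a minimal C sketch of the pattern in question: a multi-limb multiplication by one word, where each widening multiply is followed by an add-with-carry and the top limb is finished by adding 0 plus the pending carry. This is an illustration only, not code from this PR. The names `mul_wide`, `adc` and `mul_by_word` are hypothetical, and it assumes a 64-bit target with the GCC/Clang `unsigned __int128` extension.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Widening 64x64 -> 128-bit multiply. On x86-64 a compiler may lower
   this to mul or, with BMI2 enabled, possibly to mulx.
   Assumption: unsigned __int128 is available (GCC/Clang extension). */
static inline void mul_wide(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo) {
    unsigned __int128 p = (unsigned __int128)a * b;
    *lo = (uint64_t)p;
    *hi = (uint64_t)(p >> 64);
}

/* a + b + carry_in, returning the carry-out (0 or 1).
   Stands in for one adc instruction in a carry chain. */
static inline uint64_t adc(uint64_t a, uint64_t b, uint64_t carry_in, uint64_t *sum) {
    unsigned __int128 s = (unsigned __int128)a + b + carry_in;
    *sum = (uint64_t)s;
    return (uint64_t)(s >> 64);
}

/* r[0..n] = a[0..n-1] * b, with little-endian limbs.
   r must have room for n + 1 limbs. */
static void mul_by_word(uint64_t *r, const uint64_t *a, size_t n, uint64_t b) {
    assert(r != NULL && a != NULL);
    uint64_t carry = 0;   /* carry flag carried across iterations */
    uint64_t prev_hi = 0; /* high half of the previous partial product */
    for (size_t i = 0; i < n; i++) {
        uint64_t hi, lo;
        mul_wide(a[i], b, &hi, &lo);
        /* The carry must survive from one iteration to the next:
           nothing in between may clobber it. */
        carry = adc(lo, prev_hi, carry, &r[i]);
        prev_hi = hi;
    }
    /* Final carry: "add with 0" folds the pending carry into the top limb.
       The carry-out is always 0 here because the product fits in n + 1 limbs. */
    (void)adc(prev_hi, 0, carry, &r[n]);
}
```

For instance, calling `mul_by_word(r, a, 2, b)` with a 2-limb `a` writes a 3-limb product into `r`. In C the carry is an ordinary value, so ordering is not a problem; the difficulty described above only appears once the carry lives in the CPU flags and the compiler is free to schedule or fold instructions around it.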

How to fix the showstopper?

We can compile the module, then use the LLVM disassembler.

For all other architectures, we should have better luck: either ADCX/ADOX should be the only instructions requiring inline assembly, or the architecture supports inline assembly in its ASM printer (GPU virtual ISAs like Nvidia PTX).

mratsim / constantine

research: update LLVM x86 compiler and JIT #452