This PR refactors how assembly generation works. I now use gocc, a small tool I wrote, which generates the assembly using clang and then translates it into Go assembly files. The main goal is a simpler workflow, but clang also seems to do a slightly better job of generating assembly than I did by hand, so there is even a small performance improvement on x86 (~5% from what I observed).
Most importantly, this PR introduces hardware acceleration on Arm64 Linux (e.g. Graviton on AWS) and Apple Silicon by using NEON instructions.