Use `long double` as `native_float80_t` on ARM64

Right now we use long double as native_float80_t on x86/amd64. This is easy because on those architectures a long double is just the x87 80-bit extended-precision float wrapped in some padding (12 bytes total on 32-bit x86, 16 bytes on amd64). All we have to do is copy the 10 meaningful bytes in/out of a long double, and things just work.

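On arm64 (Linux, not macOS, since macOS does not follow the standard aarch64 ABI here and makes long double a 64-bit double), a long double is an IEEE 754 binary128 (quad-precision) float. It has the same 15-bit exponent as the 80-bit format and a wider significand (113 bits versus 64), so it can represent every 80-bit float exactly, and more. We should use it as the intermediate floating point type (native_float80_t) and store/load an 80-bit representation when we have to write to the x87 FPU stack.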
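One way to express the platform choice is to key it on the properties of long double rather than on the architecture name. The following is only a sketch: `kLongDoubleCoversFloat80` and `NativeFloat80Sketch` are hypothetical names for illustration, not remill's actual definitions. It picks long double whenever its significand and exponent range are at least as large as those of the x87 format, and falls back to double otherwise (as on macOS arm64).

```cpp
#include <limits>
#include <type_traits>

// Hypothetical trait: true when long double can hold every x87 80-bit value.
// The x87 format has a 64-bit significand and max_exponent == 16384; both the
// x87 long double and IEEE 754 binary128 (113-bit significand) satisfy this.
constexpr bool kLongDoubleCoversFloat80 =
    std::numeric_limits<long double>::digits >= 64 &&
    std::numeric_limits<long double>::max_exponent >= 16384;

// Hypothetical stand-in for native_float80_t: long double where it is wide
// enough, otherwise plain double (with the precision loss that implies).
using NativeFloat80Sketch =
    std::conditional_t<kLongDoubleCoversFloat80, long double, double>;
```

On Linux arm64 and on x86/amd64 this would select long double; on platforms where long double is a 64-bit double it would select double.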

This will require some custom conversions that extract the sign/exponent/mantissa out of 80-bit floats and convert them to 128-bit floats (and vice versa). The main differences to handle are that the 80-bit format stores its leading integer bit explicitly while the 128-bit format leaves it implicit, and that NaNs (and other special encodings) are laid out differently in the two formats. The conversions are certainly feasible (some examples of converting 80-bit floats to 64-bit floats are on StackOverflow), but this would also need some extra test infrastructure to sanity-check "x86 float on arm64" behavior.
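As a rough illustration of the widening direction, here is a minimal sketch that works purely on bit patterns, so it does not depend on what long double is on the host. The struct and function names (`Float80Bits`, `Float128Bits`, `Float80ToFloat128Bits`) are hypothetical, not existing remill APIs. It assumes the usual layouts: 1 sign bit, 15 exponent bits, and a 64-bit significand with an explicit integer bit for the 80-bit format; 1 sign bit, 15 exponent bits, and a 112-bit fraction with an implicit leading bit for binary128. Pseudo-NaNs, pseudo-infinities, and unnormals are handled in a simplified way (see the comments) rather than mirroring x87 invalid-operand behavior.

```cpp
#include <cstdint>

// Hypothetical raw view of an x87 80-bit float.
struct Float80Bits {
  uint64_t significand;  // bit 63 is the explicit integer bit
  uint16_t sign_exp;     // bit 15 is the sign, bits 14..0 the biased exponent
};

// Hypothetical raw view of an IEEE 754 binary128 float.
struct Float128Bits {
  uint64_t lo;  // low 64 bits of the 112-bit fraction
  uint64_t hi;  // sign, 15-bit biased exponent, top 48 bits of the fraction
};

// Both formats use exponent bias 16383, so the exponent carries over as-is.
inline Float128Bits Float80ToFloat128Bits(Float80Bits in) {
  const uint64_t kFracMask = 0x7FFFFFFFFFFFFFFFULL;  // low 63 significand bits
  const uint64_t sign = static_cast<uint64_t>(in.sign_exp >> 15) << 63;
  uint64_t exp = in.sign_exp & 0x7FFFu;
  uint64_t sig = in.significand;

  if (exp == 0x7FFFu) {
    // Infinity or NaN. Simplification: the integer bit is ignored, so
    // pseudo-infinities and pseudo-NaNs are treated like the regular ones.
    // The quiet bit (significand bit 62) lands on the binary128 quiet bit.
    const uint64_t frac = sig & kFracMask;
    return {frac << 49, sign | (exp << 48) | (frac >> 15)};
  }
  if (sig == 0) {
    // Signed zero. Simplification: pseudo-zeros (nonzero exponent) also map here.
    return {0, sign};
  }

  // An exponent field of 0 has the same scale as an exponent field of 1.
  if (exp == 0) exp = 1;
  // Normalize unnormals/denormals as far as the exponent range allows.
  while (exp > 1 && (sig >> 63) == 0) {
    sig <<= 1;
    --exp;
  }
  // If the integer bit is still clear, the value is a binary128 subnormal.
  const uint64_t out_exp = (sig >> 63) ? exp : 0;
  const uint64_t frac = sig & kFracMask;  // drop the now-implicit integer bit
  // The 63 fraction bits occupy the top of the 112-bit fraction field.
  return {frac << 49, sign | (out_exp << 48) | (frac >> 15)};
}
```

For example, 1.0 is encoded in the 80-bit format as sign_exp 0x3FFF with significand 0x8000000000000000; the sketch maps it to hi = 0x3FFF000000000000 and lo = 0, which is the binary128 encoding of 1.0. The narrowing direction (128-bit to 80-bit) additionally needs rounding of the 49 discarded fraction bits, which is not shown here.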

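The current solution is to just make native_float80_t a 64-bit double on arm64, which is the previous remill behavior for all architectures. This is mostly fine, but comes with a loss of precision and range: a double has a 53-bit significand and an 11-bit exponent, versus 64 and 15 bits for an 80-bit float.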
