siboehm / lleaves

Compiler for LightGBM gradient-boosted trees, based on LLVM. Speeds up prediction by ≥10x.
https://lleaves.readthedocs.io/en/latest/
MIT License

Platform interoperability #27

Open TomScheffers opened 1 year ago

TomScheffers commented 1 year ago

Is there a way to reliably check whether a compiled model is able to run on a given machine?

I run predictions on various platforms. When loading the compiled model, I load the one that was compiled on the same platform (using PLATFORM = sys.platform + '-' + sysconfig.get_platform().split('-')[-1].lower(), resulting in either darwin-arm64 or linux-x86_64). However, models compiled in one linux-x86_64 environment are sometimes not interoperable with other linux-x86_64 machines (I use AWS Fargate, which runs the container on whatever hardware is available). This results in exit code 132 (Illegal Instruction) in the model.predict() loop.
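Spelled out as a snippet, the key scheme above (nothing beyond what's already described):

import sys, sysconfig

# OS name + machine architecture, e.g. 'darwin-arm64' on an M1 Mac
# or 'linux-x86_64' on an AWS instance.
PLATFORM = sys.platform + '-' + sysconfig.get_platform().split('-')[-1].lower()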

The underlying reason is probably that the machines do not share the same architecture (ARM-based?). For example, when I compile a model inside a Docker container (with DOCKER_DEFAULT_PLATFORM=linux/amd64) on my M1 Mac, it registers the platform as linux-x86_64, but the model cannot be used on an AWS Linux machine running Docker.

What would be a solid way to go about this issue? Is there some LLVM version which I need to look at in order for models to be interoperable?

Thanks a lot.

siboehm commented 1 year ago

Related to #12. I think the issue you're encountering is that lleaves basically compiles with march=native, meaning the code is targeted at the host's microarchitecture (Haswell, Skylake, etc.) as well as its ISA extensions (AVX2, AVX-512, SSE4, ...). So there's no guarantee that a cached binary will run on a different CPU unless it's the same model: you're compiling on one x86_64 machine, and lleaves emits instructions that don't exist on a different / older x86_64 CPU.

In the current lleaves version there's not much you can do except compile on the machine that you'll run the final binary on. The way to fix this is to introduce a new flag in the compile() method, something like native=False, which would make lleaves disable the hyper-specific instruction targeting (sketched below). The relevant code is here: https://github.com/siboehm/lleaves/blob/master/lleaves/llvm_binding.py#L16 This won't be a big change, but it'll require some testing. I can't really tell you when I'll get around to implementing it. It's on my todo list eventually, but I'll also accept PRs for it :)
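In llvmlite terms, the difference is roughly the following sketch (not lleaves' actual code; native=False is only the hypothetical flag name from above, and the variable names are illustrative):

import llvmlite.binding as llvm

llvm.initialize()
llvm.initialize_native_target()
llvm.initialize_native_asmprinter()

target = llvm.Target.from_default_triple()

# What lleaves effectively does today (march=native behaviour):
# tune for the exact host CPU, including all of its ISA extensions.
native_machine = target.create_target_machine(
    cpu=llvm.get_host_cpu_name(),
    features=llvm.get_host_cpu_features().flatten(),
    opt=3,
)

# What a hypothetical native=False could do: leave cpu/features at their
# generic defaults, so only baseline instructions for the triple are emitted.
portable_machine = target.create_target_machine(opt=3)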

Issues like this are to some degree a consequence of using llvmlite to interface with LLVM, as opposed to writing a proper LLVM compiler. llvmlite is made for JIT compilers like numba, which always assume you'll run the code on the machine it was compiled on.

TomScheffers commented 1 year ago

Okay, it totally makes sense now. Thanks again for your quick response.

For me it would still be beneficial to use specific instruction targeting; however, I need to know which compiled version my machine requires. For now I will hash llvm.get_host_cpu_features() to determine interoperability. That should work, right 😄? Something like:

import hashlib, json
import llvmlite.binding as llvm

# get_host_cpu_features() returns a dict-like FeatureMap of
# feature name -> bool; serialize it deterministically before hashing.
h = hashlib.sha256()
h.update(json.dumps(dict(llvm.get_host_cpu_features()), sort_keys=True).encode())
key = h.hexdigest()

siboehm commented 1 year ago

Without thinking about it for long: I'd probably use get_host_cpu_features, get_host_cpu_name and get_process_triple. Together those should fully determine the specific CPU, CPU features, and operating system.
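Combined into a cache key, that could look something like this (a sketch: all three functions exist in llvmlite.binding, but the fingerprint format is just an illustration):

import hashlib
import llvmlite.binding as llvm

llvm.initialize()

# Triple (OS + arch), exact CPU model, and the full feature set together
# pin down which machines a compiled binary is valid for.
fingerprint = '|'.join([
    llvm.get_process_triple(),               # e.g. 'x86_64-unknown-linux-gnu'
    llvm.get_host_cpu_name(),                # e.g. 'skylake'
    llvm.get_host_cpu_features().flatten(),  # e.g. '+avx2,+sse4.2,...'
])
key = hashlib.sha256(fingerprint.encode()).hexdigest()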