siboehm / lleaves

Compiler for LightGBM gradient-boosted trees, based on LLVM. Speeds up prediction by ≥10x.
https://lleaves.readthedocs.io/en/latest/
MIT License
333 stars 28 forks

[Question] how does model cache play with distributed workers with different CPUs? #31

Closed crayonfu closed 1 year ago

crayonfu commented 1 year ago

Hello, thank you for this great library. I have a question about the model cache file. I am using Ray to manage a small cluster of PCs with a mix of Intel and AMD CPUs and different OSes (Ubuntu/ClearLinux). My program has been using numba to speed things up, and JIT mode (instead of AOT mode) works fine: Ray can send the numba functions to the different PCs in the cluster and they compile locally.

So for lleaves: if I compile the models on one node and distribute the generated cache file to all nodes in the cluster, will it work? Or do I have to stick to "JIT" mode, where models are always compiled locally each time? I am using ensemble methods with many LightGBM models (>1000 in total, each small: about 100 trees, max_depth 10). Or should I just compile all the models locally on each PC? Thank you.

siboehm commented 1 year ago

So right now, if you cannot guarantee that the CPUs are equal (equal meaning they support the same instructions, due to having the same CPU architecture, same ISA extensions, etc., see #27), you'll have to compile locally on each node using JIT mode. The different OSes are not a problem, but it sounds like you have heterogeneous CPUs. You can of course still cache the compilation on each node to speed things up.
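The per-node compile-and-cache pattern might look roughly like this (a sketch: `lleaves.Model(model_file=...)` and `compile(cache=...)` follow the documented lleaves API, but the helper names, cache directory, and lazy-import structure are illustrative, not part of the library):

```python
import os

def node_cache_path(model_txt: str, cache_dir: str = "/tmp/lleaves_cache") -> str:
    """Derive a node-local cache file name for a compiled model.

    Each worker keeps its own cache directory, so binaries compiled for
    one CPU are never reused on a different one.
    """
    os.makedirs(cache_dir, exist_ok=True)
    return os.path.join(cache_dir, os.path.basename(model_txt) + ".bin")

def load_model_locally(model_txt: str):
    """Compile a LightGBM model on this node, reusing the node-local cache.

    lleaves skips recompilation when the cache file already exists, so only
    the first call on each node pays the compile cost.
    """
    import lleaves  # imported lazily; only needed on the worker nodes

    model = lleaves.Model(model_file=model_txt)
    model.compile(cache=node_cache_path(model_txt))  # compiles for the local CPU
    return model
```

Wrapping `load_model_locally` in a Ray task (or calling it once per worker at startup) would then give each node its own natively-compiled copy of every model.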

Once I've done something about #27 (hopefully soon), you'll be able to compile for a more generic CPU architecture (e.g. for every CPU newer than Haswell). Then you can pick the lowest common CPU arch, compile for it once, and distribute the binary to all machines. That said, since JIT mode compiles for the specific local arch (just like -march=native), it'll always be slightly faster.
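The "CPUs are equal" condition above can be checked on Linux by comparing the instruction-set flags each node advertises in /proc/cpuinfo. A minimal illustrative sketch (the function names and the example Haswell flag set are assumptions, not anything lleaves provides):

```python
def cpu_flags(cpuinfo_text: str) -> set:
    """Parse the 'flags' line out of Linux /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()

def supports(required: set, cpuinfo_text: str) -> bool:
    """True if this CPU advertises every extension in `required`."""
    return required <= cpu_flags(cpuinfo_text)

# Illustrative Haswell-era baseline: AVX2 + FMA + BMI2.
HASWELL_FLAGS = {"avx2", "fma", "bmi2"}
```

Gathering `cpu_flags(open("/proc/cpuinfo").read())` from every node and intersecting the sets would tell you whether a single compiled binary can safely be shared, and what the lowest common target is.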

crayonfu commented 1 year ago

Thank you for the quick answer. I should have looked at #27 first. I will compile the models locally on each node.

siboehm commented 1 year ago

No worries, closing this as done. Lmk if you encounter any issues, it's useful for me to hear user feedback to do feature prioritization.