siboehm / lleaves

Compiler for LightGBM gradient-boosted trees, based on LLVM. Speeds up prediction by ≥10x.
https://lleaves.readthedocs.io/en/latest/
MIT License

Spawn multi-Model inference across cores #51

Open inkrement opened 1 year ago

inkrement commented 1 year ago

I would like to run multiple regression models at once. They all use the same input; is there a way to parallelize the inference? Right now I apply them sequentially. Thank you in advance!

siboehm commented 1 year ago

Not quite sure I understand the problem. You're running N trees over the same dataset D to get N predictions, correct? Lleaves already parallelizes inference over the data in a very simple way (see https://github.com/siboehm/lleaves/blob/master/lleaves/lleaves.py#L183). Do you want to additionally parallelise over the N models?

If the dataset is big enough relative to your number of CPU cores, the data parallelism should stress your system enough that additional parallelisation would be a minor benefit at best. If the dataset isn't big enough or you have tons of cores, you could additionally parallelise over the models using plain Python multiprocessing, or use the low-level C API and write your own low-overhead parallelism.
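A minimal sketch of the multiprocessing route, assuming one LightGBM model file per regression target. The file names, cache paths, and data shape are placeholders; each worker loads and compiles its own model from a cached binary so no compiled object needs to be pickled:

```python
# Sketch: one process per model, all predicting over the same input X.
import numpy as np
from concurrent.futures import ProcessPoolExecutor

import lleaves

MODEL_FILES = ["model_a.txt", "model_b.txt", "model_c.txt"]  # hypothetical paths


def predict_one(model_file: str, X: np.ndarray) -> np.ndarray:
    model = lleaves.Model(model_file=model_file)
    # Reuse a previously compiled binary so compilation isn't paid per call.
    model.compile(cache=model_file + ".bin")
    # Cap lleaves' internal data parallelism so the processes combined
    # don't oversubscribe the CPU.
    return model.predict(X, n_jobs=4)


if __name__ == "__main__":
    X = np.random.rand(100_000, 150)  # same input for every model
    with ProcessPoolExecutor(max_workers=len(MODEL_FILES)) as pool:
        futures = [pool.submit(predict_one, f, X) for f in MODEL_FILES]
        predictions = [fut.result() for fut in futures]
```

Note that `X` gets pickled and copied into every worker process; for very large inputs, shared memory or the C API would avoid that copy.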

inkrement commented 1 year ago

Yes, N independent trees over the same dataset to get N predictions. It's interesting that lleaves automatically parallelizes inference: although I run it on a 20+ core Intel CPU, I haven't seen CPU utilization above 120%. Maybe I should pass bigger batches (I have tried 15k-100k observations with 150 features each). I'll take a closer look. By the way, thanks for this fantastic piece of software!

siboehm commented 1 year ago

That sounds like a big enough dataset that it should definitely parallelise well! Lmk what you find. I'd probably try:

  1. Making sure I'm not accidentally measuring compilation instead of inference, by caching the compilation. Compilation can take a really long time; it's the main downside of lleaves.
  2. Setting n_jobs to e.g. 2 or 4 instead of os.cpu_count() (the default) and checking that CPU utilization gets close to 200% / 400% (sketched below).
  3. Making sure you're not disk-bottlenecked, e.g. not reading the data from a low-throughput NFS.
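A small sketch of (1) and (2), using lleaves' `compile(cache=...)` and `predict(..., n_jobs=...)` arguments; the file names and data shape are placeholders:

```python
# Cache the compiled binary once, then time predict() with explicit n_jobs.
import time
import numpy as np
import lleaves

model = lleaves.Model(model_file="model.txt")  # hypothetical LightGBM model file
# With cache= set, subsequent runs load the compiled binary from disk instead
# of recompiling, so the timings below measure inference only.
model.compile(cache="model_compiled.bin")

X = np.random.rand(100_000, 150)

for n_jobs in (1, 2, 4):
    start = time.perf_counter()
    model.predict(X, n_jobs=n_jobs)
    elapsed = time.perf_counter() - start
    # Watch CPU utilization (e.g. in htop) while this runs; it should
    # approach n_jobs * 100% if the data parallelism is kicking in.
    print(f"n_jobs={n_jobs}: {elapsed:.3f}s")
```
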
inkrement commented 1 year ago

I'll investigate it in more detail, but I can rule out (1) and (3): I am loading pre-compiled/cached models, and the data is already in memory.