siboehm / lleaves

Compiler for LightGBM gradient-boosted trees, based on LLVM. Speeds up prediction by ≥10x.
https://lleaves.readthedocs.io/en/latest/
MIT License

Additional performance benchmarks #4

Closed Zahlii closed 3 years ago

Zahlii commented 3 years ago

Hi, currently evaluating this as a potential performance enhancement on our MLOps / Inference stack.

Thought I'd give some numbers here (based on a MacBook Pro 2019).

Test set up as follows:

a) generate artificial data: X = 1e6 x 200 float64; Y = X.sum() for regression, Y = X.sum() > 100 for the binary classifier
b) for n_feat in [...]: fit a model on 1000 samples and n_feat features; compile the model
c) for batchsize in [...]: predict 10 times on a randomly sampled batch of the data, using (1) LGBM.predict(), (2) lleaves.predict(), (3) lleaves.predict(n_jobs=1); measure TOTAL time taken
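The procedure above can be sketched roughly like this (numpy-only sketch with scaled-down data sizes; `dummy_predict` stands in for the three predict variants, since the actual runs require a fitted LightGBM model and a lleaves-compiled copy of it):

```python
import time
import numpy as np

rng = np.random.default_rng(0)

# a) artificial data (the real benchmark used 1e6 x 200; scaled down here)
n_samples, n_feat = 10_000, 20
X = rng.random((n_samples, n_feat))
y_reg = X.sum(axis=1)                              # regression target
y_clf = (X.sum(axis=1) > n_feat / 2).astype(int)   # binary target

def time_predict(predict_fn, X, batch_size, repeats=10):
    """c) predict `repeats` randomly sampled batches, return TOTAL time."""
    total = 0.0
    for _ in range(repeats):
        idx = rng.integers(0, len(X), size=batch_size)
        batch = X[idx]
        t0 = time.perf_counter()
        predict_fn(batch)
        total += time.perf_counter() - t0
    return total

# stand-in for LGBM.predict / lleaves.predict / lleaves.predict(n_jobs=1)
dummy_predict = lambda batch: batch.sum(axis=1)

results = {bs: time_predict(dummy_predict, X, bs) for bs in (10, 100, 1000)}
```

In the real benchmark, `dummy_predict` would be replaced by `booster.predict`, `llvm_model.predict`, and `lambda b: llvm_model.predict(b, n_jobs=1)` respectively.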

For regression, the results are:

[image: regression benchmark results]

Independent of the number of features, the break-even between parallel lleaves and n_jobs=1 seems to be around 1k samples at once. Following this logic, we would get better performance than LGBM at all sample counts.

For classification:

[image: classification benchmark results]

Here, too, the break-even is around 1k samples.

For classification with HIGHLY IMBALANCED data (1/50 positive), the break-even only occurs at around 10k samples. Any ideas why this is the case?

[image: imbalanced classification benchmark results]

Zahlii commented 3 years ago

Some further benchmarks, this time with categorical-only features.

Classifier: [image: classifier benchmark with categorical features]

Regression: [image: regression benchmark with categorical features]

siboehm commented 3 years ago

What's the issue here? The plots look fine to me. Some notes:

Zahlii commented 3 years ago

@siboehm no real issue here; I just wanted to share the findings from the benchmark. To me, the important take-away is that for most of the inference payloads WE are seeing (usually 1-100 samples at a time), lleaves provides a performance gain, though only with parallelization disabled. Since the break-even can vary wildly, I think it may be important in high-performance settings to smartly toggle parallelization on/off depending on the number of samples to be predicted at once.
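One hypothetical way to do that toggle is a thin wrapper around the compiled model. The wrapper class and the default threshold below are illustrative, not part of lleaves; the only assumption about the model is that it exposes `predict(X, n_jobs=...)`, as lleaves does, and the break-even threshold should be measured on the target hardware:

```python
import os

class AdaptivePredictor:
    """Disable parallel prediction for small batches.

    Below `break_even` samples, thread startup/coordination overhead tends
    to dominate, so we run single-threaded; above it, we use all cores.
    """

    def __init__(self, model, break_even=1000):
        self.model = model          # e.g. a compiled lleaves.Model
        self.break_even = break_even

    def predict(self, X):
        n_jobs = 1 if len(X) < self.break_even else os.cpu_count()
        return self.model.predict(X, n_jobs=n_jobs)
```

The threshold (1000 here, matching the break-even observed above) would ideally be calibrated per model and machine.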

siboehm commented 3 years ago

That's true! Thanks for sharing your benchmark results. I thought there was some performance issue you were bringing up, but even after squinting hard at the plots I couldn't see anything out of the ordinary :D So I'm happy lleaves is working well for you!

Regarding the parallelization:

If it's OK with you, feel free to close the issue, but do keep me in the loop if you find any other outliers / observations :) I'm interested in how people are using lleaves and whether it makes more sense to develop the library in the easy-to-use or the highest-possible-performance direction.