Closed Zahlii closed 3 years ago
Some further ones, this time including categorical-only features
(Plots attached: Classifier, Regression)
What's the issue here? The plots look fine to me. Some notes:
- You could plot the trees via `lightgbm.plot_tree` to make sure they are not somehow degenerate.
- The parallelization in lleaves is kept pretty simple and just implemented in Python, whereas LightGBM calls pthreads directly from C++ afaik. This means the parallelization overhead of lleaves is larger, hence the break-even comes somewhat late.

@siboehm no real issue here; I just wanted to share the findings I had based on the benchmark. To me, the important take-away is that for most inference payloads WE are seeing (usually 1-100 samples at a time), lleaves provides a performance gain, although only with parallelization disabled. Since the break-even can vary wildly, I think it may be important for high-performance settings to smartly toggle the parallelization on/off depending on the number of samples to be predicted at once.
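A minimal sketch of such a toggle (the ~1k-sample break-even is just the value observed in this benchmark and should really be measured per machine and model; `predict(data, n_jobs=...)` matches how lleaves is called elsewhere in this thread):

```python
import os

# Hypothetical break-even point, taken from the benchmark above;
# in practice this should be measured per machine/model.
BREAK_EVEN_SAMPLES = 1000

def choose_n_jobs(n_samples: int, break_even: int = BREAK_EVEN_SAMPLES) -> int:
    """Single-threaded below the break-even batch size, all cores above it."""
    if n_samples < break_even:
        return 1
    return os.cpu_count() or 1

def predict_smart(model, data):
    # Wrapper around a compiled lleaves model: pick the thread count
    # based on the batch size before predicting.
    return model.predict(data, n_jobs=choose_n_jobs(len(data)))
```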
That's true! Thanks for sharing your benchmark results. I thought there was some performance issue you were bringing up, but even after squinting hard at the plots I couldn't see anything out of the ordinary :D So I'm happy lleaves is working well for you!
Regarding the parallelization: the number of threads currently defaults to `os.cpu_count()`. On a CPU with Hyperthreads this will be 2x the number of physical cores. Alternatively lleaves could default to something like `os.cpu_count() / 2`, which probably has much less overhead for only a slight dip in performance.

If it's ok with you feel free to close the issue, but do keep me in the loop if you find any other outliers / observations :) I'm interested in how people are using lleaves and whether it makes more sense to develop the library in the easy-to-use or the highest-possible-performance direction.
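For illustration, a hedged sketch of that alternative default (assuming `os.cpu_count()` reports logical cores, i.e. 2x the physical cores on a hyperthreaded CPU):

```python
import os

def default_n_threads() -> int:
    # os.cpu_count() counts logical cores; halving it approximates the
    # number of physical cores on a hyperthreaded CPU.
    # Guard against None (cpu_count can fail) and against 0.
    logical = os.cpu_count() or 1
    return max(1, logical // 2)
```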
Hi, currently evaluating this as a potential performance enhancement on our MLOps / Inference stack.
Thought I'd give some numbers here (based on a MacBook Pro 2019).
Test set-up as follows:
a) Generate artificial data: X = 1E6 x 200 float64; Y = X.sum() for regression, Y = X.sum() > 100 for the binary classifier.
b) For n_feat in [...]: fit a model on 1000 samples and n_feat features; compile the model.
c) For batchsize in [...]: predict a randomly sampled batch of the data items 10 times, using (1) LGBM.predict(), (2) lleaves.predict(), (3) lleaves.predict(n_jobs=1); measure the TOTAL time taken.
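Step a) can be sketched like this (a scaled-down version; the uniform distribution and the use of numpy are my assumptions, and the classification cut is set to half the expected row sum to mirror the `> 100` threshold for 200 features):

```python
import numpy as np

rng = np.random.default_rng(42)

# Scaled down from 1E6 x 200 for illustration.
n_samples, n_features = 1_000, 20
X = rng.random((n_samples, n_features))   # float64 by default

y_reg = X.sum(axis=1)                     # regression target: row-wise sum
# For 200 uniform features the expected row sum is 100, so the original
# "> 100" cut splits the data roughly in half; mirror that here.
y_cls = X.sum(axis=1) > n_features / 2    # binary classification target
```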
For regression results are:
Independent of the number of features, the break-even between parallel lleaves and n_jobs=1 seems to be around 1k samples at once. Using this logic, we would get better performance than LGBM at any number of samples.
For classification:
Here, too, the break-even is around 1k samples.
For classification with HIGHLY IMBALANCED data (1/50 positive), the break-even only occurs at around 10k samples. Any ideas on why this is the case?
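For reference, one way to construct such an imbalanced target (the quantile-threshold approach is my guess at reproducing the 1/50 positive rate, not necessarily how it was done in the benchmark above):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((10_000, 20))
scores = X.sum(axis=1)

# Threshold at the 98th percentile so ~1/50 of the labels are positive.
thresh = np.quantile(scores, 1 - 1 / 50)
y = scores > thresh
```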