microsoft / LightGBM

A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
https://lightgbm.readthedocs.io/en/latest/
MIT License

predict performance #1927

Closed henuxhj closed 5 years ago

henuxhj commented 5 years ago

Environment info

Operating System: CentOS, kernel 3.10.104
CPU/GPU model: Xeon(R) CPU E5-2670 v3 @ 2.30GHz
C++/Python/R version: C++11, GCC 4.8.5
LightGBM version: v2.0 (changelog entry of 02/20/2017)

Error message

predict performance

Reproducible examples

There are two LightGBM models (model A and model B); their configurations are as follows:

  - model A: 300 trees, depth 20, all other parameters identical
  - model B: 500 trees, depth 20, all other parameters identical

  1. The prediction time does not increase linearly from model A to model B.
  2. The CPU consumption of prediction does not increase linearly from model A to model B.
  3. Is there anything different between model A and model B other than the number of trees? Looking forward to your reply.

Steps to reproduce
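
(No script was attached. A minimal harness along these lines would isolate the effect by timing the same model at increasing `num_iteration`, so that tree count is the only variable. The model file name, row/feature counts, and random input are placeholders, and the dense-matrix path is used for brevity; the predict signature shown is the v2.x-era C API, and newer LightGBM releases add a `start_iteration` argument.)

```c
/* Hypothetical timing harness (not from the original report).
 * Build (assumption): gcc bench.c -I<lightgbm>/include -l_lightgbm */
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <time.h>
#include <LightGBM/c_api.h>

#define NROW 800   /* roughly the batch size mentioned in this issue */
#define NCOL 100   /* placeholder feature count */

static double now_sec(void) {
  struct timespec ts;
  clock_gettime(CLOCK_MONOTONIC, &ts);
  return ts.tv_sec + ts.tv_nsec * 1e-9;
}

int main(void) {
  BoosterHandle booster;
  int total_iters = 0;
  if (LGBM_BoosterCreateFromModelfile("model.txt", &total_iters, &booster) != 0) {
    fprintf(stderr, "failed to load model\n");
    return 1;
  }

  /* Random dense batch, row-major. */
  double* data = (double*)malloc(sizeof(double) * NROW * NCOL);
  for (int i = 0; i < NROW * NCOL; ++i) data[i] = rand() / (double)RAND_MAX;

  double* out = (double*)malloc(sizeof(double) * NROW);  /* num_class == 1 assumed */
  int64_t out_len = 0;

  /* Time the same batch with a growing prefix of the trees. */
  for (int iters = 100; iters <= total_iters; iters += 100) {
    double t0 = now_sec();
    LGBM_BoosterPredictForMat(booster, data, C_API_DTYPE_FLOAT64,
                              NROW, NCOL, /*is_row_major=*/1,
                              C_API_PREDICT_NORMAL, iters, "",
                              &out_len, out);
    printf("num_iteration=%-4d  time=%.4f s\n", iters, now_sec() - t0);
  }

  free(out);
  free(data);
  LGBM_BoosterFree(booster);
  return 0;
}
```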

guolinke commented 5 years ago

LightGBM's prediction is designed for multiple samples at once. Did you run prediction on multiple samples per call?

henuxhj commented 5 years ago

700-800 samples each time, and I use the CSR predict interface (LGBM_BoosterPredictForCSR()).
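
For reference, a single batched call through that interface would look roughly like this (toy CSR layout; v2.x-era signature; `num_iteration <= 0` means use all trees):

```c
/* Sketch of one batched LGBM_BoosterPredictForCSR call. Standard CSR
 * layout: indptr has nrow + 1 entries, indices/values have nnz entries. */
#include <stdint.h>
#include <LightGBM/c_api.h>

int predict_batch(BoosterHandle booster,
                  const int32_t* indptr,   /* row offsets,    length nrow + 1 */
                  const int32_t* indices,  /* column ids,     length nnz */
                  const double*  values,   /* feature values, length nnz */
                  int64_t nrow, int64_t nnz, int64_t num_col,
                  double* out_result) {    /* length >= nrow (x num_class) */
  int64_t out_len = 0;
  /* One call for the whole 700-800-row batch: the per-call overhead
     is paid once rather than once per row. */
  return LGBM_BoosterPredictForCSR(booster,
                                   indptr, C_API_DTYPE_INT32,
                                   indices, values, C_API_DTYPE_FLOAT64,
                                   nrow + 1, nnz, num_col,
                                   C_API_PREDICT_NORMAL,
                                   /*num_iteration=*/-1, "",
                                   &out_len, out_result);
}
```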

guolinke commented 5 years ago

Every time you call prediction there is some overhead, for example buffer allocations and so on. BTW, LightGBM isn't primarily optimized for prediction speed. For faster inference, you can try https://github.com/dmlc/treelite
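
That fixed cost is easiest to see by predicting the same rows once as one batch and once row by row; a sketch under the same placeholder assumptions as the harness above:

```c
/* Same rows, two call patterns: identical tree traversals, but the
 * row-by-row loop pays the per-call setup NROW times. */
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <time.h>
#include <LightGBM/c_api.h>

#define NROW 800
#define NCOL 100

static double now_sec(void) {
  struct timespec ts;
  clock_gettime(CLOCK_MONOTONIC, &ts);
  return ts.tv_sec + ts.tv_nsec * 1e-9;
}

int main(void) {
  BoosterHandle booster;
  int total_iters = 0;
  if (LGBM_BoosterCreateFromModelfile("model.txt", &total_iters, &booster) != 0) return 1;

  double* data = (double*)malloc(sizeof(double) * NROW * NCOL);
  for (int i = 0; i < NROW * NCOL; ++i) data[i] = rand() / (double)RAND_MAX;
  double* out = (double*)malloc(sizeof(double) * NROW);  /* num_class == 1 assumed */
  int64_t out_len = 0;

  /* One batched call. */
  double t0 = now_sec();
  LGBM_BoosterPredictForMat(booster, data, C_API_DTYPE_FLOAT64, NROW, NCOL, 1,
                            C_API_PREDICT_NORMAL, -1, "", &out_len, out);
  double t_batch = now_sec() - t0;

  /* NROW single-row calls over the same data. */
  t0 = now_sec();
  for (int i = 0; i < NROW; ++i) {
    LGBM_BoosterPredictForMat(booster, data + (int64_t)i * NCOL,
                              C_API_DTYPE_FLOAT64, 1, NCOL, 1,
                              C_API_PREDICT_NORMAL, -1, "", &out_len, out + i);
  }
  double t_rows = now_sec() - t0;

  printf("batched: %.4f s   row-by-row: %.4f s\n", t_batch, t_rows);

  free(out); free(data);
  LGBM_BoosterFree(booster);
  return 0;
}
```

The difference between the two timings is roughly the accumulated per-call overhead, which is also why total prediction time need not scale with tree count alone.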

henuxhj commented 5 years ago

Thank you.