microsoft / LightGBM

A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
https://lightgbm.readthedocs.io/en/latest/
MIT License

How to get the number of trainable parameters #5637

Closed ravi160822 closed 1 year ago

ravi160822 commented 1 year ago

In transformers, the model directly has a method to get the number of parameters. I am using the lambdarank training strategy and could not find a get-parameters method. How do I find the total number of trainable parameters in the network?

jameslamb commented 1 year ago

Thanks for using LightGBM.

A LightGBM model is not a neural network. It's an ordered ensemble of decision trees, where the trees' scores are added to produce a prediction of the target.

So projects like this one, XGBoost, CatBoost, etc. have no exact analogue of a neural net's "number of trainable parameters".

In LightGBM, the complexity of the model is measured by the total number of leaf nodes across all the trees. The upper limit on that number is set by parameters like num_iterations (the number of trees) and num_leaves (the maximum leaves per tree), and can be reduced further by constraints such as max_depth and min_data_in_leaf.

These constraints tell you the MAXIMUM model size, but in practice a trained model may be smaller, depending on the size and distribution of the training data.
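If you want the actual count for a trained model, here is a minimal sketch using the Python package (the toy data is made up for illustration) that sums the leaf counts reported by Booster.dump_model():

```python
import numpy as np
import lightgbm as lgb

# toy data, just to have a trained model to inspect
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 10))
y = X[:, 0] + rng.normal(scale=0.1, size=500)

bst = lgb.train(
    {"objective": "regression", "num_leaves": 31, "verbose": -1},
    lgb.Dataset(X, label=y),
    num_boost_round=50,
)

# dump_model() describes every tree; each entry reports its leaf count
tree_info = bst.dump_model()["tree_info"]
total_leaves = sum(tree["num_leaves"] for tree in tree_info)
print(f"{bst.num_trees()} trees, {total_leaves} leaves in total")
```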

ravi160822 commented 1 year ago

Then what would the parameters / weights be in this case?

In a neural network I have a loss function; I compute its derivative with respect to each weight and subtract the learning rate times that gradient from each weight.

If there are no parameters in LightGBM, what is the substitute for the derivative of the loss with respect to the weights?

Furthermore, transformers use 300 million+ parameters, and I can take a pre-trained transformer (like BERT) and replace the final layers to fine-tune the network.

If there are no parameters in LightGBM, what is the substitute for freezing layers, and how do I fine-tune a pre-trained LightGBM model?

Also, when I change the objective to lambdamart it gives an error. Is there a feature request to implement lambdamart alongside lambdarank?

jameslamb commented 1 year ago

what would be the parameters / weights in this case?

what is the substitute of derivative of loss with respect to weight here?

LightGBM creates an ensemble of trees. Those trees are composed of "splits", combinations of features and thresholds that partition the data (e.g. feature_5 > 7.18).

Each split added to a tree is the one that maximizes the "gain" of the model, which is conceptually like "improvement in the loss function". This is how the choice of loss function impacts the eventual model: the derivatives of the loss are taken with respect to the model's current predictions (one value per training row), not with respect to per-parameter weights, and those gradients and hessians determine the gains and leaf values.
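To make that concrete, here is a minimal sketch using the Python package's support for callable objectives (the function name and toy data are my own): the library asks the loss for its derivatives with respect to the predictions, which is the substitute for per-weight gradients.

```python
import numpy as np
import lightgbm as lgb

def squared_error(y_true, y_pred):
    # LightGBM asks the loss for its first and second derivatives with
    # respect to the current predictions F(x), one value per training row.
    grad = y_pred - y_true        # dL/dF for L = 0.5 * (F - y)^2
    hess = np.ones_like(y_true)   # d^2L/dF^2
    return grad, hess

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = X[:, 0] ** 2 + rng.normal(scale=0.1, size=500)

# behaves like objective="regression", with the derivatives spelled out
model = lgb.LGBMRegressor(objective=squared_error, n_estimators=50)
model.fit(X, y)
```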

how do I fine tune a pre trained lightgbm model?

Please see this answer: https://stackoverflow.com/questions/73664093/lightgbm-train-vs-update-vs-refit/73669068#73669068.
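Summarizing that answer as a hedged sketch (placeholder data of my own): passing init_model to lgb.train() continues boosting from an existing model, while Booster.refit() keeps every tree's structure and re-estimates the leaf values on new data.

```python
import numpy as np
import lightgbm as lgb

rng = np.random.default_rng(1)
X_old = rng.normal(size=(500, 5))
y_old = X_old[:, 0] + rng.normal(scale=0.1, size=500)
X_new = rng.normal(size=(200, 5))
y_new = X_new[:, 0] + rng.normal(scale=0.1, size=200)

params = {"objective": "regression", "verbose": -1}
bst = lgb.train(params, lgb.Dataset(X_old, label=y_old), num_boost_round=100)

# "continued training": add new boosting rounds on top of the existing trees
bst_more = lgb.train(
    params,
    lgb.Dataset(X_new, label=y_new),
    num_boost_round=20,
    init_model=bst,
)

# refit(): keep the tree structures, re-estimate leaf values on the new data
bst_refit = bst.refit(X_new, y_new)
```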


I think you might find LightGBM easier to understand if you learned about it from first principles, instead of trying to compare it to things like BERT that are very different.

Please see these resources to get started:

github-actions[bot] commented 1 year ago

This issue has been automatically closed because it has been awaiting a response for too long. When you have time to work with the maintainers to resolve this issue, please post a new comment and it will be re-opened. If the issue has been locked for editing by the time you return to it, please open a new issue and reference this one. Thank you for taking the time to improve LightGBM!

lcrmorin commented 1 year ago

I had a similar question and got an answer from P. Harell here: https://stats.stackexchange.com/questions/584759/tree-models-and-information-criterion. There is a paper from Ye that helps estimate the number of parameters of decision trees; I guess you can sum those across trees as a start for boosted trees. However, it seems the real answer is that this number doesn't matter that much. For an ML model, what matters is performance, and if you want to estimate a trade-off between performance and complexity (resources), you need to look at real-world metrics (memory used, training time, inference time, etc.).

github-actions[bot] commented 1 year ago

This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.