logicai-io / recsys2019


Tuning LGBM #20

Open pjankiewicz opened 5 years ago

pjankiewicz commented 5 years ago

Optimization of the LGBMRanker parameters.

A useful paper: https://github.com/logicai-io/recsys2019/blob/master/publications/burgesLearningToRank-2011.pdf
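
As a starting point, a minimal sketch (assumed setup with placeholder data, not the actual recsys2019 pipeline) of fitting an `LGBMRanker` with the `lambdarank` objective, where each group corresponds to one session of candidate items:

```python
import lightgbm as lgb
import numpy as np

rng = np.random.default_rng(0)
n_sessions, items_per_session = 1_000, 25
X = rng.random((n_sessions * items_per_session, 30))
y = (rng.random(n_sessions * items_per_session) < 0.04).astype(int)  # ~1 relevant item per session
group_sizes = [items_per_session] * n_sessions                       # one group per session

ranker = lgb.LGBMRanker(
    objective="lambdarank",
    n_estimators=500,
    learning_rate=0.05,
    num_leaves=63,
    min_child_samples=50,
    subsample=0.8,
    subsample_freq=1,
    colsample_bytree=0.8,
)
ranker.fit(X, y, group=group_sizes)
scores = ranker.predict(X[:items_per_session])  # higher score = ranked higher within a session
```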

pjankiewicz commented 5 years ago

From the LightGBM issues:

For heavily unbalanced datasets such as 1:10000:

max_bin: keep it only for memory pressure, not to tune (otherwise overfitting)
learning rate: keep it only for training speed, not to tune (otherwise overfitting)
n_estimators: must be infinite (like 9999999) and use early stopping to auto-tune (otherwise overfitting)
num_leaves: [7, 4095]
max_depth: [2, 63] and infinite (I personally saw metric performance increases with such 63 depth with small number of leaves on sparse unbalanced datasets)
scale_pos_weight: [1, 10000] (if over 10000, something might be wrong because I never saw it that good after 5000)
min_child_weight: [0.01, (sample size / 1000)] if you are using logloss (think about the hessian possible value range before putting "sample size / 1000", it is dataset-dependent and loss-dependent)
subsample: [0.4, 1]
bagging_freq: only 1, keep as is (otherwise overfitting)
colsample_bytree: [0.4, 1]
is_unbalance: false (make your own weighting with scale_pos_weight)
USE A CUSTOM METRIC (to reflect reality without weighting; with premade metrics, as in xgboost, the sample weights end up baked into the metric). The quoted ranges are expressed as a search space in the sketch below.
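
A sketch of the ranges above written as a random-search space (assumed names, not from the repository; `n_samples` is a placeholder for the training-set size):

```python
from scipy.stats import loguniform, randint, uniform

n_samples = 1_000_000  # placeholder for the real training-set size

param_distributions = {
    "num_leaves": randint(7, 4096),                           # [7, 4095]
    "max_depth": randint(2, 64),                              # [2, 63]; -1 (unlimited) can be tried separately
    "scale_pos_weight": loguniform(1, 10_000),                # [1, 10000]
    "min_child_weight": loguniform(0.01, n_samples / 1000),   # [0.01, sample_size / 1000]
    "subsample": uniform(0.4, 0.6),                           # [0.4, 1.0]
    "colsample_bytree": uniform(0.4, 0.6),                    # [0.4, 1.0]
    "subsample_freq": [1],                                    # bagging_freq fixed at 1
}
```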
Never tune these parameters unless you have an explicit requirement to tune them:

Learning rate (lower means longer to train but more accurate; higher means faster to train but less accurate)
Number of boosting iterations (automatically tuned with early stopping and learning rate)
Maximum number of bins (RAM dependent)
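
A minimal sketch (placeholder data, not the repository's training code) applying the advice above with the LightGBM sklearn API: a huge `n_estimators` auto-tuned by early stopping, a fixed learning rate, explicit `scale_pos_weight` instead of `is_unbalance`, and a custom unweighted evaluation metric:

```python
import lightgbm as lgb
import numpy as np
from sklearn.metrics import average_precision_score
from sklearn.model_selection import train_test_split

def unweighted_ap(y_true, y_pred):
    # Custom eval metric: plain (unweighted) average precision.
    # LightGBM expects a (name, value, is_higher_better) tuple.
    return "unweighted_ap", average_precision_score(y_true, y_pred), True

# Placeholder data with a heavy class imbalance (~1:1000).
rng = np.random.default_rng(0)
X = rng.random((200_000, 20))
y = (rng.random(200_000) < 1e-3).astype(int)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)

neg, pos = np.bincount(y_tr, minlength=2)
model = lgb.LGBMClassifier(
    n_estimators=9_999_999,       # "infinite"; early stopping picks the real number
    learning_rate=0.05,           # fixed for speed, not tuned
    num_leaves=63,
    subsample=0.8,
    subsample_freq=1,             # bagging_freq kept at 1
    colsample_bytree=0.8,
    scale_pos_weight=neg / pos,   # explicit weighting; is_unbalance stays False
)
model.fit(
    X_tr, y_tr,
    eval_set=[(X_va, y_va)],
    eval_metric=unweighted_ap,
    callbacks=[lgb.early_stopping(stopping_rounds=200)],  # older LightGBM: early_stopping_rounds=200 in fit()
)
print("best iteration:", model.best_iteration_)
```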
pjankiewicz commented 5 years ago

Multi-Fidelity Automatic Hyper-Parameter Tuning via Transfer Series Expansion http://lamda.nju.edu.cn/yuy/GetFile.aspx?File=papers/aaai19_huyq.pdf