smart022 / articles

articles_backup
MIT License

All about tuning #10

Open smart022 opened 5 years ago

smart022 commented 5 years ago

All about LightGBM (LGB) params

Juejin article explaining LGB parameters, grouped by the official categories: Core / Learning Control / IO

  1. Core
    • task
    • objective
    • boosting : default=gbdt, == boosting_type, boost
    • num_iterations : default=100, == num_iteration, n_iter, num_trees, num_rounds, n_estimators
    • learning_rate : default=0.1, == eta, shrinkage_rate
    • num_leaves : default = 31, == num_leaf, max_leaves, max_leaf
    • num_threads : default = 0 (OpenMP default), recommended = number of physical CPU cores, == num_thread, nthread, nthreads, n_jobs
  2. Learning Control
    • max_depth: default = -1 (no limit)
    • min_data_in_leaf: default = 20, == min_data_per_leaf, min_data, min_child_samples
    • min_sum_hessian_in_leaf: default = 1e-3, == min_hessian, min_child_weight
    • bagging_fraction: default = 1.0, avail = (0,1], == sub_row, subsample, bagging
    • bagging_freq: default = 0 (bagging disabled); an int k means bagging is performed every k iterations, == subsample_freq
    • feature_fraction: default = 1.0, == sub_feature, colsample_bytree
    • feature_fraction_seed
    • early_stopping_round: default = 0 (disabled)
    • max_delta_step
    • lambda_l1: default = 0, avail = [0,inf), == reg_alpha
    • lambda_l2: default = 0, avail = [0,inf), == reg_lambda
    • min_gain_to_split: default = 0.0, avail = [0,inf), == min_split_gain
    • drop_rate (dart only)
  3. IO
    • verbosity
    • max_bin: default = 255, type = int, avail=(1,inf)
    • min_data_in_bin
    • bin_construct_sample_cnt
    • histogram_pool_size
  4. Objective
  5. Metric
  6. For the Leaf-wise tree
    • num_leaves
    • min_data_in_leaf
    • max_depth
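  7. For Faster Speed
    • bagging_fraction && bagging_freq
    • feature_fraction
    • max_bin: use a smaller value
    • save_binary
  8. For Better Accuracy
    • max_bin: use a larger value (training becomes slower)
    • learning_rate: use a small value together with a large num_iterations
    • num_leaves: use a large value (risks over-fitting)
    • use a bigger training set
    • dart (boosting = dart)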
  9. Deal with Over-fitting
    • max_bin: use a small value (also faster)
    • num_leaves: use a small value
    • min_data_in_leaf && min_sum_hessian_in_leaf
    • bagging_fraction && bagging_freq
    • feature_fraction
    • lambda_l1/l2 && min_gain_to_split
    • max_depth
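The alias lists and the speed / accuracy / over-fitting advice above can be turned into a params dict for `lgb.train`. The sketch below is a minimal illustration, assuming only the aliases listed in this comment (LightGBM itself knows more) and using made-up preset values; the names `ALIASES`, `PRESETS`, `canonicalize` and `make_params`, and the default `objective = "binary"`, are hypothetical choices, not LightGBM API.

```python
from typing import Any, Dict

# Hypothetical subset of LightGBM's alias table, copied from the lists above.
ALIASES: Dict[str, str] = {
    "boosting_type": "boosting", "boost": "boosting",
    "num_iteration": "num_iterations", "n_iter": "num_iterations",
    "num_trees": "num_iterations", "num_rounds": "num_iterations",
    "n_estimators": "num_iterations",
    "eta": "learning_rate", "shrinkage_rate": "learning_rate",
    "num_leaf": "num_leaves", "max_leaves": "num_leaves", "max_leaf": "num_leaves",
    "num_thread": "num_threads", "nthread": "num_threads",
    "nthreads": "num_threads", "n_jobs": "num_threads",
    "min_data_per_leaf": "min_data_in_leaf", "min_data": "min_data_in_leaf",
    "min_child_samples": "min_data_in_leaf",
    "min_hessian": "min_sum_hessian_in_leaf", "min_child_weight": "min_sum_hessian_in_leaf",
    "sub_row": "bagging_fraction", "subsample": "bagging_fraction", "bagging": "bagging_fraction",
    "subsample_freq": "bagging_freq",
    "sub_feature": "feature_fraction", "colsample_bytree": "feature_fraction",
    "reg_alpha": "lambda_l1", "reg_lambda": "lambda_l2",
    "min_split_gain": "min_gain_to_split",
}

# Illustrative starting points for sections 7-9; the numbers are assumptions, not tuned values.
PRESETS: Dict[str, Dict[str, Any]] = {
    "speed": {
        "bagging_fraction": 0.8,
        "bagging_freq": 1,  # bagging only happens when bagging_freq > 0
        "feature_fraction": 0.8,
        "max_bin": 63,
    },
    "accuracy": {
        "max_bin": 511,
        "learning_rate": 0.02,
        "num_iterations": 2000,
        "num_leaves": 127,
    },
    "overfitting": {
        "max_bin": 63,
        "num_leaves": 15,
        "max_depth": 6,
        "min_data_in_leaf": 50,
        "min_sum_hessian_in_leaf": 1.0,
        "bagging_fraction": 0.8,
        "bagging_freq": 1,
        "feature_fraction": 0.8,
        "lambda_l1": 0.1,
        "lambda_l2": 1.0,
        "min_gain_to_split": 0.01,
    },
}


def canonicalize(params: Dict[str, Any]) -> Dict[str, Any]:
    """Rename known aliases to their canonical name; reject conflicting duplicates."""
    out: Dict[str, Any] = {}
    for key, value in params.items():
        name = ALIASES.get(key, key)
        if name in out and out[name] != value:
            raise ValueError(f"{key!r} conflicts with an earlier value for {name!r}")
        out[name] = value
    return out


def make_params(goal: str, **overrides: Any) -> Dict[str, Any]:
    """Start from the preset for `goal`, then apply overrides (aliases allowed)."""
    # Assumed base settings for a binary task; change objective as needed.
    params: Dict[str, Any] = {"objective": "binary", "boosting": "gbdt", "verbosity": -1}
    params.update(PRESETS[goal])  # copies values, PRESETS itself is not modified
    params.update(canonicalize(overrides))
    return params
```

For example, `make_params("overfitting", n_estimators=500, reg_lambda=5.0)` yields a dict with `num_iterations = 500` and `lambda_l2 = 5.0` that could be passed as the first argument of `lgb.train(params, lgb.Dataset(X, y))`. Normalizing names first avoids silently setting the same parameter twice under two spellings.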

      Summary of the important parameters

    • max_bin: small value, reduces over-fitting and speeds up training
    • num_leaves: <= 2^(max_depth)
    • max_depth: typically 5 - 10
    • bagging_fraction && bagging_freq
    • feature_fraction
    • lambda_l1/l2 && min_gain_to_split
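The summary rules (max_depth around 5-10, num_leaves no larger than 2^max_depth, small max_bin) can be turned into a search grid. This is a minimal sketch; the function name `leaf_wise_grid` and the candidate values are hypothetical choices for illustration.

```python
from itertools import product
from typing import Dict, Iterable, List


def leaf_wise_grid(
    depths: Iterable[int] = range(5, 11),            # "max_depth: typically 5 - 10"
    leaf_options: Iterable[int] = (15, 31, 63, 127, 255),
    max_bins: Iterable[int] = (63, 255),             # smaller max_bin: faster, less over-fitting
) -> List[Dict[str, int]]:
    """Enumerate candidate settings, keeping only num_leaves <= 2**max_depth."""
    grid = []
    for depth, leaves, bins in product(depths, leaf_options, max_bins):
        # A binary tree of depth d has at most 2**d leaves, so a larger
        # num_leaves could never be reached under this max_depth.
        if leaves <= 2 ** depth:
            grid.append({"max_depth": depth, "num_leaves": leaves, "max_bin": bins})
    return grid
```

With the defaults this gives 48 combinations (for example, depth 5 only allows 15 or 31 leaves). Each dict can be merged into the params passed to `lgb.train` or `lgb.cv`; since the grid grows multiplicatively, sampling a subset of it at random is often enough.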
smart022 commented 5 years ago

Three tricks to improve model performance on imbalanced data (with Python code). Key parameters for imbalanced data:

  1. is_unbalance (binary objective)
  2. class_weight (scikit-learn API, LGBMClassifier)
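To make the two options concrete: the LightGBM scikit-learn docs describe `class_weight="balanced"` as weights of `n_samples / (n_classes * count(class))` and point binary tasks to `is_unbalance` or `scale_pos_weight` instead; the two binary options should not be combined. The sketch below computes those weights by hand. The helper names are hypothetical, and using the negative/positive count ratio for `scale_pos_weight` is a common heuristic, not an official rule.

```python
from collections import Counter
from typing import Any, Dict, Hashable, Sequence


def balanced_class_weight(y: Sequence[Hashable]) -> Dict[Hashable, float]:
    """n_samples / (n_classes * count(class)), the 'balanced' class_weight formula."""
    counts = Counter(y)
    n_samples, n_classes = len(y), len(counts)
    return {cls: n_samples / (n_classes * c) for cls, c in counts.items()}


def neg_pos_ratio(y: Sequence[int]) -> float:
    """count(label 0) / count(label 1) for 0/1 labels."""
    counts = Counter(y)
    if counts[1] == 0:
        raise ValueError("no positive samples")
    return counts[0] / counts[1]


def binary_imbalance_params(y: Sequence[int], use_is_unbalance: bool = True) -> Dict[str, Any]:
    """Hypothetical helper: set exactly ONE of is_unbalance / scale_pos_weight."""
    params: Dict[str, Any] = {"objective": "binary"}
    if use_is_unbalance:
        params["is_unbalance"] = True
    else:
        params["scale_pos_weight"] = neg_pos_ratio(y)
    return params
```

With 90 negatives and 10 positives, `balanced_class_weight` returns 5.0 for class 1 and about 0.556 for class 0, and `neg_pos_ratio` returns 9.0. A dict like the former can be passed as `LGBMClassifier(class_weight=...)`, while the params from `binary_imbalance_params` go to `lgb.train`.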
smart022 commented 5 years ago

Model_Ensemble.py