Open ElysiumFan086 opened 4 years ago
👋 Thanks for opening your first issue here! If you're reporting a 🐞 bug, please make sure you include steps to reproduce it.
hi @ElysiumFan086, I wonder if this is similar to this issue (based on a quick search): https://github.com/microsoft/LightGBM/issues/2953. You could also try posting this to the LightGBM forum. Is the dataset private? Can you create a small repro with either a toy dataset or the one you are using (if it is not private)? That may be the best way to debug this.
See some other similar issues: https://github.com/microsoft/LightGBM/issues/2597 and another possibly related post: https://github.com/microsoft/LightGBM/issues/2239
Have you tried the latest code from master? I just updated to the latest LightGBM on master recently. This issue may have already been fixed.
@imatiach-msft Thank you for your advice, and due to privacy policy it is inconvenient to open the dataset here. But I have check the dataset, and It seems that for items, in amount of queries, have same ranking labels, which may result from my labeling strategy, and I will check it carefully. For example, ranking label in Query may be:
Q1: 4, 4, 4, 4, 4, 0, 0
Q2: 2, 2, 2, 2, 2, 2, 4
Q3: 1, 1, 1, 1, 4, 2
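To make the tied-label situation concrete, a minimal toy repro along the lines you asked for might look like this (just a sketch, assuming the mmlspark 1.0.0-rc1 Scala API; the query ids, labels, and feature values here are made up, not from my real dataset):

```scala
import com.microsoft.ml.spark.lightgbm.LightGBMRanker
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("lgbm-ranker-repro").getOrCreate()
import spark.implicits._

// (query id, label, two toy features) -- labels repeat within a query, like Q1-Q3 above
val df = Seq(
  (1L, 4.0, 0.1, 0.9), (1L, 4.0, 0.2, 0.8), (1L, 0.0, 0.3, 0.1),
  (2L, 2.0, 0.5, 0.4), (2L, 2.0, 0.6, 0.3), (2L, 4.0, 0.7, 0.9),
  (3L, 1.0, 0.2, 0.2), (3L, 1.0, 0.3, 0.1), (3L, 4.0, 0.9, 0.8)
).toDF("query", "label", "f1", "f2")

// assemble the raw columns into the vector column LightGBMRanker expects
val assembled = new VectorAssembler()
  .setInputCols(Array("f1", "f2"))
  .setOutputCol("features")
  .transform(df)

val ranker = new LightGBMRanker()
  .setLabelCol("label")
  .setFeaturesCol("features")
  .setGroupCol("query")
  .setNumIterations(50)

val model = ranker.fit(assembled)
```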
But I still have several questions to trouble you with, in particular how to set the min_data_in_leaf parameter for LightGBMRanker. I hope these will not bother you much, and thank you again!
@ElysiumFan086
1.) That parameter should already be available on latest master: https://github.com/Azure/mmlspark/blob/master/src/main/scala/com/microsoft/ml/spark/lightgbm/LightGBMParams.scala#L357
2.) I'm guessing that either the native code might be different, or the distributed data partitioning code causes the logic to be slightly different and triggers the issue; either way the fix will probably be in the LightGBM codebase.
3.) I think this is the latest version:
Maven Coordinates: com.microsoft.ml.spark:mmlspark_2.11:1.0.0-rc1-86-05c25aad-SNAPSHOT
Maven Resolver: https://mmlspark.azureedge.net/maven
This comes from this PR build: https://github.com/Azure/mmlspark/pull/866/checks?check_run_id=715637908
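As a side note on 1.), on a build that includes that parameter it should be settable like any other LightGBMRanker param. A rough sketch (the setMinDataInLeaf setter name is inferred from the LightGBMParams.scala line above, so please double-check it against the build you use):

```scala
import com.microsoft.ml.spark.lightgbm.LightGBMRanker

// Sketch only: assumes a recent master build where minDataInLeaf is exposed;
// the setter name below is an assumption, not verified against this exact snapshot.
val ranker = new LightGBMRanker()
  .setLabelCol("label")
  .setFeaturesCol("features")
  .setGroupCol("query")
  .setNumLeaves(1023)
  .setMaxPosition(3)
  .setMinDataInLeaf(5)   // lower than LightGBM's default of 20
```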
Thanks~~ @imatiach-msft I have tried the latest version, but I still face the same problem. For now we have no idea what causes this issue, so all we can do is train on a local machine, which just takes more time.
Hello, Any update on this? TY
This is RongFan. I have received your mail just now, and I will read it soon. Thank you!
@imatiach-msft I have encountered some trouble when training a LambdaRank model in Spark with LightGBMRanker in mmlspark.
With the same training data, the training results in Spark and on my local machine are different:
In Spark with mmlspark, the training log is shown below:
On my local machine, where LightGBM is installed with conda, training finished successfully.
I am not sure what makes the above difference. I have tried modifying some parameters such as maxPosition and the partition number in Spark, which proved to be of no use. Setting min_data_in_leaf smaller was also proposed, but I did not find any way to set this parameter in the mmlspark interface of LightGBMRanker. If anyone has experience solving a similar problem, I would very much appreciate your advice.
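One thing I wondered about regarding the partition number (just a hypothesis, not a confirmed cause of this issue) is whether rows of the same query can end up split across Spark partitions. Continuing the toy repro sketch earlier in this thread, keeping each query's rows together before fitting would look roughly like:

```scala
import org.apache.spark.sql.functions.col

// Hypothetical mitigation, not a confirmed fix: hash-partition by the query id and
// keep each query's rows contiguous so no query straddles a partition boundary.
val partitioned = assembled                      // `assembled` and `ranker` from the toy repro above
  .repartition(8, col("query"))
  .sortWithinPartitions(col("query"))

val model = ranker.fit(partitioned)
```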
The following lists some information which may be helpful in diagnosing what happened:
Tree=0
num_leaves=1
num_cat=0
split_feature=
split_gain=
threshold=
decision_type=
left_child=
right_child=
leaf_value=0
leaf_count=0
internal_value=
internal_count=
shrinkage=1
end of trees
feature importances:
parameters: [boosting: gbdt] [objective: lambdarank] [metric: lambdarank] [tree_learner: serial] [device_type: cpu] [data: ] [valid: ] [num_iterations: 500] [learning_rate: 0.1] [num_leaves: 1023] [num_threads: 0] [max_depth: 10] [min_data_in_leaf: 20] [min_sum_hessian_in_leaf: 0.001] [bagging_fraction: 1] [bagging_freq: 0] [bagging_seed: 3] [feature_fraction: 1] [feature_fraction_seed: 2] [early_stopping_round: 0] [max_delta_step: 0] [lambda_l1: 0.01] [lambda_l2: 0.01] [min_gain_to_split: 0] [drop_rate: 0.1] [max_drop: 50] [skip_drop: 0.5] [xgboost_dart_mode: 0] [uniform_drop: 0] [drop_seed: 4] [top_rate: 0.2] [other_rate: 0.1] [min_data_per_group: 100] [max_cat_threshold: 32] [cat_l2: 10] [cat_smooth: 10] [max_cat_to_onehot: 4] [top_k: 20] [monotone_constraints: ] [feature_contri: ] [forcedsplits_filename: ] [refit_decay_rate: 0.9] [cegb_tradeoff: 1] [cegb_penalty_split: 0] [cegb_penalty_feature_lazy: ] [cegb_penalty_feature_coupled: ] [verbosity: 1] [max_bin: 255] [min_data_in_bin: 3] [bin_construct_sample_cnt: 200000] [histogram_pool_size: -1] [data_random_seed: 1] [output_model: LightGBM_model.txt] [snapshot_freq: -1] [input_model: ] [output_result: LightGBM_predict_result.txt] [initscore_filename: ] [valid_data_initscores: ] [pre_partition: 1] [enable_bundle: 1] [max_conflict_rate: 0] [is_enable_sparse: 1] [sparse_threshold: 0.8] [use_missing: 1] [zero_as_missing: 0] [two_round: 0] [save_binary: 0] [header: 0] [label_column: ] [weight_column: ] [group_column: ] [ignore_column: ] [categorical_feature: ] [predict_raw_score: 0] [predict_leaf_index: 0] [predict_contrib: 0] [num_iteration_predict: -1] [pred_early_stop: 0] [pred_early_stop_freq: 10] [pred_early_stop_margin: 10] [convert_model_language: ] [convert_model: gbdt_prediction.cpp] [num_class: 1] [is_unbalance: 0] [scale_pos_weight: 1] [sigmoid: 1] [boost_from_average: 1] [reg_sqrt: 0] [alpha: 0.9] [fair_c: 1] [poisson_max_delta_step: 0.7] [tweedie_variance_power: 1.5] [max_position: 3] [label_gain: ] [metric_freq: 1] [is_provide_training_metric: 0] [eval_at: ] [num_machines: 1] [local_listen_port: 12400] [time_out: 120] [machine_list_filename: ] [machines: ] [gpu_platform_id: -1] [gpu_device_id: -1] [gpu_use_dp: 0]
end of parameters