HonLZL opened this issue 2 months ago
Excellent report, thanks very much!
Could you try installing pandas and inspecting the results of gbm.trees_to_dataframe(), to see if maybe the later trees are very shallow?
That could be one reason for lower GPU utilization... the split-finding part of training can benefit from parallelization, but there's a sync-up after each search where the model has to be updated. I wonder if maybe in the later iterations, LightGBM is training much shallower trees (and therefore spending proportionally more time in those non-parallelized code paths).
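For example, a minimal sketch of that check, assuming gbm is the trained Booster from the report:

# trees_to_dataframe() returns one row per node, with its tree_index and node_depth
tree_df = gbm.trees_to_dataframe()

# maximum depth reached by each tree; if these shrink for later tree_index
# values, the later trees are shallower
depth_per_tree = tree_df.groupby("tree_index")["node_depth"].max()
print(depth_per_tree.head(10))
print(depth_per_tree.tail(10))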
num_leaves=255 does not guarantee that every tree will have 255 leaves.
LightGBM will stop growing a particular tree under a few conditions (the relevant parameters are sketched below):
- it cannot find any further split with a gain >= min_gain_to_split
- adding another split would violate min_data_in_leaf or min_sum_hessian_in_leaf
- adding another split would violate interaction_constraints or monotone_constraints
- the tree has reached max_depth
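For reference, a sketch of how those limits look in the params dict (num_leaves matches the report; the other values shown are LightGBM's documented defaults):

params = {
    "num_leaves": 255,                # an upper bound per tree, not a guarantee
    "min_gain_to_split": 0.0,         # default: any split with positive gain is allowed
    "min_data_in_leaf": 20,           # default: each leaf must keep at least 20 rows
    "min_sum_hessian_in_leaf": 1e-3,  # default minimum hessian sum per leaf
    "max_depth": -1,                  # default: no depth limit
}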
Unrelated, some notes on those parameters:
# this is the default, you can omit this
"tree_learner": "serial"
# these are only relevant for the CLI, omit them when using the Python package
"task": "train"
"is_training_metric": "false"
Glad to receive your reply! I ran 5,000 rounds using device=cuda; here is a sample of the trees_to_dataframe() output.
,tree_index,node_depth,node_index,left_child,right_child,parent_index,split_feature,split_gain,threshold,decision_type,missing_direction,missing_type,value,weight,count
2475998,4864,16,4864-L121,,,4864-S194,,,,,,,0.006992752334214716,149.0,149
2476998,4866,16,4866-S101,4866-L98,4866-L102,4866-S100,Column_6,1.705680012702942,-0.2550096362829208,<=,left,None,-0.00819252,531.0,531
2477998,4868,13,4868-L171,,,4868-S175,,,,,,,-0.011479730841377055,134.0,134
2478998,4870,12,4870-S18,4870-S48,4870-L19,4870-S17,Column_21,0.6902909874916077,0.9225429296493531,<=,left,None,-0.00471234,529.0,529
2479998,4872,11,4872-S171,4872-S172,4872-S176,4872-S170,Column_14,1.0827800035476685,0.16720353066921237,<=,left,None,-0.0045852,1004.0,1004
2480998,4874,21,4874-L71,,,4874-S75,,,,,,,0.010258754315588665,368.0,368
2481998,4876,11,4876-L47,,,4876-S46,,,,,,,0.0013771386017987898,479.0,479
2482998,4878,7,4878-S24,4878-L13,4878-S25,4878-S23,Column_26,1.3878200054168701,0.758811503648758,<=,left,None,0.00464105,648.0,648
2483998,4880,22,4880-S212,4880-L209,4880-L213,4880-S208,Column_25,1.2470799684524536,0.9166806042194368,<=,left,None,-0.00344305,506.0,506
2484998,4882,12,4882-S212,4882-S213,4882-S219,4882-S206,Column_9,1.0044300556182861,0.9594475924968721,<=,left,None,0.00119075,6424.0,6424
2485998,4884,14,4884-L216,,,4884-S222,,,,,,,0.003906279219997473,252.0,252
2486998,4886,11,4886-S128,4886-S129,4886-S131,4886-S127,Column_1,1.3676400184631348,-1.3494878411293028,<=,left,None,0.0037433,1885.0,1885
2487998,4888,6,4888-S106,4888-L2,4888-L107,4888-S105,Column_10,1.7062599658966064,-2.23167073726654,<=,left,None,-0.00077607,7684.0,7684
2488998,4889,13,4889-S50,4889-L50,4889-L51,4889-S49,Column_19,1.262369990348816,0.22485540062189105,<=,left,None,-0.00118126,378.0,378
2489998,4891,14,4891-S231,4891-S232,4891-L232,4891-S216,Column_27,0.9564549922943115,0.9873551428318025,<=,left,None,-0.00330718,810.0,810
2490998,4893,13,4893-L237,,,4893-S236,,,,,,,0.0020112752702087164,250.0,250
2491998,4895,13,4895-S56,4895-S57,4895-L57,4895-S55,Column_3,1.305359959602356,0.5652261972427369,<=,left,None,0.00609067,383.0,383
2492998,4897,13,4897-S218,4897-L218,4897-S226,4897-S217,Column_9,1.62663996219635,1.2169202566146853,<=,left,None,-0.00209578,11215.0,11215
2493998,4899,19,4899-L229,,,4899-S228,,,,,,,-0.014312340053603859,111.0,111
2494998,4901,14,4901-S125,4901-L117,4901-S134,4901-S119,Column_0,1.1668599843978882,0.5672366917133332,<=,left,None,0.00105788,609.0,609
2495998,4903,15,4903-S144,4903-L132,4903-L145,4903-S131,Column_3,0.6262369751930237,1.4257535934448244,<=,left,None,9.80367e-05,296.0,296
2496998,4905,11,4905-S50,4905-L50,4905-L51,4905-S49,Column_18,1.2204300165176392,0.6936975121498109,<=,left,None,0.00784372,407.0,407
2497998,4907,10,4907-L72,,,4907-S72,,,,,,,-0.00047845793306461624,16472.0,16472
2498998,4909,13,4909-S64,4909-S126,4909-L65,4909-S63,Column_26,1.1328099966049194,1.1948313713073733,<=,left,None,-0.00659668,485.0,485
2499998,4911,15,4911-S252,4911-L185,4911-S253,4911-S251,Column_0,1.3360899686813354,0.5170921981334687,<=,left,None,-0.00218027,956.0,956
2500998,4913,11,4913-S149,4913-L149,4913-S241,4913-S148,Column_4,1.2409199476242065,-0.9571563005447387,<=,left,None,-0.00313243,2587.0,2587
2501998,4915,17,4915-L183,,,4915-S182,,,,,,,-0.01039329694198946,104.0,104
2502998,4917,14,4917-L218,,,4917-S217,,,,,,,0.0029182628467818027,134.0,134
2503998,4919,20,4919-L131,,,4919-S130,,,,,,,0.003046603372175777,267.0,267
2504998,4921,17,4921-S125,4921-S126,4921-S129,4921-S84,Column_23,0.8258450031280518,0.9891601204872132,<=,left,None,-0.00102175,1806.0,1806
2505998,4923,10,4923-S40,4923-S41,4923-S42,4923-S39,Column_14,1.3672300577163696,0.1262423396110535,<=,left,None,0.00479201,770.0,770
2506998,4925,16,4925-L144,,,4925-S143,,,,,,,-0.01314376931544688,158.0,158
2507998,4927,14,4927-L149,,,4927-S148,,,,,,,-0.012864420435356875,108.0,108
2508998,4929,19,4929-L141,,,4929-S149,,,,,,,-0.01445356372371316,100.0,100
2509998,4931,16,4931-S127,4931-L127,4931-L128,4931-S126,Column_8,1.6426000595092773,1.6298071146011355,<=,left,None,-0.00694278,274.0,274
2510998,4933,18,4933-S242,4933-L140,4933-L243,4933-S174,Column_10,1.2257100343704224,0.6037689745426179,<=,left,None,-0.00464622,682.0,682
2511998,4935,15,4935-L71,,,4935-S70,,,,,,,-0.006048560484989801,110.0,110
2512998,4937,14,4937-S190,4937-S191,4937-L191,4937-S189,Column_9,1.0775500535964966,0.9248241186141969,<=,left,None,0.00346585,343.0,343
2513998,4939,10,4939-L14,,,4939-S13,,,,,,,0.008209223070969949,104.0,104
2514998,4941,11,4941-S12,4941-S15,4941-S13,4941-S9,Column_2,0.9452850222587585,0.3386045694351197,<=,left,None,0.00152447,2355.0,2355
2515998,4943,8,4943-S147,4943-S150,4943-S148,4943-S132,Column_7,0.6244350075721741,1.3059919476509096,<=,left,None,-0.000941328,5570.0,5570
2516998,4944,14,4944-L53,,,4944-S53,,,,,,,-0.010817443513866131,153.0,153
2517998,4946,12,4946-L120,,,4946-S119,,,,,,,-0.00032419846042009684,106351.0,106351
2518998,4948,16,4948-L224,,,4948-S224,,,,,,,-0.012146889258367129,117.0,117
2519998,4950,13,4950-S172,4950-S173,4950-S174,4950-S168,Column_13,0.7691389918327332,0.8014479875564576,<=,left,None,-0.00386439,729.0,729
2520998,4952,16,4952-S237,4952-L237,4952-L238,4952-S236,Column_13,0.8237029910087585,1.211699426174164,<=,left,None,-0.00339503,301.0,301
2521998,4954,18,4954-S170,4954-L168,4954-L171,4954-S167,Column_19,0.9019380211830139,-1.0000000180025095e-35,<=,left,None,0.00725262,244.0,244
2522998,4956,13,4956-L102,,,4956-S101,,,,,,,0.004279216547811973,1058.0,1058
2523998,4958,18,4958-L86,,,4958-S95,,,,,,,-0.009275815569726684,116.0,116
2524998,4960,17,4960-L245,,,4960-S251,,,,,,,0.0021101277049967123,188.0,188
2525998,4962,19,4962-L77,,,4962-S76,,,,,,,-0.015216302921784657,106.0,106
2526998,4964,14,4964-S190,4964-L178,4964-S191,4964-S177,Column_24,0.970412015914917,0.6755685508251191,<=,left,None,-0.00333743,620.0,620
2527998,4966,14,4966-L73,,,4966-S76,,,,,,,0.011329781752841099,137.0,137
2528998,4968,14,4968-S32,4968-S174,4968-S33,4968-S29,Column_19,1.5902700424194336,-0.6742075979709624,<=,left,None,0.000892957,5444.0,5444
2529998,4970,11,4970-L200,,,4970-S199,,,,,,,0.006320520037044324,336.0,336
2530998,4972,13,4972-S225,4972-L131,4972-L226,4972-S133,Column_15,0.7520939707756042,-0.3910080790519714,<=,left,None,-0.00766963,295.0,295
2531998,4974,15,4974-L139,,,4974-S143,,,,,,,0.009719555713627415,102.0,102
2532998,4976,13,4976-L131,,,4976-S130,,,,,,,-0.008448866840260916,195.0,195
2533998,4978,15,4978-L239,,,4978-S238,,,,,,,0.01019938246213964,112.0,112
2534998,4980,9,4980-L42,,,4980-S42,,,,,,,0.012193971863459975,126.0,126
2535998,4982,15,4982-L34,,,4982-S33,,,,,,,-0.009220713326855428,192.0,192
2536998,4984,9,4984-S26,4984-L25,4984-L27,4984-S25,Column_1,1.1161600351333618,-0.04436973668634891,<=,left,None,-0.00566565,339.0,339
2537998,4986,13,4986-L236,,,4986-S236,,,,,,,-0.010996394506930211,221.0,221
2538998,4988,18,4988-L222,,,4988-S233,,,,,,,-0.001222929739662567,1392.0,1392
2539998,4990,13,4990-L167,,,4990-S166,,,,,,,0.0017196273834156187,19678.0,19678
2540998,4992,15,4992-L47,,,4992-S46,,,,,,,0.019410366345196963,100.0,100
2541998,4994,11,4994-S136,4994-L109,4994-L137,4994-S118,Column_8,0.7672929763793945,1.0000000180025095e-35,<=,left,None,-0.00867073,459.0,459
2542998,4996,14,4996-S138,4996-L112,4996-S139,4996-S111,Column_23,0.5022619962692261,0.986467868089676,<=,left,None,0.000172803,445.0,445
2543998,4998,9,4998-S76,4998-S81,4998-S77,4998-S75,Column_11,1.4823499917984009,-0.8421601653099059,<=,left,None,8.01628e-06,93124.0,93124
2544998,4999,11,4999-L179,,,4999-S178,,,,,,,-0.00022046581974725736,1660.0,1660
When training with device=cuda, as the number of training iterations increases, GPU utilization decreases and the time spent on each training iteration increases. Training with device=cpu does not behave this way.
Thank you very much for your suggestion! Looking forward to your reply!
Sorry, my request was unclear.
I'm not looking for a random sample of that dataframe. Could you use that output to see if there is a difference in the number of leaves in each tree?
A finding like "the trees in later iterations have fewer leaves" would be very informative here.
I'm looking for output similar to this:
tree 0: 255 leaves
...
tree 100: 75 leaves
...
tree 200: 25 leaves
...
tree 300: 3 leaves
...
tree 400: 3 leaves
...
tree 499: 3 leaves
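A possible way to produce that summary, assuming gbm is the Booster (in trees_to_dataframe(), leaf rows have no split_feature):

tree_df = gbm.trees_to_dataframe()

# leaf rows have split_feature = None, so counting them per tree_index
# gives the number of leaves in each tree
leaf_counts = tree_df[tree_df["split_feature"].isna()].groupby("tree_index").size()

for i in range(0, gbm.num_trees(), 100):
    print(f"tree {i}: {leaf_counts[i]} leaves")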
Hi, sorry for my late reply. I used cat model.txt | grep -A 1 Tree= to check the saved model. I got:
Tree=4975
num_leaves=255
--
Tree=4976
num_leaves=255
--
Tree=4977
num_leaves=255
--
Tree=4978
num_leaves=255
--
Tree=4979
num_leaves=255
--
Tree=4980
num_leaves=255
--
Tree=4981
num_leaves=255
--
Tree=4982
num_leaves=255
--
Tree=4983
num_leaves=255
--
Tree=4984
num_leaves=255
--
Tree=4985
num_leaves=255
--
Tree=4986
num_leaves=255
--
Tree=4987
num_leaves=255
--
Tree=4988
num_leaves=255
--
Tree=4989
num_leaves=255
--
Tree=4990
num_leaves=255
--
Tree=4991
num_leaves=255
--
Tree=4992
num_leaves=255
--
Tree=4993
num_leaves=255
--
Tree=4994
num_leaves=255
--
Tree=4995
num_leaves=255
--
Tree=4996
num_leaves=255
--
Tree=4997
num_leaves=255
--
Tree=4998
num_leaves=255
--
Tree=4999
num_leaves=255
In fact, every tree has num_leaves=255.
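For completeness, a sketch of the same check in Python, assuming the model was saved to model.txt with gbm.save_model():

import re

# each tree block in the saved model file contains one "num_leaves=..." line
with open("model.txt") as f:
    leaves = [int(m) for m in re.findall(r"num_leaves=(\d+)", f.read())]
print(f"trees: {len(leaves)}, min leaves: {min(leaves)}, max leaves: {max(leaves)}")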
hmmmm ok thank you for that!
Sorry, but I'm out of ideas. I'm not that familiar with the performance characteristics of the CUDA build here. I hope @shiyu1994 will be able to help.
I wanted to test the speed in Python. I tried to replicate the GPU (L40) and CPU (28 cores) experiments with HIGGS. The following are the experimental results.
num_iterations=500: the cuda version (28s) was faster than cpu (71s). num_iterations=5000: the cuda version (570s) was slower than cpu (403s).
Within ten minutes, Volatile GPU-Util gradually decreased from 80% to below 10%.
Dataset and parameter settings are from https://github.com/microsoft/LightGBM/blob/master/docs/GPU-Tutorial.rst; dataset preparation follows https://github.com/guolinke/boosting_tree_benchmarks/blob/master/data/higgs2libsvm.py.
Code:
cpu test
gpu test
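For reference, a minimal sketch of such a benchmark, assuming the HIGGS data was converted with higgs2libsvm.py and saved as higgs.train (the path, learning rate, and timing approach are assumptions based on the GPU tutorial):

import time

import lightgbm as lgb

# HIGGS in LibSVM format, prepared with higgs2libsvm.py (path is an assumption)
dtrain = lgb.Dataset("higgs.train")

params = {
    "objective": "binary",
    "num_leaves": 255,
    "learning_rate": 0.1,
    "device": "cuda",  # change to "cpu" for the CPU run
}

start = time.time()
gbm = lgb.train(params, dtrain, num_boost_round=5000)
print(f"training time: {time.time() - start:.1f}s")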