microsoft / LightGBM

A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
https://lightgbm.readthedocs.io/en/latest/
MIT License

[LightGBM] [Fatal] Input format error when parsing as LibSVM #6242

Closed: yydai closed this issue 9 months ago

yydai commented 10 months ago

I want to train a pairwise ranking model, and I set the params like this:

{
  "objective": "lambdarank",
  "boosting_type": "gbdt",
  "metric": "auc",
  "learning_rate": 0.05,
  "max_depth": 7,
  "num_iterations": 200,
  "max_bin": 40,
  "weight": 87,
  "query_id": 88  # group column
}

Here is a row from the training dataset, where index 88 is the group column:

1 21:0.3363971 22:0.0084746 23:0.3006773 24:0.3817781 25:0.4602587 26:541 27:2637 28:6936 29:10312 30:64 31:233 32:627 33:920 34:3 35:12 36:40 37:58 38:0.3943149 39:0.4668909 40:0.4806986 41:0.4710396 42:0.5203252 43:0.6005155 44:0.6822633 45:0.7060629 46:0.0454545 47:0.9230769 48:0.8333333 49:0.8787879 50:66 51:242 52:326 53:331 54:7 55:20 56:23 57:24 58:0 59:2 60:2 61:2 62:0.0481050 63:0.0428470 64:0.0225934 65:0.0151197 66:0.0569106 67:0.0515464 68:0.0250272 69:0.0184190 70:0 71:0.1538462 72:0.0416667 73:0.0303030 74:0.0188679 75:0.0242634 76:0.0276532 77:0.0285405 78:0 79:0.0011554 80:0.0018685 81:0.0018064 82:2.7697274 83:0 84:0 85:0 86:23 87:1 88:0001c944-92e4-4022-9838-0f17101af3ca_1701088880252

I followed the LightGBM docs on the group column, but I get an error: [LightGBM] [Fatal] Input format error when parsing as LibSVM.

Did I do something wrong? Does this group_column have to be a number? Thank you!

shiyu1994 commented 10 months ago

Thanks for using LightGBM.

Note that column 88 contains the non-numerical value 0001c944-92e4-4022-9838-0f17101af3ca_1701088880252, which LightGBM currently does not handle automatically.

Please consider encoding the string values into discrete categorical integers.
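For illustration, the encoding could be done as a preprocessing step before the file is passed to LightGBM. The following is only a minimal sketch, assuming the group value sits at index 88 as in the row above; encode_group_ids is a hypothetical helper, not part of LightGBM, and the sample rows are made up. It assigns integers in first-seen order, so rows that share a query string keep sharing one ID.

```python
def encode_group_ids(lines, group_index=88):
    """Replace the string value at `group_index` with an integer ID.

    Hypothetical helper (not a LightGBM API). IDs are assigned in
    first-seen order, so identical query strings map to the same integer.
    """
    mapping = {}   # original string -> integer ID
    encoded = []
    for line in lines:
        out = []
        for tok in line.split():
            idx, sep, value = tok.partition(":")
            # The leading label has no ":" and is left untouched.
            if sep and idx == str(group_index):
                gid = mapping.setdefault(value, len(mapping))
                tok = f"{idx}:{gid}"
            out.append(tok)
        encoded.append(" ".join(out))
    return encoded, mapping


# Made-up rows in the same LibSVM-style layout as the issue.
rows = [
    "1 21:0.5 87:1 88:abc_1",
    "0 21:0.1 87:1 88:abc_1",
    "1 21:0.9 87:1 88:def_2",
]
encoded, mapping = encode_group_ids(rows)
for row in encoded:
    print(row)
```

Keeping the returned mapping around makes it possible to translate the integer IDs back to the original query strings later, for example when inspecting predictions per query.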

yydai commented 9 months ago

After making that change, training runs normally. Thank you!