microsoft / LightGBM

A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
https://lightgbm.readthedocs.io/en/latest/
MIT License
categorical feature not be identified #345

Closed tak-wah closed 7 years ago

tak-wah commented 7 years ago

I am using my own dataset, which includes a categorical feature, and I get the following error: ValueError: could not convert string to float: 'a'. The categorical feature is in the first column, so I set the parameter categorical_feature=0.

A sample of the data:

3 a -1.047 0.537 1.186 ...
1 b -0.151 -0.221 -0.090 ...
... ... ... ... ... ...
1 a 0.387 -1.660 0.684 ...

wxchan commented 7 years ago

Isn't it the second column?

tak-wah commented 7 years ago

The first column is the label. I am using the .conf version and tried adding label=0, but I get the same error:

[LightGBM] [Info] Using column number 0 as label
Met Exceptions:
Unknown token a in data file

I also tried the Python version and got the following traceback:

Traceback (most recent call last):
  File "/root/anaconda3/lib/python3.6/site-packages/lightgbm-0.1-py3.6.egg/lightgbm/engine.py", line 163, in train
    booster = Booster(params=params, train_set=train_set)
  File "/root/anaconda3/lib/python3.6/site-packages/lightgbm-0.1-py3.6.egg/lightgbm/basic.py", line 1189, in __init__
    train_set.construct().handle,
  File "/root/anaconda3/lib/python3.6/site-packages/lightgbm-0.1-py3.6.egg/lightgbm/basic.py", line 787, in construct
    categorical_feature=self.categorical_feature, params=self.params)
  File "/root/anaconda3/lib/python3.6/site-packages/lightgbm-0.1-py3.6.egg/lightgbm/basic.py", line 652, in _lazy_init
    self.__init_from_np2d(data, params_str, ref_dataset)
  File "/root/anaconda3/lib/python3.6/site-packages/lightgbm-0.1-py3.6.egg/lightgbm/basic.py", line 699, in __init_from_np2d
    data = np.array(mat.reshape(mat.size), dtype=np.float32)
ValueError: could not convert string to float: 'a'

guolinke commented 7 years ago

@tak-wah It seems you are using a file as input data. Currently, non-numerical categorical features are only supported through pandas in the Python package. If you need to pass categorical features through a file, you should convert them to int first.
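As a rough illustration of the pandas route mentioned above, the sketch below loads the sample rows from this issue into a DataFrame and marks the string column with the `category` dtype. The column names (`label`, `cat`, `f1` to `f3`) and the helper functions are made up for this example, and `make_dataset` assumes a LightGBM version whose `lgb.Dataset` accepts a pandas DataFrame with `category` columns.

```python
import io

import pandas as pd

# Sample rows from this issue; the column names below are hypothetical.
RAW = """3 a -1.047 0.537 1.186
1 b -0.151 -0.221 -0.090
1 a 0.387 -1.660 0.684
"""


def load_frame(text):
    """Parse whitespace-separated rows and mark the string column as categorical."""
    df = pd.read_csv(
        io.StringIO(text),
        sep=r"\s+",
        header=None,
        names=["label", "cat", "f1", "f2", "f3"],
    )
    # The 'category' dtype stores the strings as integer codes internally.
    df["cat"] = df["cat"].astype("category")
    return df


def make_dataset(df):
    """Build a LightGBM Dataset from the frame (requires the lightgbm package)."""
    import lightgbm as lgb  # imported lazily so load_frame runs without LightGBM

    features = df.drop(columns="label")
    return lgb.Dataset(features, label=df["label"], categorical_feature=["cat"])


df = load_frame(RAW)
print(df.dtypes)
print(df["cat"].cat.codes.tolist())
```

Calling `make_dataset(df)` and passing the result to `lgb.train` would then be the next step. The data preparation itself only needs pandas.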

tak-wah commented 7 years ago

@guolinke Thank you! Following your advice, I converted [a, b, ...] to [0, 1, ...] and set categorical_feature=0, and now it works well.
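The conversion described above could be done in many ways. The following is a minimal sketch using only the standard library, assuming whitespace-separated rows like the sample in this issue. The function name `encode_column` is made up for this example, and codes are assigned in order of first appearance.

```python
def encode_column(rows, col):
    """Replace the string values in column `col` with integer codes.

    Codes are assigned in order of first appearance (a -> 0, b -> 1, ...).
    Returns the rewritten rows and the value-to-code mapping.
    """
    mapping = {}
    encoded = []
    for row in rows:
        fields = row.split()
        # setdefault returns the existing code, or stores and returns the next free one.
        code = mapping.setdefault(fields[col], len(mapping))
        fields[col] = str(code)
        encoded.append(" ".join(fields))
    return encoded, mapping


# Sample rows from this issue: label first, categorical value second.
rows = [
    "3 a -1.047 0.537 1.186",
    "1 b -0.151 -0.221 -0.090",
    "1 a 0.387 -1.660 0.684",
]

encoded, mapping = encode_column(rows, col=1)
for line in encoded:
    print(line)
print(mapping)
```

Keeping the returned mapping around matters: the same value-to-code mapping has to be applied to any validation or prediction file, otherwise the integer codes will not mean the same thing across files.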

Another problem: the multiclass accuracy is low when I use the /examples/multiclass_classification/multiclass.* data sets.

Early stopping, best iteration is:
[60]    training's multi_logloss: 1.21632   training's multi_error: 0.229   valid_1's multi_logloss: 1.41303    valid_1's multi_error: 0.542

Is this because of the data set?

wxchan commented 7 years ago

The dataset is randomly generated.
