sql-machine-learning / sqlflow

Brings SQL and AI together.
https://sqlflow.org
Apache License 2.0

xgboost prediction failed for binary classifier #1881

Open Yancey1989 opened 4 years ago

Yancey1989 commented 4 years ago

Reproduced SQL:

SELECT * FROM fraud_detection_pred
TO TRAIN xgboost.gbtree
WITH
    objective="binary:logistic"
LABEL class
INTO yx_xgb_model;

SELECT * FROM fraud_detection_pred
TO PREDICT fraud_detection_pred_result.class
USING yx_xgb_model;

logs:

...
sqlflow_submitter/db_writer/pai_maxcompute.py", line 37, in flush
    self.writer.write(self.rows, self.writer_indices)
  File "/usr/lib/python2.7/site-packages/paiio/python/python_io/table_writer.py", line 35, in write
    self._writer.write(values=values, col_indices=indices)
  File "/usr/lib64/python2.7/site-packages/common_io/table.py", line 182, in write
    list(tuple(v) for v in values), tuple(col_indices), allow_type_cast)
  File "/usr/lib64/python2.7/site-packages/common_io/table.py", line 204, in _write
    raise Exception(err)
Exception: Can't convert Python object because of type mismatching. [Convert to <type 'int'> for column 0]. Detailed message: invalid literal for long() with base 10: '0.000000'

Yancey1989 commented 4 years ago

After validating the first column's type as INT, I got another error:

Exception: Can't convert Python object because of type mismatching. [Convert to <type 'int'> for column 30]. Detailed message: invalid literal for long() with base 10: '0.02244279'

I found where the bug is:

https://github.com/sql-machine-learning/sqlflow/blob/afb91d4913dcf6416c0d22d9b146b3e9c57c6778/python/sqlflow_submitter/xgboost/predict.py#L47-L50

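len(preds.shape) cannot distinguish between binary classification and regression: with objective="binary:logistic" the model returns a 1-D array of probabilities, the same shape as regression output. That appears to be why raw probabilities such as 0.02244279 are written to the INT label column.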
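
A minimal sketch of one possible direction, not the actual fix: branch on the training objective instead of on the shape of the prediction array. The helper name preds_to_labels and the 0.5 threshold are assumptions made for illustration; they are not taken from predict.py.

```python
import numpy as np


def preds_to_labels(preds, objective):
    """Hypothetical helper: turn raw XGBoost predictions into values to write.

    `objective` is the string passed at training time, e.g. "binary:logistic".
    """
    preds = np.asarray(preds)
    if objective == "binary:logistic":
        # One probability per row (1-D array); 0.5 is an assumed cut-off.
        return (preds > 0.5).astype(int)
    if objective == "multi:softprob":
        # One row of class probabilities per sample (2-D array).
        return np.argmax(preds, axis=1)
    if objective == "multi:softmax":
        # Class indices are already returned, but as floats.
        return preds.astype(int)
    # Anything else (e.g. regression) is passed through unchanged.
    return preds


print(preds_to_labels([0.0, 0.02244279, 0.97], "binary:logistic"))  # [0 0 1]
```

With this kind of dispatch, a 1-D output from a binary classifier becomes integer class labels before it reaches the table writer, while regression output stays as floats.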

I will fix it ASAP.