microsoft / LightGBM

A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
https://lightgbm.readthedocs.io/en/latest/
MIT License

include init_score in predict method #1978

Closed spedygiorgio closed 5 years ago

spedygiorgio commented 5 years ago

Currently it seems not possible to seamlessly include the init_score in prediction. It would be nice for the predict method to handle an init_score, if given. E.g. I am boosting a model on top of an a priori prediction:

  1. using a general model to calculate base predictions on the raw scale

     ```python
     # calculating initial raw score
     base_raw_score = lgb_general_model.predict(X, raw_score=True)
     ```

  2. then I (re)create train, validation and test sets

     ```python
     # recreating train and test sets
     X_train, X_tmp, y_train, y_tmp, rw_train, rw_tmp = train_test_split(
         X, y, base_raw_score, test_size=0.3, stratify=y)
     X_valid, X_test, y_valid, y_test, rw_valid, rw_test = train_test_split(
         X_tmp, y_tmp, rw_tmp, test_size=0.5, stratify=y_tmp)
     del X_tmp, y_tmp, rw_tmp
     ```

  3. then I create the lgb Datasets and tune the model

     ```python
     # defining lgb Dataset(s)
     lgb_full_categorical_predictors = (binarized_predictors_generic + categorical_predictors_generic
                                        + binarized_predictors_30 + categorical_predictors_30)
     lgb_train_30 = lgb.Dataset(data=X_train.values, label=y_train.values,
                                feature_name=X_train.columns.tolist(),
                                categorical_feature=lgb_full_categorical_predictors,
                                free_raw_data=False, init_score=rw_train)
     lgb_valid_30 = lgb.Dataset(data=X_valid.values, label=y_valid.values, reference=lgb_train_30,
                                feature_name=X_valid.columns.tolist(),
                                categorical_feature=lgb_full_categorical_predictors,
                                free_raw_data=False, init_score=rw_valid)
     lgb_full_30 = lgb.Dataset(data=X.values, label=y.values, reference=lgb_train_30,
                               feature_name=X.columns.tolist(),
                               categorical_feature=lgb_full_categorical_predictors,
                               free_raw_data=False, init_score=base_raw_score)

     # tune the model
     lgb_model_30 = lgb.train(params=lgb_general_params, train_set=lgb_train_30,
                              valid_sets=[lgb_valid_30], early_stopping_rounds=10)
     ```

  4. then I calculate predictions (using the raw scores)

     ```python
     # function to get probabilities from raw scores (inverse logit)
     def sigmoid(x):
         """Compute the sigmoid of each raw score in x."""
         e_x = np.exp(x)
         return e_x / (1 + e_x)

     # predict on the raw-score scale and add back the initial raw score
     raw_temp = lgb_model_30.predict(X_test, raw_score=True) + rw_test
     proba = sigmoid(raw_temp)

     # calculating performance
     my_roc_auc_score = roc_auc_score(y_test, proba)
     ```

If init_score could be provided as a supplementary parameter to the lgb_model_30.predict method, I would have avoided the need to know the right inverse-link transformation (what it is in gamma regression, in Poisson regression, in Box-Cox ones, ...) and to perform the calculation on the probability scale manually.
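For now that bookkeeping has to live in user code. A minimal sketch of what it looks like (the helper and the mapping below are illustrative, not LightGBM API; the inverse links listed are the usual ones for those objectives):

```python
import numpy as np

# illustrative only -- map objective name -> inverse link used to go from raw scores
# back to the output scale
INVERSE_LINK = {
    "binary": lambda z: 1.0 / (1.0 + np.exp(-z)),  # logit link
    "poisson": np.exp,                             # log link
    "gamma": np.exp,                               # log link
    "tweedie": np.exp,                             # log link
    "regression": lambda z: z,                     # l2, identity link
    "regression_l1": lambda z: z,                  # l1, identity link
}

def predict_with_init(booster, X, init_raw, objective):
    """Add the raw init_score back to the raw predictions, then undo the link function."""
    raw = booster.predict(X, raw_score=True) + init_raw
    return INVERSE_LINK[objective](raw)
```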

Is it possible to integrate init_score in the predict method?
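For illustration, the requested behaviour would reduce step 4 to something like the following (note: `init_score` is not an existing parameter of `Booster.predict`; this is only a sketch of the proposed interface):

```python
# hypothetical interface -- `init_score` is NOT an existing predict() parameter
proba = lgb_model_30.predict(X_test, init_score=rw_test)
my_roc_auc_score = roc_auc_score(y_test, proba)
```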

This issue is related to #1778 and #1969

alkodsi commented 5 years ago

How do you integrate the init_score with prediction in regression (gamma, l1, l2) for now?
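E.g. is the pattern below (just a sketch, following the binary example above and assuming l1/l2 use the identity link while gamma uses the log link) the intended way?

```python
# sketch: l1/l2 use the identity link, so the raw scores are simply added
pred_l2 = lgb_model_30.predict(X_test, raw_score=True) + rw_test

# sketch: gamma (like poisson) uses the log link, so the combined raw score is exponentiated
pred_gamma = np.exp(lgb_model_30.predict(X_test, raw_score=True) + rw_test)
```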

StrikerRUS commented 5 years ago

Closed in favor of #2302. We decided to keep all feature requests in one place.

You are welcome to contribute this feature! Please re-open this issue (or post a comment if you are not the topic starter) if you are actively working on implementing it.