It's true that n_estimators in the code corresponds to M in the paper, as you have written. It's also true that in our experiments we set M by picking the iteration with the best val_nll on a validation set; concretely, that iteration is assigned to the variable best_itr.
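For illustration only, here is a minimal sketch of that kind of selection rule. It assumes a probabilistic boosting model that yields one predictive distribution per stage through a staged_pred_dist method, with a logpdf on each distribution; both names are assumptions for the sketch and not necessarily the exact code from the experiments.

```python
import numpy as np

def select_best_itr(model, X_val, y_val):
    # Negative log-likelihood of the validation set at each boosting stage.
    # `staged_pred_dist` / `logpdf` are assumed, illustrative method names.
    val_nll = [-dist.logpdf(y_val).mean() for dist in model.staged_pred_dist(X_val)]
    # Stage with the lowest validation NLL (1-indexed, like a stage count).
    best_itr = int(np.argmin(val_nll)) + 1
    return best_itr
```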
Picking M this way was designed to match the methodology of prior works. However, I want to make it clear that this methodology is just one option among many ways you can choose the hyper-parameter M in real-world applications. One alternative is to fix M to a randomly chosen large value. Another alternative is, as you mentioned, K-fold cross validation.
So the answer is yes: n_estimators can be chosen via K-fold cross validation, and it does not need to follow the code in this experiment.
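As a concrete illustration of that option, here is a minimal sketch of a K-fold search over n_estimators using scikit-learn's GridSearchCV. The GradientBoostingRegressor, the toy dataset, the candidate grid, and the scoring metric are all stand-ins; swap in your own model and score.

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, KFold

X, y = load_diabetes(return_X_y=True)  # toy data, purely for illustration

# Candidate values for M (n_estimators in the code).
param_grid = {"n_estimators": [100, 200, 500, 1000]}

search = GridSearchCV(
    GradientBoostingRegressor(learning_rate=0.01),  # stand-in estimator
    param_grid,
    scoring="neg_mean_squared_error",
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(X, y)
print("n_estimators (M) chosen by 5-fold CV:", search.best_params_["n_estimators"])
```

If the model is probabilistic, the mean-squared-error score would typically be replaced by a held-out negative log-likelihood, which matches the val_nll criterion used in the experiments.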
Feel free to re-open the issue if anything is left unclear.
Hello,
Can someone please explain the difference between the parameter n_estimators in the documentation and the number of boosting iterations (referred to in the paper as M) that you were trying to tune in the experiments here? M is defined in the paper as the number of boosting stages.
Basically, I am looking at the following line of code, and I assume the M in this case is the best_itr variable in the code above. If I am interested in applying K-fold cross validation, is best_itr a parameter that I can tune through K-fold cross validation, for example?