stanfordmlgroup / ngboost

Natural Gradient Boosting for Probabilistic Prediction
Apache License 2.0

RuntimeWarning: overflow encountered in square/exp #352

Open shawn-padstats opened 5 months ago

shawn-padstats commented 5 months ago

I am optimizing hyperparameters with Optuna and keep getting this warning:

    C:\Users\shawn\anaconda3\lib\site-packages\ngboost\distns\normal.py:70: RuntimeWarning: overflow encountered in exp
      self.scale = np.exp(params[1])
    C:\Users\shawn\anaconda3\lib\site-packages\ngboost\distns\normal.py:71: RuntimeWarning: overflow encountered in square
      self.var = self.scale**2
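
For context, the traceback points at ngboost's Normal distribution, which stores the fitted scale as the exponential of a raw log-scale parameter. float64 exp() overflows once its argument exceeds roughly 709, and the subsequent square overflows even earlier (raw values above roughly 354). A toy reproduction of just those two lines (the params array here is made up; in ngboost it holds the per-sample distribution parameters):

    import numpy as np

    # Made-up raw parameters standing in for ngboost's internal params array;
    # params[1] plays the role of the per-sample log-scale.
    params = np.array([[0.0, 0.0], [1000.0, 500.0]])

    scale = np.exp(params[1])  # RuntimeWarning: overflow encountered in exp (exp(1000) -> inf)
    var = scale ** 2           # RuntimeWarning: overflow encountered in square (exp(500)**2 overflows)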

Here is my hyperparameter objective function:

    from ngboost import NGBRegressor
    from ngboost.distns import Normal, LogNormal
    from ngboost.scores import CRPScore, LogScore
    from sklearn.tree import DecisionTreeRegressor

    def _ngb_objective(self, trial):
        # NGBoost-specific parameters
        ngb_params = {
            'n_estimators': trial.suggest_int('n_estimators', 10, 1000),
            'learning_rate': trial.suggest_float('learning_rate', 1e-4, 5e-2, log=True),  # log scale to avoid extreme values
            'minibatch_frac': trial.suggest_float('minibatch_frac', 0.5, 1.0),
            'natural_gradient': trial.suggest_categorical('natural_gradient', [True, False]),
            'verbose': False
        }

        # Score type
        score_type = trial.suggest_categorical('score_type', ['CRPScore', 'LogScore'])
        if score_type == 'CRPScore':
            ngb_params['Score'] = CRPScore
        elif score_type == 'LogScore':
            ngb_params['Score'] = LogScore

        # Base learner parameters for the DecisionTreeRegressor
        base_learner_params = {
            'criterion': trial.suggest_categorical('criterion', ['squared_error', 'friedman_mse', 'absolute_error']),
            'splitter': trial.suggest_categorical('splitter', ['best', 'random']),
            'max_depth': trial.suggest_int('max_depth', 2, 24),  # moderate depth to avoid overfitting and numerical instability
            'min_samples_split': trial.suggest_int('min_samples_split', 2, 20),
            'min_samples_leaf': trial.suggest_int('min_samples_leaf', 1, 20),
            'min_weight_fraction_leaf': trial.suggest_float('min_weight_fraction_leaf', 0.0, 0.5),
            'max_features': trial.suggest_categorical('max_features', [None, 'sqrt', 'log2']),
            'max_leaf_nodes': trial.suggest_int('max_leaf_nodes', 8, 2048, log=True),  # log scale for moderated growth
            'min_impurity_decrease': trial.suggest_float('min_impurity_decrease', 0.0, 0.01)  # small range to limit extreme splits
        }

        # Distribution choice, handled outside the base learner parameters
        distribution_choice = trial.suggest_categorical('distribution', ['Normal', 'LogNormal'])
        if distribution_choice == 'Normal':
            Dist = Normal
        elif distribution_choice == 'LogNormal':
            Dist = LogNormal

        # Combine NGBoost parameters with the base learner
        ngb_params['Base'] = DecisionTreeRegressor(**base_learner_params)

        # Instantiate the NGBRegressor with the selected distribution
        model = NGBRegressor(Dist=Dist, **ngb_params)
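
The snippet stops at model construction; for Optuna the objective still has to fit the model and return a value to minimize. A minimal, hypothetical completion (the self.X_train / self.y_train / self.X_val / self.y_val attributes are assumed here, not part of the original):

        # Hypothetical completion: fit on a training split and return the
        # validation negative log-likelihood for Optuna to minimize.
        model.fit(self.X_train, self.y_train)
        val_dist = model.pred_dist(self.X_val)
        return -val_dist.logpdf(self.y_val).mean()
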
fif911 commented 5 months ago

Hey, I got the same error. Have you solved it? Do you get it even after scaling your data, or are you not scaling?

shawn-padstats commented 5 months ago

> Hey, I got the same error. Have you solved it? Do you get it even after scaling your data, or are you not scaling?

I have not been able to figure it out. I just skip the trial if I reach that error.

I am not scaling the features, so that could be a reason, but I didn't think scaling was necessary for this model type.
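
A sketch of that skip, assuming an Optuna objective along the lines of the one above (build_ngb_model, evaluate, X_train, and y_train are hypothetical placeholders): numpy reports the overflow through the warnings module, so it can be escalated to an exception and the trial pruned.

    import warnings

    import optuna

    def objective(trial):
        model = build_ngb_model(trial)  # hypothetical helper wrapping the search space above
        with warnings.catch_warnings():
            # Escalate numpy's overflow RuntimeWarnings into exceptions so an
            # unstable configuration fails fast instead of training on inf/nan.
            warnings.filterwarnings("error", category=RuntimeWarning)
            try:
                model.fit(X_train, y_train)  # X_train / y_train assumed in scope
            except RuntimeWarning:
                raise optuna.TrialPruned()   # tell Optuna to skip this trial
        return evaluate(model)               # hypothetical scoring helper

On the scaling question: standardizing the target y (not only the features) tends to keep the fitted log-scale in a range where exp() stays finite, so it is worth trying before falling back on pruning.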

KlimisStilp commented 3 months ago

I have the same error as well....

All features are scaled and I am tuning my model with BayesGridSearchCV. The distribution is assumed to be Normal and the score is set to CRPScore. I try to tune the learning rate, n_estimators, and the max depth of the decision trees, but I still get these warnings.

The only way I don't get these errors is if I pass a validation set to the fit method and set early_stopping_rounds, but that isn't supposed to be the solution, right?
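
For reference, that workaround uses ngboost's built-in validation-based early stopping. A sketch with synthetic data (the data and the specific hyperparameter values are placeholders):

    import numpy as np
    from ngboost import NGBRegressor
    from ngboost.distns import Normal
    from ngboost.scores import CRPScore
    from sklearn.model_selection import train_test_split

    # Placeholder data; substitute your own scaled features and target.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 10))
    y = rng.normal(size=500)
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

    model = NGBRegressor(Dist=Normal, Score=CRPScore, n_estimators=1000, learning_rate=0.01)

    # Boosting halts once the validation loss stops improving for 50 rounds,
    # which in practice also caps how far the log-scale parameter can grow.
    model.fit(X_train, y_train, X_val=X_val, Y_val=y_val, early_stopping_rounds=50)

You're right that it's more of a guard than a fix: early stopping just limits the number of boosting steps, while the underlying issue is the unconstrained exp() parameterization of the scale in normal.py.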