tensorflow / decision-forests

A collection of state-of-the-art algorithms for the training, serving and interpretation of Decision Forest models in Keras.
Apache License 2.0

Strange behavior on covtype dataset #78

Closed omit-ai closed 1 year ago

omit-ai commented 2 years ago

Hey all,

While fiddling around with TF-DF, we found strange behavior in the GradientBoostedTrees model on sklearn's covtype dataset.

Setup

Environment

TF-DF was running on Google Colab and Multipass.

Multipass:

Hyperparameters

The GradientBoostedTrees model was run with the following hyperparameters:


dt_kwargs_base = {
    'num_trees':100,
    'growing_strategy':"BEST_FIRST_GLOBAL",
    'max_depth':6,
    'use_hessian_gain':True,
    'sorting_strategy':"IN_NODE",
    'shrinkage':1.,
    'subsample':1.,
    'sampling_method': 'RANDOM',
    'l1_regularization':1.,
    'l2_regularization':1.,
    'l2_categorical_regularization':1.,
    'num_candidate_attributes': -1,
    'num_candidate_attributes_ratio': -1.,
    'min_examples':1,
    'validation_ratio':0.,
    'early_stopping':"NONE",
    'in_split_min_examples_check':False,
    'max_num_nodes': -1,
    'verbose': 0,
}

Dataset

Sklearn - Cov_Type Dataset

Split with sklearn's train_test_split using test_size=0.2 and random_state=42. We decremented y by 1 so that the labels run from 0 to 6 instead of 1 to 7.
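The data preparation described above could be sketched roughly as follows. The helper name `split_covtype` is hypothetical, and `fetch_covtype` is assumed to be the scikit-learn loader that was used; only `train_test_split`, `test_size=0.2`, `random_state=42` and the label shift come from the report.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def split_covtype(X, y):
    """Shift labels to start at 0 and make the 80/20 split described above."""
    y = np.asarray(y) - 1  # covtype labels are 1..7; shift them to 0..6
    return train_test_split(X, y, test_size=0.2, random_state=42)


# Usage (assumed loader; downloads the dataset on first call):
# from sklearn.datasets import fetch_covtype
# X, y = fetch_covtype(return_X_y=True)
# X_train, X_test, y_train, y_test = split_covtype(X, y)
```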

Predictions on the test set

Training completed without any issue, but we got these predictions on the test set:

[[nan nan nan ... nan nan nan]
 [nan nan nan ... nan nan nan]
 [nan nan nan ... nan nan nan]
 ...
 [nan nan nan ... nan nan nan]
 [nan nan nan ... nan nan nan]
 [nan nan nan ... nan nan nan]]
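A quick way to quantify output like this is to count the rows that contain NaN. This is a minimal sketch, assuming the predictions are a 2-D NumPy array of shape (examples, classes); `nan_row_fraction` is a hypothetical helper name, not part of TF-DF.

```python
import numpy as np


def nan_row_fraction(predictions):
    """Return the fraction of prediction rows containing at least one NaN."""
    predictions = np.asarray(predictions, dtype=float)
    return float(np.isnan(predictions).any(axis=1).mean())
```

A value of 1.0 means every test example received a NaN prediction, which is what the output above suggests.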

The trees produced did not look suspicious.

This did not happen when we did any one of the following:

  1. Lowered num_trees (e.g. to 50)
  2. Lowered max_depth (e.g. to 3)
  3. Set use_hessian_gain to False
  4. Lowered the shrinkage (e.g. to 0.5 or smaller)
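The four observations above can be turned into a small ablation: change one setting at a time and check whether the NaN predictions disappear. In this sketch only the relevant keys of `dt_kwargs_base` are repeated so that it runs on its own. The variant names are made up for illustration, and the training call is left as a comment because it needs TF-DF and the dataset.

```python
# Relevant subset of the hyperparameters from the report.
base = {
    "num_trees": 100,
    "max_depth": 6,
    "use_hessian_gain": True,
    "shrinkage": 1.0,
}

# One override per observation; each variant changes exactly one setting.
overrides = {
    "fewer_trees": {"num_trees": 50},
    "shallower_trees": {"max_depth": 3},
    "no_hessian_gain": {"use_hessian_gain": False},
    "lower_shrinkage": {"shrinkage": 0.5},
}

variants = {name: {**base, **change} for name, change in overrides.items()}

for name, kwargs in variants.items():
    changed = {k: v for k, v in kwargs.items() if base[k] != v}
    print(name, changed)
    # model = tfdf.keras.GradientBoostedTreesModel(**kwargs)  # assumes TF-DF is installed
```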

We have not looked into it any further so far. Do you have an idea why this happens?

Thanks a lot in advance

Best Regards

Timo

achoum commented 2 years ago

Hi,

It is most likely a numerical accumulation problem. Thanks for the report, I'll take a look at it.
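To illustrate what a numerical accumulation problem can look like, here is a NumPy sketch. It is not TF-DF's actual code and the mechanism is only an assumption: if the per-class scores summed over many trees grow very large in float32, a naive softmax overflows to inf, and inf / inf yields NaN, while a max-shifted softmax stays finite. The score values are invented for the example.

```python
import numpy as np


def naive_softmax(logits):
    """Softmax without any stabilization; overflows for large logits."""
    e = np.exp(logits)
    return e / e.sum(axis=-1, keepdims=True)


def stable_softmax(logits):
    """Softmax with the row maximum subtracted before exponentiating."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)


# Hypothetical accumulated scores for one example with 3 classes.
logits = np.array([[120.0, 95.0, 90.0]], dtype=np.float32)

# Silence the overflow warnings so the NaN result is easy to read.
with np.errstate(over="ignore", invalid="ignore"):
    print(naive_softmax(logits))  # every entry is nan

print(stable_softmax(logits))  # finite probabilities that sum to 1
```

With a smaller shrinkage or fewer trees the accumulated scores would stay smaller, which would be consistent with the workarounds listed in the report.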

In the meantime, here are some remarks regarding the hyperparameters: