google / yggdrasil-decision-forests

A library to train, evaluate, interpret, and productionize decision forest models such as Random Forest and Gradient Boosted Decision Trees.
https://ydf.readthedocs.io/
Apache License 2.0

Cannot use 'discretize_numerical_columns' in tuner #75

Closed mocher72 closed 3 months ago

mocher72 commented 4 months ago

I am trying to add 'discretize_numerical_columns' to the tuner with tuner.choice:

tuner.choice('discretize_numerical_columns',[False, True])

But I get the following error:

ValueError: INVALID_ARGUMENT: Unknown param "discretize_numerical_columns".

This comes from:

File ~/miniforge3/envs/StAnd/lib/python3.11/site-packages/ydf/learner/generic_learner.py:238, in GenericLearner._train_from_dataset(self, ds, valid)
    232 log.info(
    233     "Train model on %d examples",
    234     train_ds.nrow(),
    235 )
    237 time_begin_training_model = datetime.datetime.now()
--> 238 cc_model = self._get_learner().Train(**train_args)
    239 log.info(
    240     "Model trained in %s",
    241     datetime.datetime.now() - time_begin_training_model,
    242 )
    244 return model_lib.load_cc_model(cc_model)

If I pass it to the RandomForestLearner directly, it works fine.
Can I use this parameter in the tuner?

rstz commented 4 months ago

Hi, thanks for reporting this.

discretize_numerical_columns is, technically, a parameter of how the dataset is read (i.e. it modifies the "dataspec"), not of model training. The tuner currently handles only model parameters, not dataset parameters, so it is not possible to use this parameter in the tuner for now, sorry!
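
One possible workaround, offered as a sketch rather than an official recommendation: since the parameter works when set on the learner itself, the dataset-level choice can be searched in an outer loop, with a separate tuner run per value. The helper names `search_dataset_option` and `train_and_score_with_ydf` are hypothetical, as are the `max_depth` search space, the `num_trials` value, and the use of accuracy on a held-out dataset as the comparison metric.

```python
from typing import Any, Callable, Iterable, Tuple


def search_dataset_option(
    train_and_score: Callable[[Any], float],
    values: Iterable[Any],
) -> Tuple[Any, float]:
    """Calls train_and_score once per value and returns the best (value, score).

    Hypothetical helper: it only implements the outer loop, so the tuner
    inside train_and_score still searches the model hyperparameters.
    """
    best_value, best_score = None, float("-inf")
    for value in values:
        score = train_and_score(value)
        if score > best_score:
            best_value, best_score = value, score
    return best_value, best_score


def train_and_score_with_ydf(discretize: bool, train_ds, valid_ds, label: str) -> float:
    """Trains a tuned Random Forest with a fixed dataset-level setting.

    Assumptions: the search space, num_trials, and the accuracy metric are
    illustrative choices, not taken from the thread.
    """
    import ydf  # Imported here so the outer-loop helper has no ydf dependency.

    # The tuner only receives model hyperparameters.
    tuner = ydf.RandomSearchTuner(num_trials=20)
    tuner.choice("max_depth", [8, 16, 32])

    # The dataset-level parameter goes on the learner, where it is accepted.
    learner = ydf.RandomForestLearner(
        label=label,
        discretize_numerical_columns=discretize,
        tuner=tuner,
    )
    model = learner.train(train_ds)
    return model.evaluate(valid_ds).accuracy
```

With this sketch, something like `search_dataset_option(lambda d: train_and_score_with_ydf(d, train_ds, valid_ds, "label"), [False, True])` would train one tuned model per setting and report which setting scored higher. The cost is one full tuning run per dataset-level value.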