Hi, thank you for reporting. I wasn't immediately able to repro this on the adult dataset:
import ydf
import pandas as pd
ds_path = "https://raw.githubusercontent.com/google/yggdrasil-decision-forests/main/yggdrasil_decision_forests/test_data/dataset"
dataset = pd.read_csv(f"{ds_path}/adult.csv")
learner = ydf.RandomForestLearner(label="age", task=ydf.Task.REGRESSION)
evaluation = learner.cross_validation(dataset, folds=5)
This trains a regression model and works fine in Colab. Can you please provide a repro?
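Since the behaviour may differ between platforms, it could help to attach the environment details to the repro. Below is a minimal sketch using only the standard library. The helper name `environment_report` is made up for illustration, and it assumes the package is installed under the distribution name `ydf`.

```python
import platform
import sys
from importlib import metadata


def environment_report(package: str = "ydf") -> dict:
    """Collect details that usually matter for a platform-specific bug.

    `environment_report` is a hypothetical helper, not part of YDF.
    """
    try:
        # Installed version of the distribution, read from package metadata.
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        version = "not installed"
    return {
        "package": package,
        "package_version": version,
        "python": sys.version.split()[0],
        "os": platform.system(),        # e.g. "Darwin" on macOS
        "machine": platform.machine(),  # e.g. "arm64" on Apple silicon
    }


for key, value in environment_report().items():
    print(f"{key}: {value}")
```

Pasting this output next to the traceback makes it easier to tell whether the error depends on the OS build or on the installed package version.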
Hi, I think this is likely to be a MacBook issue. I tried to run your code and got the same error.
Will just use Colab, thanks.
Error:
ValueError Traceback (most recent call last)
Cell In[22], line 7
      4 dataset = pd.read_csv(f"{ds_path}/adult.csv")
      6 learner = ydf.RandomForestLearner(label="age", task=ydf.Task.REGRESSION)
----> 7 evaluation = learner.cross_validation(dataset, folds=5)

File ~/miniconda3/envs/ydf/lib/python3.11/site-packages/ydf/learner/generic_learner.py:420, in GenericLearner.cross_validation(self, ds, folds, bootstrapping, parallel_evaluations)
    417 learner = self._get_learner()
    419 with log.cc_log_context():
--> 420   evaluation_proto = learner.Evaluate(
    421       vertical_dataset._dataset,  # pylint: disable=protected-access
    422       fold_generator,
    423       evaluation_options,
    424       deployment_evaluation,
    425   )
    426 return metric.Evaluation(evaluation_proto)

ValueError: INVALID_ARGUMENT: Classification requires a categorical label.
Thanks for getting back to me. A new Mac release is scheduled for this week, so hopefully it will fix this on Mac as well.
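Until a fixed build is available, one possible workaround is to run the folds by hand instead of calling `cross_validation`. The sketch below is plain Python and not YDF's own implementation. The names `kfold_indices`, `manual_cross_validation` and `train_and_score` are hypothetical, and the scoring callback is left to the caller.

```python
import random
from statistics import mean
from typing import Callable, List, Tuple


def kfold_indices(n_rows: int, folds: int, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return (train_indices, test_indices) pairs; every row is in exactly one test fold."""
    if not 2 <= folds <= n_rows:
        raise ValueError("folds must be between 2 and the number of rows")
    order = list(range(n_rows))
    random.Random(seed).shuffle(order)  # seeded shuffle, so splits are reproducible
    splits = []
    for k in range(folds):
        test = order[k::folds]  # every folds-th shuffled row, starting at offset k
        test_set = set(test)
        train = [i for i in order if i not in test_set]
        splits.append((train, test))
    return splits


def manual_cross_validation(
    n_rows: int,
    folds: int,
    train_and_score: Callable[[List[int], List[int]], float],
    seed: int = 0,
) -> float:
    """Average the per-fold score returned by the caller-supplied callback."""
    return mean(train_and_score(train, test) for train, test in kfold_indices(n_rows, folds, seed))
```

With YDF, `train_and_score` could train a model with `learner.train(dataset.iloc[train_idx])` and return `model.evaluate(dataset.iloc[test_idx]).rmse`. I have not run that part on a Mac, so treat it as an assumption. Note that the mean of per-fold RMSE values is not identical to the RMSE computed over all pooled predictions.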
Cross-validation does not support regression tasks. See the code and error below. The same issue occurs with Random Forest, and removing the tuner makes no difference.
tuner = ydf.RandomSearchTuner(num_trials=50, automatic_search_space=True)
learner = ydf.GradientBoostedTreesLearner(label="orders", task=ydf.Task.REGRESSION, tuner=tuner)
evaluation = learner.cross_validation(train_df2, folds=10)
[WARNING 24-07-02 18:33:57.6357 BST gradient_boosted_trees.cc:1840] "goss_alpha" set but "sampling_method" not equal to "GOSS".
[WARNING 24-07-02 18:33:57.6357 BST gradient_boosted_trees.cc:1851] "goss_beta" set but "sampling_method" not equal to "GOSS".
[WARNING 24-07-02 18:33:57.6357 BST gradient_boosted_trees.cc:1865] "selective_gradient_boosting_ratio" set but "sampling_method" not equal to "SELGB".

ValueError Traceback (most recent call last)
Cell In[36], line 8
      1 tuner = ydf.RandomSearchTuner(num_trials=50, automatic_search_space=True)
      3 learner = ydf.GradientBoostedTreesLearner(label="orders",
      4                                           task=ydf.Task.REGRESSION,
      5                                           tuner=tuner
      6                                           )
----> 8 evaluation = learner.cross_validation(train_df2, folds=10)

File ~/miniconda3/envs/ydf/lib/python3.11/site-packages/ydf/learner/generic_learner.py:420, in GenericLearner.cross_validation(self, ds, folds, bootstrapping, parallel_evaluations)
    417 learner = self._get_learner()
    419 with log.cc_log_context():
--> 420   evaluation_proto = learner.Evaluate(
    421       vertical_dataset._dataset,  # pylint: disable=protected-access
    422       fold_generator,
    423       evaluation_options,
    424       deployment_evaluation,
    425   )
    426 return metric.Evaluation(evaluation_proto)

ValueError: INVALID_ARGUMENT: Classification requires a categorical label.