google / yggdrasil-decision-forests

A library to train, evaluate, interpret, and productionize decision forest models such as Random Forest and Gradient Boosted Decision Trees.
https://ydf.readthedocs.io/
Apache License 2.0
498 stars 53 forks

RandomForestLearner has no attributes/methods .task() .predict() .evaluate() #143

Closed. sermomon closed this issue 2 weeks ago

sermomon commented 2 weeks ago

Some of the code provided in the documentation does not work correctly when I use Random Forest (RandomForestLearner) with another dataset. For example:

#%% Read and prepare data:

# Read data
train_data = pd.read_csv(TRAIN_PATH)
test_data = pd.read_csv(TEST_PATH)

#%% Train Random Forest:

model = ydf.RandomForestLearner(
   label="class",
   task=ydf.Task.CLASSIFICATION,
   num_trees=N_TREES,
   max_depth=MAX_DEPTH,
   compute_oob_performances=True)

model.train(train_data)
print(model.describe())

This is where it starts to go wrong:

assert model.task() == ydf.Task.CLASSIFICATION

Traceback (most recent call last):

  Cell In[3], line 1
    assert model.task() == ydf.Task.CLASSIFICATION

AttributeError: 'RandomForestLearner' object has no attribute 'task'

evaluation = model.evaluate(test_data)

Traceback (most recent call last):

  Cell In[4], line 1
    evaluation = model.evaluate(test_data)

AttributeError: 'RandomForestLearner' object has no attribute 'evaluate'

model.predict(test_data)

Traceback (most recent call last):

  Cell In[6], line 1
    model.predict(test_data)

AttributeError: 'RandomForestLearner' object has no attribute 'predict'

rstz commented 2 weeks ago

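YDF does not follow the scikit-learn / TensorFlow API style that the code you provided seems to assume. In YDF, the learner and the model are different objects: the learner holds the training configuration, and calling train() on it returns the model. The corrected code is: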


#%% Read and prepare data:

# Read data
train_data = pd.read_csv(TRAIN_PATH)
test_data = pd.read_csv(TEST_PATH)

# Train    
x_train = train_data.drop(columns=['class'])
y_train =  train_data['class'].astype('category')

# Test
x_test = test_data.drop(columns=['class'])
y_test = test_data['class'].astype('category')

#%% Train Random Forest:

# Change 1: ydf.RandomForestLearner returns a learner
learner = ydf.RandomForestLearner(
   label="class",
   task=ydf.Task.CLASSIFICATION,
   num_trees=N_TREES,
   max_depth=MAX_DEPTH,
   compute_oob_performances=True)

# Change 2: learner.train() returns a model (which is different from the learner)
model = learner.train(train_data)
print(model.describe())
rstz commented 2 weeks ago

Is the code you posted from the documentation? If so, please tell us where, so it can be fixed.

sermomon commented 2 weeks ago

Sorry. It was my misinterpretation of the RandomForestLearner object.

Thank you very much for your quick reply @Rstz. If I find any improvement I'll let you know. Sorry if I got ahead of myself...

I'm closing the issue.
