tensorflow / decision-forests

A collection of state-of-the-art algorithms for the training, serving and interpretation of Decision Forest models in Keras.
Apache License 2.0

Models trained on pure 1's predict 0 #188

Closed: NikolajSafty closed this issue 1 year ago

NikolajSafty commented 1 year ago

Hi,

I'm currently facing an issue with models trained on data where the label always has the same value. I'd expect the fitted model to predict the same value as in the training dataset, but I'm only getting 0's when predicting.

Here's an example to demonstrate:

import pandas as pd
from sklearn.model_selection import train_test_split
import tensorflow_decision_forests as tfdf

# Generate dummy data
data = {
    'fruit': ['apple'] * 100 + ['banana'] * 100,
    'eatable': [1]*200
}

# Build the DataFrame and split it into train and test sets
df = pd.DataFrame.from_dict(data)
train, test = train_test_split(df, random_state=0)
tf_train = tfdf.keras.pd_dataframe_to_tf_dataset(train, label="eatable")
tf_test = tfdf.keras.pd_dataframe_to_tf_dataset(test, label="eatable")

# Instantiate a model and fit it
model = tfdf.keras.CartModel()
model.fit(tf_train)

# Count 1's from prediction
model.predict(tf_test).sum()

The results are pure 0's. I'd expect the model to predict pure 1's, though.
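
To make the expectation concrete without TF-DF, here is a minimal sketch of a majority-class baseline. It assumes that a model fitted on a single constant label should reproduce that label; `majority_baseline` is a hypothetical helper written for illustration and is not part of tensorflow_decision_forests. The row counts assume the default 25% test split of `train_test_split` on the 200 rows above.

```python
from collections import Counter


def majority_baseline(labels):
    """Return the most frequent label.

    Hypothetical helper for illustration only; not a TF-DF API.
    """
    if not labels:
        raise ValueError("labels must be non-empty")
    return Counter(labels).most_common(1)[0][0]


# 200 rows with a 25% test split give 150 training rows and 50 test rows,
# all labelled 1 in the example above.
train_labels = [1] * 150
n_test_rows = 50

# A baseline fitted on constant labels predicts that constant for every row.
baseline = majority_baseline(train_labels)
predictions = [baseline] * n_test_rows

print(sum(predictions))  # 50
```

Under that assumption, the sum over the 50 test predictions would be 50 rather than 0.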

rstz commented 1 year ago

Hi, thank you for reporting this. This is a bug, and we're actively working on fixing it in the next few days.

NikolajSafty commented 1 year ago

Thanks for the quick feedback, sounds good!

rstz commented 1 year ago

Confirming that this is now fixed at head, and the fix will be included in the next version.