Closed mainguyenanhvu closed 3 years ago
Hi mainguyenanhvu,
Could you share the full stack trace as well as a print of the first few examples in the dataset? If your dataset contains private information, make sure to mask it. This would help us understand the issue :).
In the meantime, I reproduced the situation you are describing as follows:
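import pandas as pd
import tensorflow_decision_forests as tfdf

# Toy train and test sets: one feature "f" and an integer label "l" with 4 classes.
pd_train = pd.DataFrame({"f": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
                         "l": [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3]})
pd_test = pd.DataFrame({"f": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
                        "l": [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3]})

# Convert the dataframes to TensorFlow datasets, using "l" as the label column.
ds_train = tfdf.keras.pd_dataframe_to_tf_dataset(pd_train, label="l")
ds_test = tfdf.keras.pd_dataframe_to_tf_dataset(pd_test, label="l")

# Train a small random forest, then evaluate and predict on the test set.
model = tfdf.keras.RandomForestModel(num_trees=5)
model.fit(ds_train)
model.evaluate(ds_test)
pred = model.predict(ds_test)
print(pred)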
This code works as expected. Note that the prediction (final print) is of shape [n, 4] (like in your case). This is what we want: a probability for each of the 4 classes.
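To illustrate what a prediction of shape [n, 4] means, here is a minimal sketch of turning per-class probabilities into predicted class indices. The probability values and the helper name `to_class_indices` are made up for illustration; they are not output from the model above.

```python
# Hypothetical probabilities for 3 examples and 4 classes
# (illustrative values only, not actual model output).
pred = [
    [0.1, 0.6, 0.2, 0.1],
    [0.7, 0.1, 0.1, 0.1],
    [0.2, 0.2, 0.1, 0.5],
]


def to_class_indices(probabilities):
    """Return, for each row, the index of the largest probability."""
    return [max(range(len(row)), key=row.__getitem__) for row in probabilities]


predicted_classes = to_class_indices(pred)
print(predicted_classes)
```

On the NumPy array returned by model.predict, np.argmax(pred, axis=1) should give the same result.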
I checked on Google Colab and it worked well. However, on my computer, my code still returns the error even though I have checked my variables and dataset many times. I will try to fix it myself.
Thank you very much for your support.
I do not understand the reason, but when I remove the metrics when compiling the model, it works.
I am re-opening this issue because I reproduced the error on Google Colab after compiling the model with metrics.
Please check this notebook.
Make sure you are using Keras metrics that are compatible with the model.
For example, in your notebook, you are calling keras.metrics.Accuracy. According to the documentation, this implementation of accuracy (there are several, e.g. Accuracy, BinaryAccuracy, CategoricalAccuracy, SparseCategoricalAccuracy) "calculates how often predictions equal labels". However, your model returns per-class probabilities and your evaluation labels are represented as integers. Therefore, SparseCategoricalAccuracy is better suited.
The following code should work:
metrics = [keras.metrics.SparseCategoricalAccuracy(name='accuracy')]
model.compile(metrics=metrics)
model.evaluate(ds_test)
The same has to be done for all the metrics :)
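As a rough illustration of why the sparse variant fits here, the sketch below computes accuracy by hand from integer labels and per-class probabilities. It is a simplified pure-Python approximation of the idea, not the Keras implementation, and the labels and probabilities are made-up values.

```python
def sparse_categorical_accuracy(labels, probabilities):
    """Fraction of rows whose highest-probability class equals the integer label."""
    hits = 0
    for label, row in zip(labels, probabilities):
        # Pick the class with the largest probability for this example.
        predicted = max(range(len(row)), key=row.__getitem__)
        hits += int(predicted == label)
    return hits / len(labels)


# Hypothetical data: integer labels (one per example) and one probability
# per class for each example, i.e. shapes (n,) and (n, 4).
labels = [1, 0, 2]
probabilities = [
    [0.1, 0.6, 0.2, 0.1],
    [0.7, 0.1, 0.1, 0.1],
    [0.2, 0.2, 0.1, 0.5],
]

accuracy = sparse_categorical_accuracy(labels, probabilities)
print(accuracy)
```

A metric that compares predictions to labels element by element has no such argmax step, which is consistent with the shape mismatch between (None, 4) and (None, 1) reported in the error.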
Thank you very much.
Dear authors,
I used tfdf.keras.pd_dataframe_to_tf_dataset for the train and test sets respectively, after making sure that both contained all 4 classes (a single label for each data point).
I found that the labels in both sets were integer encoded ([0 1 2 3]). I defined:
It raised an error:
ValueError: Shapes (None, 4) and (None, 1) are incompatible
Then I moved to this code:
It raised an error:
ValueError: Shapes (None, 4) and (None, 1) are incompatible
Then I checked:
Output:
Please help me to fix this error. Thank you so much.