tensorflow / decision-forests

A collection of state-of-the-art algorithms for the training, serving and interpretation of Decision Forest models in Keras.
Apache License 2.0

multi-class classification with TFDF: map true labels to probabilities vector of model.predict() #117

Closed miroC911 closed 2 years ago

miroC911 commented 2 years ago

Hello, sorry for the basic question. I am wondering how to map the probability scores returned by model.predict(test) back to the original class labels (strings). I retrieve the top class index via argmax, and I want to get the label string for that index, but I am not sure how TFDF models order the labels or whether I can access that ordering in any way. I am passing a pandas DataFrame to RandomForestModel() via tfdf.keras.pd_dataframe_to_tf_dataset(df, label). Thanks, Mireille

achoum commented 2 years ago

Background

Keras assumes that classification labels are integers in the range [0, num_classes). To make your life easier, tfdf.keras.pd_dataframe_to_tf_dataset converts string labels into integers. The exact conversion is done here.

The label indices are assigned after sorting the unique label strings lexicographically: the first string in sorted order becomes 0, the second becomes 1, and so on.
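As a rough illustration of that sorting rule (a sketch, not the library's actual code), the snippet below rebuilds the vocabulary by sorting the unique label strings and uses it to decode one row of probabilities. The label values and the probability row are made up, `decode_top_class` is a hypothetical helper, and it assumes model.predict() returns one column per class, in vocabulary order.

```python
# Illustrative sketch: rebuild the string-to-index mapping by sorting the
# unique label strings, mirroring the rule described above.
raw_labels = ["setosa", "virginica", "versicolor", "setosa"]  # made-up labels

# Sorted unique strings; the position in this list is the integer label.
vocabulary = sorted(set(raw_labels))
label_to_index = {name: i for i, name in enumerate(vocabulary)}


def decode_top_class(probabilities, vocabulary):
    """Return the label string with the highest probability (hypothetical helper)."""
    top_index = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return vocabulary[top_index]


# One made-up row, shaped like a single row of model.predict() output.
example_row = [0.1, 0.7, 0.2]
predicted_label = decode_top_class(example_row, vocabulary)
```

With a NumPy array of predictions, the same lookup could be done for all rows at once by indexing the vocabulary with the result of argmax along the class axis.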

Your case

To be sure of the mapping between label strings and their integer representations, it is best to convert the labels manually before calling pd_dataframe_to_tf_dataset. This is the approach used in the beginner colab.
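A minimal sketch of such a manual conversion, using plain Python lists so it runs without any dependencies. The labels and the class order are made up, and the names `classes`, `encoded`, and `decoded` are only illustrative. The idea is to pick the class order yourself, encode the labels as integers, and keep the list around to decode predictions later.

```python
# Sketch: fix the class order yourself, encode labels as integers, and keep
# the list so that predicted indices can be mapped back to strings.
label_column = ["cat", "dog", "bird", "dog"]  # made-up string labels

# Explicit, user-chosen class order; the index in this list is the integer label.
classes = ["bird", "cat", "dog"]

# Encode: string -> integer. list.index raises ValueError for an unknown label.
encoded = [classes.index(name) for name in label_column]

# Decode: integer -> string, e.g. for an argmax index taken from predictions.
decoded = [classes[i] for i in encoded]
```

With a pandas DataFrame, the same encoding could presumably be applied with `df[label] = df[label].map(classes.index)` before calling `tfdf.keras.pd_dataframe_to_tf_dataset(df, label=label)`. Since the labels are then already integers, the class order stays under your control.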