The dataset now encodes labels only when they are strings, so input data can already contain pre-encoded labels such as integers.
You can pass your own label encoding map at init with the `label_map` parameter, for example `label_map={"Acceptable Recommendation": 0, "Bad recommendation": 1}`. Passing `label_map` is required for streaming datasets. For non-streaming datasets, if `label_map` is not passed, it is generated automatically from the unique label values in the dataset.
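The label handling described above could look roughly like the sketch below. It is an illustration only, not the code from this change: the helper names `build_label_map` and `encode_labels`, the `streaming` flag, and the sorted ordering of auto-generated ids are all assumptions.

```python
from typing import Dict, Iterable, List, Optional


def build_label_map(
    labels: Iterable[str],
    label_map: Optional[Dict[str, int]] = None,
    streaming: bool = False,
) -> Dict[str, int]:
    """Return the mapping from string labels to integer ids (hypothetical helper)."""
    if label_map is not None:
        # A user-supplied map always wins.
        return dict(label_map)
    if streaming:
        # A streaming dataset cannot be scanned up front for unique labels.
        raise ValueError("label_map is required for streaming datasets")
    # Assumption: ids are assigned in sorted order so the result is deterministic.
    return {label: idx for idx, label in enumerate(sorted(set(labels)))}


def encode_labels(labels: Iterable, label_map: Dict[str, int]) -> List:
    """Encode only string labels; pre-encoded values (e.g. integers) pass through."""
    return [label_map[l] if isinstance(l, str) else l for l in labels]


raw = ["Bad recommendation", "Acceptable Recommendation", "Bad recommendation"]
auto_map = build_label_map(raw)
encoded = encode_labels(raw, auto_map)
```

With a user-supplied map, the same helpers would be called as `build_label_map(raw, label_map={"Acceptable Recommendation": 0, "Bad recommendation": 1})`.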
For non-streaming datasets, you can now also balance the dataset so that every unique label has the same number of examples by setting `balance_label_counts=True` at init. This may benefit model training, for example when the label distribution is skewed.
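As a rough illustration of what balancing means here, the sketch below downsamples every label to the size of the rarest one. The notes above do not say whether the dataset downsamples or oversamples, so the downsampling strategy, the function name `balance_rows`, the `label_key` argument, and the `seed` argument are all assumptions.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, List


def balance_rows(rows: List[Dict], label_key: str = "label", seed: int = 0) -> List[Dict]:
    """Downsample each label group to the size of the smallest group (hypothetical helper)."""
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)
    if not by_label:
        return []
    # Assumption: the target count is the size of the rarest label.
    target = min(len(group) for group in by_label.values())
    rng = random.Random(seed)
    balanced: List[Dict] = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, target))
    # Shuffle so the output is not grouped by label.
    rng.shuffle(balanced)
    return balanced


rows = [{"text": f"a{i}", "label": 0} for i in range(5)]
rows += [{"text": f"b{i}", "label": 1} for i in range(2)]
balanced = balance_rows(rows)
counts = Counter(row["label"] for row in balanced)
```

Balancing of this kind needs the full label counts up front, which is consistent with the option being limited to non-streaming datasets.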