Closed veddandekar closed 3 years ago
The CRF model outputs are a little complex. When training is True, the CRF outputs potentials, whose shape is [batch_size, seq_len, num_tags]. But when training is False, the CRF outputs a decoded_sequence, whose shape is [batch_size, seq_len]; I one-hot encode the decoded_sequence to keep the output shape consistent. However, the ground-truth labels in the training data do not need to be one-hot encoded.
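A minimal sketch of that shape handling, using NumPy for illustration (the library itself may use tf.one_hot; the values below are made up):

```python
import numpy as np

# Hypothetical decoded output in prediction mode: [batch_size, seq_len]
decoded_sequence = np.array([[1, 0, 0, 1],
                             [0, 1, 1, 0]])
num_tags = 2

# One-hot encode so the shape matches the training-mode potentials:
# [batch_size, seq_len] -> [batch_size, seq_len, num_tags]
one_hot = np.eye(num_tags)[decoded_sequence]
print(one_hot.shape)  # (2, 4, 2)
```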
You need to apply argmax to the decoded sequence in prediction mode to recover the predicted label sequence.
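For example, a sketch with NumPy, assuming the one-hot prediction shape (batch_size, seq_len, num_tags) discussed above (the values are made up):

```python
import numpy as np

# Hypothetical one-hot prediction output: batch of 2 sequences,
# seq_len = 9, num_tags = 2
y_pred = np.zeros((2, 9, 2))
y_pred[0, :, 1] = 1.0  # first sequence predicted as all tag 1
y_pred[1, :, 0] = 1.0  # second sequence predicted as all tag 0

# Collapse the one-hot axis back to integer tag ids:
# (batch_size, seq_len, num_tags) -> (batch_size, seq_len)
predicted_tags = np.argmax(y_pred, axis=-1)
print(predicted_tags.shape)  # (2, 9)
```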
Firstly, thank you for this implementation of CRF. It has helped a lot with my project! I am a little confused about whether, during training, the model expects one-hot encoded labels, or whether it directly outputs class labels.
Specifically, each of my inputs is an array of 9 elements. The output labels I pass to the model are label encoded (either 0 or 1), one for each element of the input array.
e.g. input = [23, 43, 34, 67, 34, 76, 65, 234, 124], labels = [1, 0, 0, 0, 1, 0, 1, 0, 1]
Here, the labels are not one-hot encoded, meaning label[i] corresponds to input[i].
X shape = (x, 9), y_true shape = (x, 9)
Training completes successfully with the above. However, during prediction, my outputs appear to be one-hot encoded, i.e. y_pred shape = (x, 9, 2).
Is taking encoded labels as y_true during training but predicting one-hot encoded labels the expected behaviour or have I misunderstood something?