Under certain conditions, some CRF tag transitions can be missing after data augmentation, or can be "underrepresented".
We must ensure that all possible tag transitions are present in the augmented dataset so that inference does not systematically fail on those examples.
Example
Given a dataset with 1 intent and 3 slots: slot_1, slot_2, slot_3
Suppose only 5% of the utterances in the dataset follow the pattern bla bla [slot_1] [slot_2] bla bla, and only 5% of slot_1's entity values have length 1 (the other 95% have length 2). Then, after augmentation, the probability that a generated utterance contains the pattern B-slot-1 B-slot-2 is 0.05 × 0.05 = 0.0025, so this transition will probably be missing from the training data.
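Whether a 0.0025 per-utterance probability really means "probably missing" depends on how many utterances the augmentation generates. A minimal sketch, assuming (hypothetically) that each augmented utterance independently picks the pattern and slot_1's value length with the 5% probabilities above; the helper names are made up for illustration:

```python
def transition_probability(p_pattern: float, p_short_value: float) -> float:
    """Probability that one augmented utterance yields B-slot-1 B-slot-2.

    Assumes the pattern and slot_1's value length are sampled independently.
    """
    return p_pattern * p_short_value


def probability_missing(p_transition: float, n_utterances: int) -> float:
    """Probability that none of n independent utterances contains the transition."""
    return (1.0 - p_transition) ** n_utterances


p = transition_probability(0.05, 0.05)
for n in (50, 200, 1000):
    print(f"n={n}: P(missing)={probability_missing(p, n):.3f}")
```

With a few hundred generated utterances the expected number of B-slot-1 B-slot-2 occurrences stays below one, so the transition is absent more often than not.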
If slot_1 has the value word_1 and slot_2 has the value word_2 word_3, and the CRF sees "word_1 word_2 word_3", it will tag it as "B-slot-1 I-slot-1 B-slot-2" instead of "B-slot-1 B-slot-2 I-slot-2", because it has never seen the B-slot-1 B-slot-2 transition in the training data.
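One way to spot this before training is to count the tag transitions that actually occur in the augmented data. The sketch below is hypothetical (the helper names and the two tagged utterances are made up) and simply pairs consecutive BIO tags, using the B-slot-1 / I-slot-1 notation from the example:

```python
from collections import Counter
from typing import Iterable, List, Tuple

Transition = Tuple[str, str]


def tag_transitions(tags: List[str]) -> List[Transition]:
    """Return consecutive (previous_tag, next_tag) pairs of one tagged utterance."""
    return list(zip(tags, tags[1:]))


def count_transitions(dataset: Iterable[List[str]]) -> Counter:
    """Count every tag transition across a dataset of BIO tag sequences."""
    counts: Counter = Counter()
    for tags in dataset:
        counts.update(tag_transitions(tags))
    return counts


# Hypothetical augmented data in which slot_1 always received a length 2 value
augmented = [
    ["O", "O", "B-slot-1", "I-slot-1", "B-slot-2", "I-slot-2", "O", "O"],
    ["O", "O", "B-slot-1", "I-slot-1", "B-slot-2", "O", "O"],
]
counts = count_transitions(augmented)
print(counts[("I-slot-1", "B-slot-2")])  # 2
print(counts[("B-slot-1", "B-slot-2")])  # 0: the transition never occurs
```

A zero count for a transition that the templates can produce is exactly the situation described above.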
Now suppose that, unluckily, users pick the length 1 value of slot_1 95% of the time. Then the CRF will systematically fail in 95% × 5% = 4.75% of the cases, which is pretty high.
Potential solutions
Make sure that all possible tag transitions are present in the augmented dataset
Boost the proportion of rare tag transitions (this might hurt performance, since the CRF transition weights would be affected :s)
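The first option could be sketched as follows, under stated assumptions: templates are token lists with [slot] placeholders, entity values are space-separated strings, and all names are hypothetical rather than an existing augmentation API. In BIO tagging, a slot ends on B-X for a 1-token value and on I-X otherwise, and I-X I-X only appears with 3+ tokens, so adding one utterance per (slot, length class) on top of random sampling makes every transition the template can produce appear at least once:

```python
import random
from typing import Dict, List, Sequence, Tuple

Utterance = Tuple[List[str], List[str]]  # (tokens, BIO tags)


def slot_label(slot: str) -> str:
    # Follows the issue's notation: slot_1 -> "slot-1" in tags such as B-slot-1
    return slot.replace("_", "-")


def fill(template: Sequence[str], chosen: Dict[str, str]) -> Utterance:
    """Replace [slot] placeholders with the chosen values and emit BIO tags."""
    tokens: List[str] = []
    tags: List[str] = []
    for item in template:
        if item.startswith("[") and item.endswith("]"):
            slot = item[1:-1]
            words = chosen[slot].split()
            tokens.extend(words)
            tags.append("B-" + slot_label(slot))
            tags.extend("I-" + slot_label(slot) for _ in words[1:])
        else:
            tokens.append(item)
            tags.append("O")
    return tokens, tags


def length_class(value: str) -> int:
    # 1 token ends the slot on B-X, 2 tokens on I-X, 3+ tokens also adds I-X -> I-X
    return min(len(value.split()), 3)


def augment_with_coverage(
    template: Sequence[str],
    values: Dict[str, List[str]],
    n_random: int,
    rng: random.Random,
) -> List[Utterance]:
    """Random augmentation plus one utterance per (slot, length class) in the template."""
    slots = [item[1:-1] for item in template if item.startswith("[")]
    utterances = [
        fill(template, {s: rng.choice(values[s]) for s in slots})
        for _ in range(n_random)
    ]
    # Coverage pass: force each available length class of each slot at least once
    for slot in slots:
        by_class: Dict[int, str] = {}
        for value in values[slot]:
            by_class.setdefault(length_class(value), value)
        for value in by_class.values():
            chosen = {s: rng.choice(values[s]) for s in slots}
            chosen[slot] = value
            utterances.append(fill(template, chosen))
    return utterances


values = {"slot_1": ["word_1", "word_4 word_5"], "slot_2": ["word_2 word_3"]}
template = ["bla", "bla", "[slot_1]", "[slot_2]", "bla", "bla"]
data = augment_with_coverage(template, values, n_random=3, rng=random.Random(0))
seen = {pair for _, tags in data for pair in zip(tags, tags[1:])}
print(("B-slot-1", "B-slot-2") in seen)  # True
```

The coverage pass adds only one utterance per length class, so it guarantees presence of each transition without deliberately reweighting the data, unlike the boosting option.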