microsoft / CNTK

Microsoft Cognitive Toolkit (CNTK), an open source deep-learning toolkit
https://docs.microsoft.com/cognitive-toolkit/

Strange Edit Distance Error Value #2707

Open BenjaminTrapani opened 6 years ago

BenjaminTrapani commented 6 years ago

The following code is used to obtain the CTC loss and edit distance using the C++ API:

auto trainingOp = ForwardBackward(LabelsToGraph(labelInput), modelFn, blankTokenID, -1);
editDistance = EditDistanceError(modelFn, labelInput, 1.0f, 1.0f, 1.0f, true, {blankTokenID});

I have verified that labelInput is formatted correctly and matches the training data. It has shape [257 x 8 x 4], where 257 is the number of classes (256 plus 1 for the blank label), 8 is the sequence length, and 4 is the number of sequences per minibatch. Along the first axis exactly one value is 1 (its index is the class) and the rest are 0. blankTokenID = 256. modelFn is a linear projection of the OptimizedRNNStack output with no activation, and it has the same shape as labelInput. The loss, edit distance, and decoded values at the 1000th training minibatch are below:

sample count: 32000
aggregate loss: 177271
aggregate metric: 0 //Should be > 0 since edit distance is not zero as illustrated
Labels after reconcile: Value([257 x 8 x 4], GPU)
CPU label y dim: 4
CPU label x dim: 2056
Predicted labels: Value([257 x 8 x 4], GPU)
Actl: 67 97 114 100 97 109 111 110
Pred: 235 235 235 235 235 235 235 235 //Should be 67 97 114 100 97 109 111 110
Actl: 77 69 65 83 85 82 69 68
Pred: 235 235 235 235 235 235 235 235
Actl: 68 79 79 68 76 69 82 83
Pred: 235 235 235 235 235 235 235 235
Actl: 70 97 99 101 116 256 256 256
Pred: 235 235 235 235 235 235 235 235
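For reference, the label layout described above can be sketched as follows. This is a minimal illustration, not code from the issue: `OneHotEncode` is a hypothetical helper, and it assumes a column-major buffer (class axis fastest, then time step, then sequence) and that short sequences are padded with the blank ID, as the last Actl row suggests.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of the CNTK API): builds a dense one-hot
// buffer of shape [numClasses x seqLen x batch.size()].
// Assumed layout: column-major, so the flat index of (cls, t, b) is
//   cls + numClasses * (t + seqLen * b).
// Sequences shorter than seqLen are padded with padID (e.g. the blank token).
std::vector<float> OneHotEncode(const std::vector<std::vector<int>>& batch,
                                std::size_t numClasses,
                                std::size_t seqLen,
                                int padID)
{
    std::vector<float> buffer(numClasses * seqLen * batch.size(), 0.0f);
    for (std::size_t b = 0; b < batch.size(); ++b)
    {
        assert(batch[b].size() <= seqLen);
        for (std::size_t t = 0; t < seqLen; ++t)
        {
            const int cls = (t < batch[b].size()) ? batch[b][t] : padID;
            assert(cls >= 0 && static_cast<std::size_t>(cls) < numClasses);
            // Exactly one entry per (t, b) column is set to 1.
            buffer[static_cast<std::size_t>(cls) + numClasses * (t + seqLen * b)] = 1.0f;
        }
    }
    return buffer;
}
```

With numClasses = 257, seqLen = 8 and four sequences, the buffer holds 257 * 8 * 4 floats, and each group of 2056 consecutive values corresponds to one sequence, which matches the "CPU label x dim: 2056" line in the log.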

The decoding above is performed manually on the CPU by taking the argmax over the class axis at each time step of the output sequence. The metrics come from the ProgressWriter OnWriteTrainingSummary function, which is invoked every 1000 minibatches. Since the decoded labels do not match the expected training labels, the edit distance should be non-zero. Am I formatting the inputs to EditDistanceError incorrectly? Additionally, the model converges to yielding only the blank character, although that is likely an issue with the architecture itself and not the implementation.
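A CPU-side cross-check of the metric can be sketched as follows. All three functions are hypothetical helpers, not CNTK calls. The collapse step assumes that squashInputs = true merges consecutive repeats and that tokensToIgnore drops the blank, which is my reading of the EditDistanceError arguments and not something confirmed here. The distance is plain Levenshtein with unit penalties and no normalization.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <utility>
#include <vector>

// Argmax over the class axis for one sequence stored column-major as
// [numClasses x seqLen] (class index varies fastest).
std::vector<int> ArgmaxDecode(const std::vector<float>& scores,
                              std::size_t numClasses, std::size_t seqLen)
{
    assert(numClasses > 0 && scores.size() == numClasses * seqLen);
    std::vector<int> path(seqLen);
    for (std::size_t t = 0; t < seqLen; ++t)
    {
        auto first = scores.begin() + static_cast<std::ptrdiff_t>(t * numClasses);
        auto last = first + static_cast<std::ptrdiff_t>(numClasses);
        path[t] = static_cast<int>(std::distance(first, std::max_element(first, last)));
    }
    return path;
}

// Greedy CTC-style collapse: merge consecutive repeats, then drop blanks.
std::vector<int> CollapseCtc(const std::vector<int>& path, int blankID)
{
    std::vector<int> out;
    for (std::size_t t = 0; t < path.size(); ++t)
    {
        const bool repeat = (t > 0 && path[t] == path[t - 1]);
        if (!repeat && path[t] != blankID)
            out.push_back(path[t]);
    }
    return out;
}

// Levenshtein distance with unit substitution, deletion and insertion costs.
std::size_t EditDistance(const std::vector<int>& a, const std::vector<int>& b)
{
    std::vector<std::size_t> prev(b.size() + 1), curr(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j)
        prev[j] = j;
    for (std::size_t i = 1; i <= a.size(); ++i)
    {
        curr[0] = i;
        for (std::size_t j = 1; j <= b.size(); ++j)
        {
            const std::size_t sub = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            curr[j] = std::min({sub, prev[j] + 1, curr[j - 1] + 1});
        }
        std::swap(prev, curr);
    }
    return prev[b.size()];
}
```

Applied to the first pair in the log, the prediction collapses to the single token 235 and the label keeps all eight tokens, so the unnormalized distance is 8. A reported aggregate metric of 0 is therefore inconsistent with these decoded values under the assumptions above.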

BenjaminTrapani commented 6 years ago

The edit distance error looks correct as of commit 3dc66d0304bad2406ce9019279452f3ed77e6efd and after switching from Ubuntu to Windows 10. This was tested on a build of CNTK modified slightly from that commit to work with CUDA 9 and cuDNN 7 (just a small update to the RNN CUDA API call).

BenjaminTrapani commented 6 years ago

It looks like the non-zero entry in the one-hot-encoded text labels should be 2 rather than 1: https://github.com/Microsoft/CNTK/blob/master/Source/SequenceTrainingLib/gammacalculation.h#L333

After encoding all labels as 2 instead of 1, the loss rises to roughly 10^14 and the optimizer fails to train the model. Does encoding the text labels as phone boundaries make sense given the current CTC implementation?