Open a2d8a4v opened 3 weeks ago
@jimbozhang, perhaps you'd like to comment? If you don't object, I'm inclined to just merge this once it's marked done (i.e. the draft label is removed).
@a2d8a4v Thanks for fixing the issue. I think it is reasonable. Could you please ensure the modified recipe has been thoroughly tested, as I won't have time to do so myself?
Hi, @jimbozhang
I have already tested the updated code with two corpora: speechocean762 and L2-ARCTIC.
The speechocean762 corpus does not have the problem of consecutive duplicated phonemes. The test on this corpus checks whether the change affects the original results from here in terms of the pure phone sequences and their lengths.
The problem does occur in the L2-ARCTIC corpus, for example in the utterance with the ID 'arctic_a0086'.
Hi @a2d8a4v,
Could you do me a favor? I'd like to remove the Google Docs link from the top of egs/gop_speechocean762/README.md, but I don't want to create a separate pull request just for this. If it's not too much trouble, could you include this change in your current PR?
--- a/egs/gop_speechocean762/README.md
+++ b/egs/gop_speechocean762/README.md
@@ -1,8 +1,3 @@
-There is a copy of this document on Google Docs, which renders the equations better:
-[link](https://docs.google.com/document/d/1pie-PU6u2NZZC_FzocBGGm6mpfBJMiCft9UoG0uA1kA/edit?usp=sharing)
-
-* * *
-
# GOP on Kaldi
The Goodness of Pronunciation (GOP) is a variation of the posterior probability, for phone level pronunciation scoring.
Thanks a lot.
Hi, @jimbozhang, I've dealt with it.
Dear @jimbozhang and @csukuangfj,
Do you have any other suggestions about the code? Alternatively, do you think we should proceed by having @danpovey confirm the pull request?
[Problem Statement] In computer-assisted pronunciation training, we use time-aligned information to compute pronunciation features such as goodness of pronunciation (GOP). Each phoneme should be processed separately to obtain its own features or scores. However, the original implementation at compute-gop:L163 uses the phoneme transition (a change from one phoneme to a different one) to decide whether the next phoneme has started and its duration should be recomputed. This breaks when a word contains consecutive duplicated phonemes, for example:
SUDDENNESS S AH1 D AH0 N N AH0 S
Here the two consecutive N phonemes are merged, so the output contains only a single N.
[Solution] Use the phoneme boundary information so that consecutive duplicated phonemes are kept as separate segments.
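
To illustrate the difference, here is a minimal Python sketch. It is not the actual compute-gop code: the function names, the frame labels, and the `is_phone_start` flags are all hypothetical, and it assumes the alignment is available as one phone label per frame plus a per-frame flag marking where a phone begins.

```python
from itertools import groupby


def segment_by_label_change(frame_phones):
    """Start a new segment only when the frame label changes.

    Consecutive identical phones (e.g. "N N") collapse into one segment.
    """
    return [(phone, len(list(run))) for phone, run in groupby(frame_phones)]


def segment_by_boundaries(frame_phones, is_phone_start):
    """Start a new segment wherever the alignment marks a phone start.

    `is_phone_start` is a hypothetical per-frame flag list standing in for
    the boundary information carried by the alignment.
    """
    if len(frame_phones) != len(is_phone_start):
        raise ValueError("frame_phones and is_phone_start must have equal length")
    segments = []
    for phone, starts in zip(frame_phones, is_phone_start):
        if starts or not segments:
            segments.append([phone, 1])  # open a new (phone, duration) segment
        else:
            segments[-1][1] += 1  # extend the current segment by one frame
    return [tuple(seg) for seg in segments]


# Made-up frame-level alignment for "SUDDENNESS": S AH1 D AH0 N N AH0 S
frames = ["S", "S", "AH1", "AH1", "D", "AH0", "N", "N", "N", "N", "AH0", "S"]
starts = [True, False, True, False, True, True,
          True, False, True, False, True, True]

by_change = segment_by_label_change(frames)
by_boundary = segment_by_boundaries(frames, starts)

print([p for p, _ in by_change])    # the two N phones are merged
print([p for p, _ in by_boundary])  # the two N phones stay separate
```

With these made-up inputs, the label-change version yields seven segments with one four-frame N, while the boundary-based version yields eight segments with two two-frame N phones, matching the lexicon entry.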