kaldi-asr / kaldi

kaldi-asr/kaldi is the official location of the Kaldi project.
http://kaldi-asr.org

Same continuous phonemes are aggregated when computing gop features via compute-gop #4919

Open a2d8a4v opened 3 weeks ago

a2d8a4v commented 3 weeks ago

[Problem Statement] In computer-assisted pronunciation training, we use time-aligned information to compute pronunciation features such as goodness of pronunciation (GOP). We want each phoneme to be processed separately to obtain its features or scores. However, the original implementation at compute-gop:L163 relies on a change of phoneme label to decide whether the next phoneme has started and the phoneme duration should be recomputed. This fails when a word contains consecutive identical phonemes, for example:

SUDDENNESS S AH1 D AH0 N N AH0 S

Here the two consecutive N phonemes are merged, so the output contains a result for only a single N.

[Solution] Add the phoneme boundary information to solve such a case.
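The difference between the two segmentation strategies can be illustrated with a minimal Python sketch. This is not the compute-gop code itself: the function names, the toy frame-level alignment, and the per-frame `is_start` boundary flags are all hypothetical and are used only to show why grouping by label change merges repeated phonemes, while explicit boundary information keeps them apart.

```python
from itertools import groupby


def segments_by_label_change(frame_phones):
    """Group runs of identical labels; adjacent repeated phones get merged."""
    return [(phone, len(list(run))) for phone, run in groupby(frame_phones)]


def segments_by_boundary(frame_phones, is_start):
    """Open a new segment whenever the (hypothetical) boundary flag is set."""
    segments = []
    for phone, start in zip(frame_phones, is_start):
        if start or not segments:
            segments.append([phone, 1])
        else:
            segments[-1][1] += 1
    return [tuple(seg) for seg in segments]


# Toy alignment for the "... AH0 N N AH0 ..." part of SUDDENNESS (assumed data).
frame_phones = ["AH0", "AH0", "N", "N", "N", "N", "N", "AH0", "AH0"]
# Assumed flags: True marks the first frame of each phone instance.
is_start = [True, False, True, False, False, True, False, True, False]

merged = segments_by_label_change(frame_phones)
split = segments_by_boundary(frame_phones, is_start)

print(merged)  # the two N phones collapse into one 5-frame segment
print(split)   # the two N phones stay separate (3 frames and 2 frames)
```

With label-change grouping the two N instances become one segment, so only one score would be produced for them; with boundary flags each instance keeps its own duration.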

danpovey commented 3 weeks ago

@jimbozhang perhaps you'd like to comment? If you don't object, I'm inclined to just merge this when it's marked done (i.e. draft label removed)

jimbozhang commented 3 weeks ago

@a2d8a4v Thanks for fixing the issue. I think it is reasonable. Could you please ensure the modified recipe has been thoroughly tested, as I won't have time to do so myself?

a2d8a4v commented 3 weeks ago

Hi, @jimbozhang

I have already tested the updated code with two corpora: speechocean762 and L2-ARCTIC.

The speechocean762 corpus does not have the problem of consecutive duplicated phonemes. The test on this corpus checks whether the change affects the original results from here in terms of the pure phone sequences and their lengths.

This phenomenon does exist in the L2-ARCTIC corpus, for example in the utterance with the ID 'arctic_a0086'.

jimbozhang commented 2 weeks ago

Hi @a2d8a4v,

Could you do me a favor? I'd like to remove the Google Docs link from the top of egs/gop_speechocean762/README.md, but I don't want to create a separate pull request just for this. If it's not too much trouble, could you include this change in your current PR?

--- a/egs/gop_speechocean762/README.md
+++ b/egs/gop_speechocean762/README.md
@@ -1,8 +1,3 @@
-There is a copy of this document on Google Docs, which renders the equations better:
-[link](https://docs.google.com/document/d/1pie-PU6u2NZZC_FzocBGGm6mpfBJMiCft9UoG0uA1kA/edit?usp=sharing)
-
-* * *
-
 # GOP on Kaldi

 The Goodness of Pronunciation (GOP) is a variation of the posterior probability, for phone level pronunciation scoring.

Thanks a lot.

a2d8a4v commented 2 weeks ago

Hi, @jimbozhang, I've dealt with it.

a2d8a4v commented 4 days ago

Dear @jimbozhang and @csukuangfj,

Do you have any other suggestions about the code? Alternatively, do you think we should proceed by having @danpovey confirm the pull request?