kaldi-asr / kaldi

kaldi-asr/kaldi is the official location of the Kaldi project.
http://kaldi-asr.org

GOP and LPR score #4852

Open wangkenpu opened 1 year ago

wangkenpu commented 1 year ago

I recently found two places that differ from Hu's original GOP paper.

  1. LPR computation

    // LPR(p_j|p_i)=\log p(p_j|\mathbf o; t_s, t_e)-\log p(p_i|\mathbf o; t_s, t_e)
    for (int k = 0; k < phone_num; k++)
      phone_level_feat(1 + phone_num + k) = lpp_part(phone_id) - lpp_part(k);

    As I understand it, $p_i$ should be the canonical phoneme, so $LPR(p_j|p_i) = \log p(p_j|\mathbf o; t_s, t_e) - \log p(p_i|\mathbf o; t_s, t_e)$, and the phoneme-level feature is defined as ${[LPP(p_1),\cdots,LPP(p_M), LPR(p_1|p_i), \cdots, LPR(p_j|p_i),\cdots]}^T$. So I think the code above should be changed to:

    // LPR(p_j|p_i)=\log p(p_j|\mathbf o; t_s, t_e)-\log p(p_i|\mathbf o; t_s, t_e)
    for (int k = 0; k < phone_num; k++)
      phone_level_feat(1 + phone_num + k) = lpp_part(k) - lpp_part(phone_id);

  2. Formula for the GOP score in the documentation

    In the document,

    $$GOP(p)=\log \frac{LPP(p)}{\max_{q\in Q} LPP(q)}$$

    $$LPP(p)=\log p(p|\mathbf o; t_s,t_e)$$

    In Hu's paper

    $$GOP(p)=\log \frac{p(p|\mathbf o; t_s,t_e)}{\max_{q\in Q} p(q|\mathbf o; t_s,t_e)}$$

    Thus, since $LPP$ is already a log posterior and the log of a ratio is the difference of the logs, I think the GOP formula in the documentation should be changed to

    $$GOP(p)=LPP(p)-\max_{q\in Q} LPP(q)$$
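As a quick numerical sanity check, here is a minimal C++ sketch that computes GOP once from raw posteriors, following Hu's definition, and once from log posteriors. It relies only on the identity that the log of a ratio equals the difference of the logs, so in LPP terms Hu's definition reads $LPP(p)-\max_{q\in Q} LPP(q)$. The function names and the posterior values are hypothetical and chosen for illustration; this is not Kaldi code.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: turn posteriors p(q|o) into log posteriors LPP(q).
std::vector<double> ToLpp(const std::vector<double> &post) {
  std::vector<double> lpp(post.size());
  std::transform(post.begin(), post.end(), lpp.begin(),
                 [](double x) { return std::log(x); });
  return lpp;
}

// GOP as in Hu's definition: log of (posterior of p / max posterior).
double GopFromPosteriors(const std::vector<double> &post, std::size_t p) {
  assert(p < post.size());
  const double max_post = *std::max_element(post.begin(), post.end());
  return std::log(post[p] / max_post);
}

// The same quantity written with log posteriors: LPP(p) - max_q LPP(q).
double GopFromLpp(const std::vector<double> &lpp, std::size_t p) {
  assert(p < lpp.size());
  return lpp[p] - *std::max_element(lpp.begin(), lpp.end());
}
```

With made-up posteriors such as `{0.7, 0.2, 0.1}`, both functions should agree up to floating-point rounding for every phone, and the best-scoring phone gets a GOP of zero.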

stale[bot] commented 1 year ago

This issue has been automatically marked as stale by a bot solely because it has not had recent activity. Please add any comment (simply 'ping' is enough) to prevent the issue from being closed for 60 more days if you believe it should be kept open.

wangkenpu commented 1 year ago

ping

jimbozhang commented 12 months ago

Sorry for the delayed reply. @wangkenpu

Regarding the LPR computation, I agree with you, but I believe it should not significantly affect the results.
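One plausible reason the effect would be small is that the two orderings differ only in sign: `lpp_part(phone_id) - lpp_part(k)` is exactly the negation of `lpp_part(k) - lpp_part(phone_id)`, so a downstream model that learns weights on these features can, at least in the linear case, absorb the flip. The sketch below illustrates this with plain `std::vector` instead of Kaldi's vector type; the function names and the test values are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Ordering used in the snippet quoted in this issue:
// feature k is lpp(canonical) - lpp(k).
std::vector<double> LprAsInCode(const std::vector<double> &lpp,
                                std::size_t canonical) {
  assert(canonical < lpp.size());
  std::vector<double> lpr(lpp.size());
  for (std::size_t k = 0; k < lpp.size(); ++k)
    lpr[k] = lpp[canonical] - lpp[k];
  return lpr;
}

// Ordering proposed in this issue, matching the LPR definition:
// feature k is lpp(k) - lpp(canonical).
std::vector<double> LprAsInPaper(const std::vector<double> &lpp,
                                 std::size_t canonical) {
  assert(canonical < lpp.size());
  std::vector<double> lpr(lpp.size());
  for (std::size_t k = 0; k < lpp.size(); ++k)
    lpr[k] = lpp[k] - lpp[canonical];
  return lpr;
}
```

For any input, each element of one result is the negative of the corresponding element of the other, and the entry for the canonical phone is zero in both.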

Regarding the documentation, thank you for pointing out the redundant \log in the equation. It would be great if you could submit a pull request to fix it.