Open wangkenpu opened 1 year ago
This issue has been automatically marked as stale by a bot solely because it has not had recent activity. Please add any comment (simply 'ping' is enough) to prevent the issue from being closed for 60 more days if you believe it should be kept open.
ping
Sorry for the delayed reply, @wangkenpu.
Regarding the LPR computation, I agree with you but I believe it should not significantly impact the result.
Regarding the document, I appreciate you pointing out the redundant $\log$ in the equation. It would be nice if you could submit a pull request to address this issue.
Recently, I found two parts that differ from Hu's original GOP paper.
LPR computation
As I understand it, $p_i$ should be the canonical phoneme, $LPR(p_j|p_i) = \log p(p_j|\mathbf o; t_s, t_e) - \log p(p_i|\mathbf o; t_s, t_e)$, and the phoneme-level feature is defined as ${[LPP(p_1),\cdots,LPP(p_M), LPR(p_1|p_i), \cdots, LPR(p_j|p_i),\cdots]}^T$. So I think the above code should be changed as:
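To be clear, the snippet below is not the proposed patch to the project code. It is only a minimal NumPy sketch of the feature definition above, assuming $LPP(p)=\log p(p|\mathbf o; t_s,t_e)$ so that $LPR(p_j|p_i)=LPP(p_j)-LPP(p_i)$. The function name `phone_level_feature`, its arguments, and the choice to keep all $M$ LPR entries (including the zero at $j=i$) are hypothetical and chosen for illustration.

```python
import numpy as np

def phone_level_feature(lpp, canonical):
    """Build [LPP(p_1..p_M), LPR(p_1|p_i)..LPR(p_M|p_i)] for one segment.

    lpp       -- 1-D array of log phone posteriors LPP(p_1), ..., LPP(p_M)
    canonical -- 0-based index i of the canonical phoneme p_i (assumption)
    """
    lpp = np.asarray(lpp, dtype=float)
    # LPR(p_j | p_i) = LPP(p_j) - LPP(p_i), with p_i the canonical phoneme
    lpr = lpp - lpp[canonical]
    return np.concatenate([lpp, lpr])

# Illustrative posteriors for M = 3 phones; phone 0 is the canonical one.
feature = phone_level_feature(np.log([0.5, 0.25, 0.25]), canonical=0)
```

With this convention the LPR entry of the canonical phoneme against itself is always 0, and a competing phoneme that is more likely than the canonical one gets a positive LPR.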
Formulation to compute the GOP score in the document
In the document,
$$GOP(p)=\log \frac{LPP(p)}{\max_{q\in Q} LPP(q)}$$
$$LPP(p)=\log p(p|\mathbf o; t_s,t_e)$$
In Hu's paper,
$$GOP(p)=\log \frac{p(p|\mathbf o; t_s,t_e)}{\max_{q\in Q} p(q|\mathbf o; t_s,t_e)}$$
Thus, I think the GOP formulation in the document should be changed to
$$GOP(p)=\frac{LPP(p)}{\max_{q\in Q} LPP(q)}$$
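As a cross-check before changing the document, Hu's definition can also be expanded directly in terms of $LPP$. This is only a derivation from the two definitions quoted above (it assumes $LPP(p)=\log p(p|\mathbf o; t_s,t_e)$ and uses the fact that $\log$ is monotonic, so $\log\max = \max\log$); it says nothing about what the code currently computes:

$$
\begin{aligned}
GOP(p) &= \log \frac{p(p|\mathbf o; t_s,t_e)}{\max_{q\in Q} p(q|\mathbf o; t_s,t_e)} \\
&= \log p(p|\mathbf o; t_s,t_e) - \max_{q\in Q} \log p(q|\mathbf o; t_s,t_e) \\
&= LPP(p) - \max_{q\in Q} LPP(q)
\end{aligned}
$$

Under these assumptions the expansion gives a difference of $LPP$ terms, so it may be worth comparing the quotient form and the difference form before the pull request is finalized.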