yzyouzhang / AIR-ASVspoof

Official implementation of the SPL paper "One-class Learning Towards Synthetic Voice Spoofing Detection"
MIT License

Dear #26

Closed. JJun-Guo closed this issue 2 years ago.

JJun-Guo commented 2 years ago

I have a question. How can I differentiate between real and fake samples using the score from loss.py (OCSoftmax)? A spoof sample has a score near 1 and a bonafide sample has a score near -1; what is the threshold to distinguish them, is it 0? It seems to me that the output of a binary classification model should be a probability, such as [0.1, 0.9], where 0.1 means bonafide and 0.9 means spoof. How can I get that kind of output from your trained model?

yzyouzhang commented 2 years ago

Our output is the negative cosine similarity between the embedding and the weight vector. You are right that spoof samples have scores close to 1 and bona fide samples close to -1. In the evaluation process, we do not use a threshold because we are calculating the EER. But in practice, I think 0 is a good threshold.
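
For readers skimming the thread, here is a minimal sketch of that decision rule (the tensor names are illustrative assumptions, not the repo's API):

```python
import torch
import torch.nn.functional as F

# Illustrative inputs: `w` stands in for the learned OC-Softmax weight vector
# (the bona fide center) and `emb` for the model's embedding of one utterance.
w = torch.randn(256)
emb = torch.randn(256)

w_hat = F.normalize(w, dim=0)        # unit-norm center vector
emb_hat = F.normalize(emb, dim=0)    # unit-norm embedding
score = -torch.dot(w_hat, emb_hat)   # negative cosine similarity, in [-1, 1]

# Spoof scores land near +1 and bona fide near -1, so 0 splits the classes.
label = "spoof" if score > 0 else "bonafide"
```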

yzyouzhang commented 2 years ago

If you would like to view my output as a probability, please check out the derivation below:

$$
\begin{aligned}
L_{\textit{OCS}} &= \frac{1}{N} \sum_{i=1}^{N} \log \left(1+e^{\alpha\left(m_{y_{i}}-\hat{\boldsymbol{w}}^{T} \hat{\boldsymbol{x}}_{i}\right)(-1)^{y_{i}}}\right)\\
&= \frac{1}{N} \left(\sum_{i \in \Omega} \log \left(1+e^{\alpha\left(m_{0}-\hat{\boldsymbol{w}}^{T} \hat{\boldsymbol{x}}_{i}\right)}\right)+\sum_{i \in \overline{\Omega}} \log \left(1+e^{\alpha\left(\hat{\boldsymbol{w}}^{T} \hat{\boldsymbol{x}}_{i}-m_{1}\right)}\right)\right)\\
&= -\frac{1}{N}\left(\sum_{i \in \Omega} \log \frac{1}{1+e^{\alpha\left(m_{0}-\hat{\boldsymbol{w}}^{T} \hat{\boldsymbol{x}}_{i}\right)}}+\sum_{i \in \overline{\Omega}} \log \frac{1}{1+e^{\alpha\left(\hat{\boldsymbol{w}}^{T} \hat{\boldsymbol{x}}_{i}-m_{1}\right)}}\right)
\end{aligned}
$$
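
(Here $\Omega$ denotes the set of bona fide, i.e. target, samples and $\overline{\Omega}$ the set of spoof samples.) Each term in the last line is the log of a sigmoid, so the loss reads as a negative log-likelihood in which $\sigma\left(\alpha\left(\hat{\boldsymbol{w}}^{T}\hat{\boldsymbol{x}}_{i}-m_{0}\right)\right)$ plays the role of the bona fide probability. A minimal sketch of that conversion, assuming the hyperparameter values reported in the paper ($\alpha = 20$, $m_0 = 0.9$) and that `cos_sim` holds the raw cosine similarity (not the negated score):

```python
import torch

alpha, m0 = 20.0, 0.9         # hyperparameter values reported in the paper
cos_sim = torch.tensor(0.95)  # example cosine similarity w_hat . x_hat

# Read off from the last line of the derivation above:
p_bonafide = torch.sigmoid(alpha * (cos_sim - m0))
p_spoof = 1 - p_bonafide
```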

JJun-Guo commented 2 years ago

Thank you. So the output score does not rely on a threshold? When I feed the output of the model into nn.Softmax to predict the sample, the result is poor, e.g., (0.45, 0.55) and the like. I cannot understand why prediction with the negative cosine similarity performs perfectly while nn.Softmax performs poorly.


yzyouzhang commented 2 years ago

Could you please elaborate on how you get the result of (0.45,0.55) and how you interpret it?

JJun-Guo commented 2 years ago

```python
feats, lfcc_outputs = self.model(lfcc)
out_score = F.softmax(lfcc_outputs, dim=1)  # [:, 0]
```

A sample out_score is [0.45, 0.55]: 0.45 is the probability of belonging to bonafide and 0.55 the probability of belonging to spoof, and the probabilities of all samples are similar. Also, what is the meaning of each element in the derivation above?

yzyouzhang commented 2 years ago

If you are using OC-Softmax, the output score is the cosine similarity; you do not need to feed it into Softmax. I still do not understand the [0.45, 0.55] and why the probabilities of all samples are similar. Can you tell me the output score of the model?

yzyouzhang commented 2 years ago
[Two images: screenshots of the derivation above.]

JJun-Guo commented 2 years ago

Hello! The data fed into softmax is not the score output by OCSoftmax but lfcc_outputs, as shown below.

```python
feats, lfcc_outputs = self.model(lfcc)
out_score = F.softmax(lfcc_outputs, dim=1)
```

out_score is the probability produced by softmax, e.g., (0.45, 0.55). I checked the outputs of all samples: the first probability is always around 0.4 and the second around 0.5. Logically, for a spoof sample, the first probability should be near 0 and the second near 1. I am not sure where the problem is.


yzyouzhang commented 2 years ago

The model trained with OC-Softmax should use OC-Softmax for scoring. Please refer to our test.py for the correct way of scoring.
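
To make the contrast concrete, here is a sketch with mocked tensors (illustrative only, not test.py verbatim). A plausible reason the softmax probabilities hover near (0.45, 0.55) is that when training uses only the OC-Softmax loss on the embeddings, the auxiliary logits head receives no gradient and its output stays uninformative; this is an assumption about the training setup, not something stated in the thread.

```python
import torch
import torch.nn.functional as F

# Mocked stand-ins for one forward pass (names are assumptions):
lfcc_outputs = torch.randn(4, 2)  # auxiliary binary-classifier logits, batch of 4
feats = torch.randn(4, 256)       # embeddings from the same forward pass
w = torch.randn(256)              # learned OC-Softmax center vector

# Misleading for an OC-Softmax model: softmax over logits that the
# OC-Softmax training objective may never have touched.
probs = F.softmax(lfcc_outputs, dim=1)

# Meaningful: cosine similarity between each embedding and the center,
# negated to match the sign convention used in this thread.
scores = -F.cosine_similarity(feats, w.unsqueeze(0), dim=1)  # shape (4,), in [-1, 1]
```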


yzyouzhang commented 2 years ago

Our model trained with OC-Softmax has a weight vector to represent the center of the bonafide cluster. During inference, the embedding of the new utterance is compared with this center vector to calculate cosine similarity. So the output is in the range of [-1, 1]. If you do want a probability as output, please refer to the derivation in the picture above.