yeyupiaoling / VoiceprintRecognition-Pytorch

This project provides a variety of advanced voiceprint recognition models, such as EcapaTdnn, ResNetSE, ERes2Net, and CAM++, and more models may be supported in the future. It also supports MelSpectrogram and Spectrogram data-preprocessing methods.
Apache License 2.0

Two samples the algorithm can hardly distinguish, but humans can easily? #21

Closed dragen1860 closed 2 years ago

dragen1860 commented 2 years ago

Hello author: I purchased the 9.9 yuan model and tested the algorithm, and it works quite well. However, I ran into two female-voice samples that turn out to be very hard to tell apart:

-----------  Configuration Arguments -----------
audio_path1: ../ysvoiceprint/voiceprint/zhou-tang/zhou3min-10-16k.wav
audio_path2: ../ysvoiceprint/voiceprint/zhou-tang/tang3min-10-16k.wav
input_shape: (1, 2000, 257)
model_path: models/resnet34.pth
threshold: 0.71
------------------------------------------------
slice: 1099 len: 257 whole: (257, 2001)
slice: 633 len: 257 whole: (257, 2001)
..//voiceprint/zhou-tang/zhou3min-10-16k.wav and ..//voiceprint/zhou-tang/tang3min-10-16k.wav 
are not the same person, similarity: 0.612113

Randomly taking another 1.3 s segment for comparison:

-----------  Configuration Arguments -----------
audio_path1: ../ysvoiceprint/voiceprint/zhou-tang/zhou3min-10-16k.wav
audio_path2: ../ysvoiceprint/voiceprint/zhou-tang/tang3min-10-16k.wav
input_shape: (1, 320, 257)
model_path: models/resnet34.pth
threshold: 0.71
------------------------------------------------
slice: 381 len: 257 whole: (257, 2001)
slice: 1634 len: 257 whole: (257, 2001)
..//voiceprint/zhou-tang/zhou3min-10-16k.wav and ..//voiceprint/zhou-tang/tang3min-10-16k.wav 
are not the same person, similarity: 0.685574

No matter how many 1.3 s segments I randomly sample from these two clips, the similarity stays high (around 0.6–0.7). This is odd. Is there a good way to handle such a hard corner case? Many thanks.

Attached are the two audio clips: 2samples-hard.zip
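For context, the decision printed in the logs above ("not the same person" when the similarity falls below the configured threshold of 0.71) is typically the cosine similarity of two speaker embeddings compared against that threshold. Below is a minimal sketch of that decision rule, assuming the embeddings have already been extracted by the model; the function names are illustrative, not the project's actual API:

```python
import numpy as np

def cosine_similarity(emb1, emb2):
    """Cosine similarity between two speaker-embedding vectors."""
    emb1 = np.asarray(emb1, dtype=np.float64)
    emb2 = np.asarray(emb2, dtype=np.float64)
    return float(np.dot(emb1, emb2) / (np.linalg.norm(emb1) * np.linalg.norm(emb2)))

def is_same_speaker(emb1, emb2, threshold=0.71):
    """Declare 'same speaker' only when similarity reaches the threshold."""
    return cosine_similarity(emb1, emb2) >= threshold
```

Under this rule, the reported similarity of 0.612113 against a 0.71 threshold yields "not the same person", matching the printed output, even though the impostor score is uncomfortably close to the threshold.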

yeyupiaoling commented 2 years ago

It works with the new model:

Successfully loaded model parameters and optimizer parameters: models/ecapa_tdnn\model.pdparams
audio/tang3min-10-16k.wav and audio/zhou3min-10-16k.wav are not the same person, similarity: 0.4682544767856598
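Since the fixed 0.71 threshold sits only slightly above the impostor scores reported in this issue (0.6 to 0.7), one common remedy for hard pairs, besides switching models, is to recalibrate the threshold on a labeled set of same/different trial pairs. A minimal equal-error-rate-style sketch, assuming you have similarity scores and labels for such pairs; the function name and approach are illustrative, not the project's code:

```python
import numpy as np

def pick_threshold(scores, labels):
    """Pick the threshold that minimizes |FAR - FRR| over labeled trial pairs.

    scores: similarity score per pair; labels: 1 = same speaker, 0 = different.
    """
    scores = np.asarray(scores, dtype=np.float64)
    labels = np.asarray(labels, dtype=int)
    best_t, best_gap = 0.0, float("inf")
    for t in np.unique(scores):
        far = np.mean(scores[labels == 0] >= t)  # false-accept rate at t
        frr = np.mean(scores[labels == 1] < t)   # false-reject rate at t
        if abs(far - frr) < best_gap:
            best_t, best_gap = t, abs(far - frr)
    return best_t
```

If the genuine and impostor score distributions overlap heavily (as the 0.6–0.7 impostor scores here suggest for the older model), no threshold will separate them cleanly, which is why a stronger embedding model is the more fundamental fix.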
dragen1860 commented 2 years ago

@yeyupiaoling Great, could we connect on WeChat to discuss? ID: dragen1860