Should dither be set to 0 or 1 in compute_fbank?

wenet-e2e / wespeaker

Research and Production Oriented Speaker Verification, Recognition and Diarization Toolkit

Apache License 2.0


Closed: zuowanbushiwo closed this issue 8 months ago

zuowanbushiwo commented 9 months ago

I see that the training config sets dither: 1.0:

dataset_args:
  aug_prob: 0.6
  fbank_args:
    dither: 1.0
    frame_length: 25
    frame_shift: 10
    num_mel_bins: 80

But infer_onnx.py uses dither: 0.0.
The C++ onnxruntime runtime also sets dither: 0.0.
And here https://github.com/wenet-e2e/wespeaker/blob/a84bd7dd63758781af317f587323bb8542157d0a/runtime/server/x86_gpu/model_repo/feature_extractor/1/model.py#L69 it is set to dither: 1 again.

Does this variable have no real effect on the results?

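What dither does
With dither set to 1, a small random perturbation is added during the filterbank energy computation so that the energy can never be exactly 0. As a side effect, the output features for the same audio differ from one run to the next. If you need consistent output, set dither=0 in the config file.

Thanks!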
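The two effects described in the question (no zero-energy frames, but non-repeatable output) can be illustrated with a small toy sketch. This is not the wespeaker or torchaudio implementation. It assumes Kaldi-style dithering, that is, zero-mean Gaussian noise scaled by the dither value and added to each sample. The helper names `add_dither` and `frame_log_energy` are hypothetical and exist only for this illustration.

```python
import math
import random


def add_dither(samples, dither, rng):
    """Add zero-mean Gaussian noise, scaled by `dither`, to every sample."""
    if dither == 0.0:
        return list(samples)  # no noise: output is fully deterministic
    return [s + dither * rng.gauss(0.0, 1.0) for s in samples]


def frame_log_energy(frame):
    """Log of the summed squared samples; -inf stands in for log(0).

    Real feature extractors usually floor the energy instead of returning
    -inf; the toy version keeps the failure visible.
    """
    energy = sum(s * s for s in frame)
    return math.log(energy) if energy > 0.0 else float("-inf")


rng = random.Random()  # unseeded on purpose, so every pass draws new noise
silence = [0.0] * 400  # one 25 ms frame at 16 kHz of digital silence

# dither = 0: repeatable, but an all-zero frame has zero energy.
no_dither_a = frame_log_energy(add_dither(silence, 0.0, rng))
no_dither_b = frame_log_energy(add_dither(silence, 0.0, rng))

# dither = 1: energy is never zero, but two passes over the same audio differ.
dither_a = frame_log_energy(add_dither(silence, 1.0, rng))
dither_b = frame_log_energy(add_dither(silence, 1.0, rng))

print("dither=0:", no_dither_a, no_dither_b)
print("dither=1:", dither_a, dither_b)
```

With dither at 0 both passes give the same value, while with dither at 1 the values are finite but change on every pass, which is why inference code typically turns dithering off.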

JiJiJiang commented 9 months ago

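That is a very careful observation! Setting it to 1 during training adds some data diversity. At test time, set it to 0 so that repeated runs give identical results. That said, even with dither set to 1 the effect on the fbank features is tiny and basically does not change the final embedding quality, so there is no need to worry much about it!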
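To get a feel for why the effect is so small, here is a rough stdlib-only sketch that compares the log energy of one frame with and without dither. It rests on assumptions that are not stated in the thread: the samples are in 16-bit integer scale (as Kaldi-style fbank expects), the test signal is a 440 Hz tone at amplitude 3000, and dithering is unit-variance Gaussian noise times the dither value. It looks at a single frame energy rather than real mel filterbank outputs.

```python
import math
import random

SAMPLE_RATE = 16000
FRAME_LEN = 400  # 25 ms at 16 kHz


def log_energy(frame):
    """Log of the summed squared samples of one frame."""
    return math.log(sum(s * s for s in frame))


# A 440 Hz tone at amplitude 3000 in int16 scale (an assumed, speech-like level).
tone = [
    3000.0 * math.sin(2.0 * math.pi * 440.0 * n / SAMPLE_RATE)
    for n in range(FRAME_LEN)
]

rng = random.Random(0)  # seeded only so that this sketch is repeatable
dither = 1.0
dithered = [s + dither * rng.gauss(0.0, 1.0) for s in tone]

clean_value = log_energy(tone)
dithered_value = log_energy(dithered)

# Relative change of the log energy caused by dithering.
relative_change = abs(dithered_value - clean_value) / abs(clean_value)
print("clean:", clean_value)
print("dithered:", dithered_value)
print("relative change:", relative_change)
```

The relative change comes out far below one percent for a signal at this level. The picture would be different if the waveform were left as floats in [-1, 1], since unit-variance noise would then be as large as the signal itself, so the scale of the input matters when choosing a dither value.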

zuowanbushiwo commented 9 months ago

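Thank you for your reply.

"Setting it to 1 during training adds some data diversity. At test time, set it to 0."

"That said, even with dither set to 1 the effect on the fbank features is tiny and basically does not change the final embedding quality."

That is indeed the case; I have verified it.

Thanks!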

wsstriving commented 8 months ago

Solved