Closed yzwu2017 closed 2 years ago
I can try to submit a PR for this, if it's still open.
Also @yzwu2017, `RandGauss` here is a standard normal, I'm assuming? A link to the kaldi source file referenced here would be helpful as well.
Update: it looks like `RandGauss` here corresponds to the following, so I believe it should be a standard normal:

```cpp
inline float RandGauss(struct RandomState* state = NULL) {
  return static_cast<float>(sqrtf(-2 * Log(RandUniform(state)))
                            * cosf(2 * M_PI * RandUniform(state)));
}
```
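For reference, the Box-Muller transform above can be sketched in plain Python (a stand-in for the C++ snippet, not kaldi code); the key point is that it draws two *independent* uniforms:

```python
import math
import random

def rand_gauss(rng: random.Random) -> float:
    """One standard-normal sample via the Box-Muller transform,
    mirroring the RandGauss snippet above (a sketch, not kaldi itself)."""
    u1 = 1.0 - rng.random()  # shift to (0, 1] so log(u1) stays finite
    u2 = rng.random()        # second, independent uniform draw
    return math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)

rng = random.Random(0)
samples = [rand_gauss(rng) for _ in range(200_000)]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
```

With 200k samples, the empirical mean and variance land close to 0 and 1, consistent with a standard normal.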
Also for context, it looks like the previously referenced `Dither` code is from here:
It looks like the distribution is skewed due to `torch.max(epsilon, ...)`.
How about using `torch.randn` directly?
I see the bug now, nice catch @yzwu2017!
I would agree that using `torch.randn` would be a good solution: it should give results similar to kaldi's and is also easier to understand. I can submit a PR for this.
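A minimal sketch of what the fix could look like, using NumPy's `standard_normal` as a stand-in for `torch.randn` (the function name `apply_dither` and its signature are hypothetical, not the torchaudio API):

```python
import numpy as np

def apply_dither(strided_input: np.ndarray, dither: float,
                 rng: np.random.Generator) -> np.ndarray:
    # Instead of hand-rolling Box-Muller from a uniform tensor,
    # draw Gaussian noise directly (torch.randn in the real fix).
    rand_gauss = rng.standard_normal(strided_input.shape)
    return strided_input + dither * rand_gauss

x = np.zeros((3, 400))
out = apply_dither(x, 0.0, np.random.default_rng(0))
```

With `dither=0.0` the call is a no-op, matching the expectation that dithering can be disabled.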
As a side note: I don't think the distribution would be skewed, though. In particular, I think the `max(epsilon, ...)` was put there to avoid the possibility of generating a uniform variate that is exactly $0$, which would cause issues afterward when the logarithm is taken. Of course, this is irrelevant if we use `torch.randn` instead of the Box-Muller transform, which makes that a good solution too, I think.
🐛 Describe the bug
The function `torchaudio.compliance.kaldi.fbank` has an option `dither`. It calls the `_get_window()` function, where dither leads to adding random noise to `strided_input`. Since `x.log()` and `x` here are based on the same random number, `rand_gauss` is not Gaussian-distributed. In the kaldi source code, however, the random number is Gaussian-distributed.
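To make the bug concrete, here is a quick numerical check (a plain-Python sketch, not the torchaudio code): reusing the *same* uniform draw in both the `sqrt(-2*log(u))` and `cos(2*pi*u)` factors changes the distribution, which shows up already in the second moment.

```python
import math
import random

def second_moment(samples):
    return sum(s * s for s in samples) / len(samples)

rng = random.Random(0)
correct, buggy = [], []
for _ in range(200_000):
    u1 = 1.0 - rng.random()  # uniform on (0, 1], keeps the log finite
    u2 = rng.random()
    r = math.sqrt(-2.0 * math.log(u1))
    correct.append(r * math.cos(2.0 * math.pi * u2))  # two independent uniforms
    buggy.append(r * math.cos(2.0 * math.pi * u1))    # same uniform reused

m2_correct = second_moment(correct)
m2_buggy = second_moment(buggy)
```

The correct construction's second moment sits near 1, while the same-uniform version lands noticeably above 1 (analytically around 1.12), so its output is no longer standard normal.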
Versions
v0.12.1 (stable release)