Open nalbion opened 7 years ago
I've changed my window size to 8ms and removed the "+1" mentioned above, but now the first element that FFT returns always has a 0.0 imaginary component. As a result, `findMaxMagnitude()` finds a huge value at index 0 and picks it as the top result, so the frequency is always 0 and my VAD never detects any speech.
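To illustrate what I mean: for real-valued input, bin 0 of an FFT is the DC term (the sum of the samples), so its imaginary part is always 0 and its magnitude can dwarf everything else if the signal has an offset. Below is a minimal sketch of a peak search that skips bin 0 and only looks at the first half of the spectrum. `PeakFinder`, `peakBin` and `binToHz` are hypothetical names of mine, not anything from this library, and the sketch assumes the magnitudes have already been computed from the FFT output.

```java
// Hypothetical helper, not part of the library: finds the strongest
// non-DC bin in a magnitude spectrum and converts it to a frequency.
final class PeakFinder {

    // Returns the index of the largest magnitude in bins 1..N/2.
    // Bin 0 (DC) is skipped on purpose; bins above N/2 mirror the lower half
    // for real input. Assumes magnitudes.length >= 2.
    static int peakBin(double[] magnitudes) {
        int half = magnitudes.length / 2;
        int best = 1;
        for (int k = 2; k <= half; k++) {
            if (magnitudes[k] > magnitudes[best]) {
                best = k;
            }
        }
        return best;
    }

    // Bin k of an N-point FFT corresponds to k * sampleRate / N Hz.
    static double binToHz(int bin, int fftSize, double sampleRateHz) {
        return bin * sampleRateHz / fftSize;
    }
}
```

With an 8ms window at 16kHz (128 samples), each bin is 125 Hz wide, so bin 8 would correspond to 1000 Hz.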
Open issues here https://github.com/goxr3plus/java-google-speech-api
As per the recommendations of Moattar and Homayounpour, I'm trying to detect voice activity using a 10ms sliding window.
For 10ms of 16kHz 16-bit mono audio, `getNumBytes(.01)` returns 320. (It would be 320.5, but it is stored in an int.) Why add the .5?
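For reference, this is the arithmetic I would expect, written as a small sketch. `WindowSize.numBytes` is a hypothetical helper of mine, not the library's `getNumBytes()`, and it assumes uncompressed PCM where every frame is `bitsPerSample / 8 * channels` bytes.

```java
// Hypothetical helper, not the library's getNumBytes(): computes how many
// bytes of uncompressed PCM audio cover a given duration.
final class WindowSize {

    static int numBytes(double seconds, int sampleRateHz, int bitsPerSample, int channels) {
        // One frame holds one sample for every channel.
        int bytesPerFrame = (bitsPerSample / 8) * channels;
        // Round to a whole number of frames so the result never splits a sample.
        long frames = Math.round(seconds * sampleRateHz);
        return (int) (frames * bytesPerFrame);
    }
}
```

For 10ms of 16kHz 16-bit mono this gives 160 frames of 2 bytes, i.e. 320 bytes, with no extra .5 needed.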
Then `getFrequency()` calls `bytesToDoubleArray()`, passing the 320 bytes. Another point of confusion is the calculation of the size of `micBufferData`: with 2 `bytesPerSample`, the code has allocated space for 319 doubles, but when it's done, everything after `micBufferData[159]` is 0.0.
Back in `getFrequency()` I end up with an array of 319 `Complex` values, but again, everything after index 159 is `0.0, 0.0`.
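What I expected is an array with exactly one double per sample, i.e. `bytes.length / bytesPerSample` entries. A minimal sketch of that, assuming signed 16-bit little-endian PCM (the byte order and the scaling to [-1, 1) are my assumptions, and `PcmDecoder.toDoubles` is a hypothetical name, not the library's `bytesToDoubleArray()`):

```java
// Hypothetical helper, not the library's bytesToDoubleArray(): decodes
// signed 16-bit little-endian PCM into one double per sample.
final class PcmDecoder {

    static double[] toDoubles(byte[] pcm) {
        final int bytesPerSample = 2;
        // Exactly one output slot per sample, so no trailing zeros.
        int sampleCount = pcm.length / bytesPerSample;
        double[] samples = new double[sampleCount];
        for (int i = 0; i < sampleCount; i++) {
            int lo = pcm[2 * i] & 0xFF;   // low byte, treated as unsigned
            int hi = pcm[2 * i + 1];      // high byte, keeps the sign
            samples[i] = ((hi << 8) | lo) / 32768.0;
        }
        return samples;
    }
}
```

Passing the 320 bytes from above would give 160 doubles, all of them filled.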
In `FFT()` you check:...At first I thought "that's not checking if it is a power of 2", but since you call it recursively, this would eventually be a valid test. As it happens, the exception is thrown the first time through because I've got 160 values in an array with capacity for 319.
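For comparison, an explicit up-front length check could look like the sketch below. `FftLength` and its methods are hypothetical names of mine, not part of this library. Neither 160 nor 319 is a power of 2, so a 10ms window at 16kHz would need to be zero-padded (for example to 256) or the window changed to 8ms (128 samples).

```java
// Hypothetical helper, not part of the library: validates or rounds up
// the input length for a radix-2 FFT.
final class FftLength {

    // A positive power of two has exactly one bit set.
    static boolean isPowerOfTwo(int n) {
        return n > 0 && (n & (n - 1)) == 0;
    }

    // Smallest power of two that is >= n (for n up to 2^30).
    static int nextPowerOfTwo(int n) {
        if (n <= 1) {
            return 1;
        }
        return Integer.highestOneBit(n - 1) << 1;
    }
}
```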