I was looking at the audio processing code here and noticed that its behavior seems to deviate slightly from the Python implementation in the following ways:
Silence leading up to a recording (for both fixed-length and silence-aware recording) should be discarded before any sort of "timer" starts (either the fixed length or the wait for a particular amount of silence).
A certain amount of silence should be kept at the beginning of the recording so that the audio doesn't sound "chopped"; this actually increases the accuracy of the speech-to-text.
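To make the two points concrete, here is a minimal sketch of the behavior I'd expect. It is not the Python implementation's actual code: the function `record`, its parameters (`silence_threshold`, `preroll_frames`, `max_frames`, `trailing_silence_frames`), and the RMS-based silence check are all hypothetical, chosen only for illustration. Leading silent frames are buffered in a small ring buffer and never count toward either stopping condition; once speech is detected, the last few buffered silent frames are prepended so the start isn't clipped.

```python
import math
from collections import deque
from typing import Deque, Iterable, List, Optional, Sequence

Frame = Sequence[float]


def rms(frame: Frame) -> float:
    """Root-mean-square level of one frame (assumed silence metric)."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def record(
    frames: Iterable[Frame],
    silence_threshold: float,
    preroll_frames: int,
    max_frames: Optional[int] = None,
    trailing_silence_frames: Optional[int] = None,
) -> List[Frame]:
    """Hypothetical recorder sketch.

    - Leading silence is dropped; the "timer" only starts at the first loud frame.
    - Up to `preroll_frames` of the most recent leading silence is kept.
    - Fixed length: stop after `max_frames` frames counted from speech start.
    - Silence-aware: stop after `trailing_silence_frames` consecutive silent frames.
    """
    preroll: Deque[Frame] = deque(maxlen=preroll_frames)  # oldest silence falls off
    recorded: List[Frame] = []
    started = False
    timed_frames = 0  # frames counted by the timer (only after speech starts)
    silent_run = 0    # consecutive silent frames since speech started

    for frame in frames:
        loud = rms(frame) >= silence_threshold
        if not started:
            if not loud:
                preroll.append(frame)  # still leading silence: buffer, don't time it
                continue
            started = True
            recorded.extend(preroll)  # keep a short lead-in so audio isn't "chopped"
        recorded.append(frame)
        timed_frames += 1
        silent_run = 0 if loud else silent_run + 1
        if max_frames is not None and timed_frames >= max_frames:
            break
        if trailing_silence_frames is not None and silent_run >= trailing_silence_frames:
            break
    return recorded
```

With 10 silent frames before speech, `record(frames, 0.5, preroll_frames=2, max_frames=4)` returns 2 pre-roll frames plus 4 timed frames; the 10 leading frames never consume any of the fixed-length budget.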
Just wondering if I was mistaken, if this was an oversight, or an intentional decision.