I would like to work directly on the audio buffer for further preprocessing, but I do not understand how to get details such as the sampling rate and bit depth. Once speech has been detected, I intend to run further computations such as segmentation and neural network predictions, and I need these details for that. Also, the example uses the RAW file format. Can I still treat the buffer of audio samples as if it came from a normal .wav file, or do I need to do some kind of conversion?
Apologies for the hugely delayed response here. This library uses 16-bit, single-channel audio. The sample rate is 16000 Hz unless otherwise specified in the constructor.
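On the RAW vs. .wav part of the question: raw PCM is the same sample data a .wav file carries, just without the header that records the sample rate, bit depth, and channel count. With the values given above you can decode the buffer directly or wrap it in a WAV container. Below is a minimal Python sketch using only the standard library. It assumes the samples are signed and little-endian, which the answer above does not state, so check that against your platform. The function names (`pcm_to_samples`, `pcm_to_wav_bytes`, `duration_seconds`) are made up for illustration and are not part of the library.

```python
import array
import io
import sys
import wave

SAMPLE_RATE = 16000  # default stated above; change it if you pass another rate to the constructor
SAMPLE_WIDTH = 2     # 16-bit samples are 2 bytes each
CHANNELS = 1         # single-channel (mono)


def pcm_to_samples(raw: bytes) -> array.array:
    """Decode headerless PCM bytes into signed 16-bit integers.

    Assumption: the data is signed and little-endian.
    """
    samples = array.array("h")  # "h" = signed short (2 bytes)
    samples.frombytes(raw)      # raises ValueError on an odd byte count
    if sys.byteorder == "big":
        samples.byteswap()      # convert from the assumed little-endian layout
    return samples


def pcm_to_wav_bytes(raw: bytes, sample_rate: int = SAMPLE_RATE) -> bytes:
    """Prepend a WAV header to raw PCM; the sample data itself is unchanged."""
    out = io.BytesIO()
    with wave.open(out, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(sample_rate)
        wav.writeframes(raw)
    return out.getvalue()


def duration_seconds(raw: bytes, sample_rate: int = SAMPLE_RATE) -> float:
    """Length of the buffer in seconds, derived from the byte count."""
    return len(raw) / (SAMPLE_WIDTH * CHANNELS) / sample_rate
```

For neural network input you would typically scale the integers returned by `pcm_to_samples` to floats in the range [-1, 1) by dividing by 32768, but whether that is needed depends on what your model expects.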