The Whisper models require a 16 kHz sample rate, but many audio devices do not support that rate. Mine, for example, only supports rates from 44100 to 192000 Hz (see the arecord output below). Leaving the sample rate at 16000 in src/config.json results in an error:
Expression 'paInvalidSampleRate' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2050
Expression 'PaAlsaStreamComponent_InitialConfigure( &self->capture, inParams, self->primeBuffers, hwParamsCapture, &realSr )' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2721
Expression 'PaAlsaStream_Configure( stream, inputParameters, outputParameters, sampleRate, framesPerBuffer, &inputLatency, &outputLatency, &hostBufferSizeMode )' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2845
Traceback (most recent call last):
File "/home/mark/compile/whisper/whisper-writer/src/transcription.py", line 52, in record_and_transcribe
with sd.InputStream(samplerate=sample_rate, channels=1, dtype='int16', blocksize=sample_rate * frame_duration // 1000,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/mark/compile/whisper/whisper_dictation/whispervenv/lib/python3.11/site-packages/sounddevice.py", line 1421, in __init__
_StreamBase.__init__(self, kind='input', wrap_callback='array',
File "/home/mark/compile/whisper/whisper_dictation/whispervenv/lib/python3.11/site-packages/sounddevice.py", line 898, in __init__
_check(_lib.Pa_OpenStream(self._ptr, iparameters, oparameters,
File "/home/mark/compile/whisper/whisper_dictation/whispervenv/lib/python3.11/site-packages/sounddevice.py", line 2747, in _check
raise PortAudioError(errormsg, err)
sounddevice.PortAudioError: Error opening InputStream: Invalid sample rate [PaErrorCode -9997]
Changing it to 44100, on the other hand, results in:
Expression 'paInvalidSampleRate' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2050
Expression 'PaAlsaStreamComponent_InitialConfigure( &self->capture, inParams, self->primeBuffers, hwParamsCapture, &realSr )' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2721
Expression 'PaAlsaStream_Configure( stream, inputParameters, outputParameters, sampleRate, framesPerBuffer, &inputLatency, &outputLatency, &hostBufferSizeMode )' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2845
Traceback (most recent call last):
File "/home/mark/compile/whisper/whisper-writer/src/transcription.py", line 52, in record_and_transcribe
with sd.InputStream(samplerate=sample_rate, channels=1, dtype='int16', blocksize=sample_rate * frame_duration // 1000,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/mark/compile/whisper/whisper_dictation/whispervenv/lib/python3.11/site-packages/sounddevice.py", line 1421, in __init__
_StreamBase.__init__(self, kind='input', wrap_callback='array',
File "/home/mark/compile/whisper/whisper_dictation/whispervenv/lib/python3.11/site-packages/sounddevice.py", line 898, in __init__
_check(_lib.Pa_OpenStream(self._ptr, iparameters, oparameters,
File "/home/mark/compile/whisper/whisper_dictation/whispervenv/lib/python3.11/site-packages/sounddevice.py", line 2747, in _check
raise PortAudioError(errormsg, err)
sounddevice.PortAudioError: Error opening InputStream: Invalid sample rate [PaErrorCode -9997]
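Since both settings fail the same way, it may help to ask PortAudio which rates it will accept before opening the stream. The following is a minimal sketch, not code from this project: `sounddevice_accepts` and `first_supported_rate` are hypothetical helper names, and the sketch assumes the default input device is the one being used. It relies on `sd.check_input_settings`, which raises `PortAudioError` when the settings are rejected.

```python
from typing import Callable, Iterable, Optional


def sounddevice_accepts(rate: int, channels: int = 1, dtype: str = "int16") -> bool:
    """Return True if PortAudio accepts these input settings on the default device."""
    import sounddevice as sd  # imported lazily so the helper below works without it

    try:
        sd.check_input_settings(samplerate=rate, channels=channels, dtype=dtype)
        return True
    except sd.PortAudioError:
        return False


def first_supported_rate(
    candidates: Iterable[int],
    is_supported: Callable[[int], bool] = sounddevice_accepts,
) -> Optional[int]:
    """Return the first candidate rate the checker accepts, or None if all fail."""
    for rate in candidates:
        if is_supported(rate):
            return rate
    return None
```

Calling `first_supported_rate([16000, 44100, 48000, 96000, 192000])` would try 16000 first and fall back to higher rates. Passing `channels=2` to the checker may also be worth trying, given what the device reports.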
Here is the output of arecord -Dhw:0 --dump-hw-params:
Warning: Some sources (like microphones) may produce inaudible results
with 8-bit sampling. Use '-f' argument to increase resolution
e.g. '-f S16_LE'.
HW Params of device "hw:0":
--------------------
ACCESS: MMAP_INTERLEAVED RW_INTERLEAVED
FORMAT: S16_LE S32_LE
SUBFORMAT: STD
SAMPLE_BITS: [16 32]
FRAME_BITS: [32 64]
CHANNELS: 2
RATE: [44100 192000]
PERIOD_TIME: (83 11888617)
PERIOD_SIZE: [16 524288]
PERIOD_BYTES: [128 2097152]
PERIODS: [2 32]
BUFFER_TIME: (166 23777234)
BUFFER_SIZE: [32 1048576]
BUFFER_BYTES: [128 4194304]
TICK_TIME: ALL
--------------------
arecord: set_params:1371: Sample format non available
Available formats:
- S16_LE
- S32_LE
Does this need some abstraction like sox?
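As an alternative to shelling out to sox, the conversion could be done in Python after capturing at a rate the device accepts. This is only a sketch under assumptions: `to_whisper_input` is a hypothetical name, SciPy is not known to be a dependency of this project, and the stereo downmix is included because the dump above reports CHANNELS: 2. It uses polyphase resampling (`scipy.signal.resample_poly`), which applies a low-pass filter before reducing the rate.

```python
from math import gcd

import numpy as np
from scipy.signal import resample_poly

WHISPER_RATE = 16000  # rate the models expect, per the description above


def to_whisper_input(frames: np.ndarray, capture_rate: int) -> np.ndarray:
    """Convert int16 frames, shaped (n,) or (n, channels), to mono int16 at 16 kHz."""
    audio = frames.astype(np.float32)
    if audio.ndim == 2:
        audio = audio.mean(axis=1)  # average the channels down to mono
    if capture_rate != WHISPER_RATE:
        # Reduce the ratio to lowest terms, e.g. 44100 -> 16000 becomes 160/441.
        g = gcd(WHISPER_RATE, capture_rate)
        audio = resample_poly(audio, WHISPER_RATE // g, capture_rate // g)
    # Round and clip so filter overshoot cannot wrap around in int16.
    return np.clip(np.round(audio), -32768, 32767).astype(np.int16)
```

With this approach the stream would be opened at 44100 Hz with 2 channels, and each recorded buffer passed through `to_whisper_input(buffer, 44100)` before transcription.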