Hello Whisper S2T team!

In our project we need to work with pre-loaded audio chunks, so I made a small PR that adds a `file_io` flag to the WhisperS2T model. This mode allows calling `transcribe()` with NumPy arrays directly (without going through file I/O). Usage example:
```python
import numpy as np
import whisper_s2t

model = whisper_s2t.load_model(
    model_identifier="./models/faster-whisper-large-v3",
    backend='CTranslate2',
    n_mels=128,
    file_io=False,
)

# Some pre-loaded audio chunks: raw int16 PCM bytes converted
# to float32 in [-1.0, 1.0], 16 kHz mono.
audio_chunks = [np.frombuffer(my_data, np.int16).flatten().astype(np.float32) / 32768.0]

# Decoding options (placeholder values).
lang_codes = ['en']
tasks = ['transcribe']
initial_prompts = [None]

result = model.transcribe(
    audio_chunks,
    lang_codes=lang_codes,
    tasks=tasks,
    initial_prompts=initial_prompts,
    batch_size=32,
)
```
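For completeness, here is one way the raw bytes in `my_data` above could be produced. This is just a sketch assuming a hypothetical 16 kHz mono 16-bit WAV file named `chunk.wav`; in our project the bytes actually arrive pre-loaded, and any source that yields int16 PCM bytes works the same way:

```python
import wave

import numpy as np

# Read raw int16 PCM bytes from a (hypothetical) 16 kHz mono 16-bit WAV file.
with wave.open("chunk.wav", "rb") as wf:
    assert wf.getnchannels() == 1 and wf.getsampwidth() == 2  # mono, 16-bit
    my_data = wf.readframes(wf.getnframes())  # raw PCM bytes, as used above

# Same conversion as in the usage example: int16 -> float32 in [-1.0, 1.0].
chunk = np.frombuffer(my_data, np.int16).flatten().astype(np.float32) / 32768.0
```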
Please let me know if I need to change tests or benchmarks as well in order to merge the PR.
P.S. There is a ticket, https://github.com/shashikg/WhisperS2T/issues/25, and this PR could be a first step toward it (if we keep the external VAD and the hypothesis buffer outside of WhisperS2T).
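To sketch what "external VAD outside of WhisperS2T" could look like on top of this PR, here is a toy energy-based segmenter that turns one long waveform into speech chunks ready for `transcribe()`. The frame length and threshold are assumed values, and a real pipeline would use a proper VAD instead:

```python
import numpy as np

def naive_vad_chunks(audio, frame_len=1600, threshold=0.01):
    """Toy energy-based segmentation: split a float32 waveform into
    contiguous runs of 'voiced' 100 ms frames (assuming 16 kHz audio).
    Not a real VAD, just an illustration of external chunking."""
    n_frames = len(audio) // frame_len
    frames = audio[: n_frames * frame_len].reshape(n_frames, frame_len)
    voiced = np.sqrt((frames ** 2).mean(axis=1)) > threshold  # per-frame RMS gate

    chunks, start = [], None
    for i, v in enumerate(voiced):
        if v and start is None:
            start = i
        elif not v and start is not None:
            chunks.append(audio[start * frame_len : i * frame_len])
            start = None
    if start is not None:
        chunks.append(audio[start * frame_len :])
    return chunks

# With file_io=False, the resulting chunks go straight into transcribe():
# audio_chunks = naive_vad_chunks(long_audio)
# result = model.transcribe(audio_chunks, lang_codes=lang_codes, tasks=tasks,
#                           initial_prompts=initial_prompts, batch_size=32)
```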
Best regards, Andrei