huggingface / speechbox

Punctuation restoration from transcript and wav file #30

Open mirix opened 11 months ago

mirix commented 11 months ago

Hello,

Is it possible to use the punctuation restoration function on a pre-existing transcript and a wav audio file?

If so, how?

Best,

Ed

mirix commented 11 months ago

The script below raises an assertion error while decoding the transcript. Traceback (truncated):

  File "/home/emoman/Downloads/nemo/lib/python3.8/site-packages/speechbox/restore.py", line 79, in __call__
    assert (
AssertionError: Decoding of 

Script:

from speechbox import PunctuationRestorer
import librosa
import whisper

device = 'cuda'
model_size = 'large-v2'
file_path = 'wav_file.wav'

# load_model already places the model on `device`, so no extra .to(device) is needed
modelw = whisper.load_model(model_size, device=device)

### Transcription ###

result = modelw.transcribe(file_path, beam_size=5, word_timestamps=True)

### Sentence splitting ### 

word_list = []
for segment in result['segments']:
    for word in segment['words']:
        word_list.append(word['word'])

# Whisper word tokens are already strings and carry their own leading space
full_text = ''.join(word_list)

### Punctuation ###

# librosa resamples to 22,050 Hz by default; Whisper's feature extractor expects 16 kHz
audio_data, sample_rate = librosa.load(file_path, sr=16000)

restorer = PunctuationRestorer.from_pretrained('openai/whisper-large-v2')
restorer.to(device)

restored_text, log_probs = restorer(audio_data, full_text, sampling_rate=sample_rate, num_beams=5)

print('Restored text:\n', restored_text)
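
One thing that may be worth ruling out: the transcript built above comes straight from Whisper, so it already contains punctuation and a leading space. As far as I understand, the example in the speechbox README passes a normalized, unpunctuated transcript, so the restorer may expect that kind of input. Whether this is what triggers the assertion is not confirmed. The sketch below shows one way to normalize the text before calling the restorer. `normalize_transcript` is a hypothetical helper, not part of speechbox, and it only handles ASCII punctuation.

```python
import re
import string


def normalize_transcript(text: str) -> str:
    """Strip ASCII punctuation and collapse whitespace (hypothetical helper)."""
    # Keep apostrophes so contractions such as "I'm" survive.
    drop = string.punctuation.replace("'", "")
    text = text.translate(str.maketrans("", "", drop))
    # Collapse whitespace runs and trim the leading space left by joined word tokens.
    return re.sub(r"\s+", " ", text).strip()


# Illustrative word tokens in the shape Whisper returns them (leading space included).
words = [" Hello,", " how", " are", " you?", " I'm", " fine."]
full_text = normalize_transcript("".join(words))
print(full_text)
```

The normalized `full_text` would then be passed to `restorer(...)` in place of the raw joined string. Letter case is left untouched here; lowercasing or uppercasing is a separate choice.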