shashikg / WhisperS2T

An Optimized Speech-to-Text Pipeline for the Whisper Model Supporting Multiple Inference Engines
MIT License

Handle batch processing when a few files fail in the whole batch #50

Open BBC-Esq opened 8 months ago

BBC-Esq commented 8 months ago

When my script batch-processes a bunch of audio files using the approach you gave me (passing a list of files and their settings), a single file failing for any reason prevents the transcriptions of all the other files from being written. I created a workaround that passes each file to the transcribe_with_vad method individually (each with its own tqdm) and added error handling, which works. I was wondering if there's a way to keep your most efficient batched approach and still have error handling for a specific audio file. Here is the original script and a comparison with the single-file processing with error handling:

import os
from PySide6.QtCore import QThread, Signal
from pathlib import Path
import whisper_s2t
import time

class Worker(QThread):
    finished = Signal(str)
    progress = Signal(str)

    def __init__(self, directory, recursive, output_format, device, size, quantization, beam_size, batch_size, task):
        super().__init__()
        self.directory = directory
        self.recursive = recursive
        self.output_format = output_format
        self.device = device
        self.size = size
        self.quantization = quantization
        self.beam_size = beam_size
        self.batch_size = batch_size
        self.task = task.lower()

    def run(self):
        directory_path = Path(self.directory)
        patterns = ['*.mp3', '*.wav', '*.flac', '*.wma']
        audio_files = []

        if self.recursive:
            for pattern in patterns:
                audio_files.extend(directory_path.rglob(pattern))
        else:
            for pattern in patterns:
                audio_files.extend(directory_path.glob(pattern))

        max_threads = os.cpu_count()
        cpu_threads = max((2 * max_threads) // 3, 4) if max_threads is not None else 4

        model_identifier = f"ctranslate2-4you/whisper-{self.size}-ct2-{self.quantization}"
        model = whisper_s2t.load_model(model_identifier=model_identifier, backend='CTranslate2', device=self.device, compute_type=self.quantization, asr_options={'beam_size': self.beam_size}, cpu_threads=cpu_threads)

        audio_files_str = [str(file) for file in audio_files]
        output_file_paths = [str(file.with_suffix(f'.{self.output_format}')) for file in audio_files]

        lang_codes = 'en'
        tasks = self.task
        initial_prompts = None

        start_time = time.time()

        if audio_files_str:
            self.progress.emit(f"Processing {len(audio_files_str)} files...")
            out = model.transcribe_with_vad(audio_files_str, lang_codes=lang_codes, tasks=tasks, initial_prompts=initial_prompts, batch_size=self.batch_size)
            whisper_s2t.write_outputs(out, format=self.output_format, op_files=output_file_paths)

            for original_audio_file, output_file_path in zip(audio_files, output_file_paths):
                self.progress.emit(f"{tasks.capitalize()} {original_audio_file} to {output_file_path}")

        processing_time = time.time() - start_time
        self.finished.emit(f"Total processing time: {processing_time:.2f} seconds")

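For reference, the per-file workaround described above can be sketched as a small helper that isolates failures so one bad file cannot abort the whole run. This is illustrative only: `transcribe_each` and `transcribe_one` are hypothetical names, not part of the WhisperS2T API; in the script above, `transcribe_one` would be a small wrapper calling `model.transcribe_with_vad([f], ...)` for a single file.

```python
from typing import Any, Callable, Dict, List, Tuple


def transcribe_each(
    files: List[str],
    transcribe_one: Callable[[str], Any],
) -> Tuple[Dict[str, Any], Dict[str, str]]:
    """Run transcribe_one on each file, collecting successes and
    failures separately so one bad file cannot abort the whole run.

    transcribe_one is a hypothetical callable; wire it to whatever
    single-file transcription call your script uses.
    """
    results: Dict[str, Any] = {}
    errors: Dict[str, str] = {}
    for f in files:
        try:
            results[f] = transcribe_one(f)
        except Exception as exc:  # record and continue with the rest
            errors[f] = f"{type(exc).__name__}: {exc}"
    return results, errors
```

The trade-off is that calling the model once per file gives up the batching efficiency the original script relies on, which is exactly the tension this issue is about.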

BBC-Esq commented 8 months ago

Here's the final version that I ended up incorporating into my latest release to avoid the issue, but I would still be very interested in knowing if there's a way to prevent a single file from causing the entire batch of multiple files to fail...
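Until a fix lands upstream, one way to keep the fast batched path and still survive a bad file is to try the full batch first and, on failure, bisect it to isolate the offending files. A minimal sketch, assuming `transcribe_batch` stands in for a call like `model.transcribe_with_vad(files, ...)` that returns one output per input file (the function name and signature here are illustrative, not WhisperS2T's API):

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple


def transcribe_batch_with_fallback(
    files: Sequence[str],
    transcribe_batch: Callable[[Sequence[str]], List[Any]],
) -> Tuple[Dict[str, Any], List[str]]:
    """Try the whole batch first (fast path). If it raises, split the
    batch in half and recurse, so only the genuinely failing files are
    dropped instead of the entire batch."""
    if not files:
        return {}, []
    try:
        outputs = transcribe_batch(files)
        return dict(zip(files, outputs)), []
    except Exception:
        if len(files) == 1:
            # A single file that still fails is reported as failed.
            return {}, list(files)
        mid = len(files) // 2
        ok_left, bad_left = transcribe_batch_with_fallback(files[:mid], transcribe_batch)
        ok_right, bad_right = transcribe_batch_with_fallback(files[mid:], transcribe_batch)
        ok_left.update(ok_right)
        return ok_left, bad_left + bad_right
```

The healthy-path cost is one batched call; only when something fails do the smaller retries kick in, so most runs keep full batching efficiency.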

https://github.com/BBC-Esq/WhisperS2T-transcriber/releases/tag/v1.1.0

shashikg commented 8 months ago

Hey @BBC-Esq ! I think there can be a simple fix for this. I will add the fix in next release.

PS: I'm slightly stuffed with my office work. Expect some delay in the next release (end of march probably).

PPS: Next release will also include end-to-end deployment ready server for WhisperS2T !!

BBC-Esq commented 5 months ago

> Hey @BBC-Esq ! I think there can be a simple fix for this. I will add the fix in next release.
>
> PS: I'm slightly stuffed with my office work. Expect some delay in the next release (end of march probably).
>
> PPS: Next release will also include end-to-end deployment ready server for WhisperS2T !!

Do you have time to continue working on this repository? CTranslate2 just implemented flash attention, BTW.