m-bain / whisperX

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
BSD 2-Clause "Simplified" License

load_model function #462

Open CheshireCC opened 1 year ago

CheshireCC commented 1 year ago

I don't think it's a good idea to take the asr_options parameter in the load_model step. It should be passed to the model in the transcribe step instead, so that a user (for Python usage) can load the model without providing asr_options and then change the default parameters before running transcribe. Wouldn't that be better?
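To make the proposal concrete, here is a minimal sketch of the pattern being asked for: options given at load time act as defaults, and each transcribe call may override them. `SketchModel` and `load_sketch_model` are hypothetical stand-ins invented for illustration. They are not whisperX classes or functions, and no audio is decoded.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class SketchModel:
    """Hypothetical stand-in for a loaded ASR model (not a whisperX class)."""

    # Options fixed at load time; they act as defaults for every call.
    default_options: Dict[str, Any] = field(default_factory=dict)

    def transcribe(self, audio: str, **overrides: Any) -> Dict[str, Any]:
        # Layer per-call overrides on top of the load-time defaults.
        # A new dict is built, so the stored defaults are never mutated.
        options = {**self.default_options, **overrides}
        # A real model would decode the audio here. The sketch only returns
        # the effective options so the merge is visible.
        return {"audio": audio, "options": options}


def load_sketch_model(asr_options: Optional[Dict[str, Any]] = None) -> SketchModel:
    """Hypothetical loader: asr_options is optional at load time."""
    return SketchModel(default_options=dict(asr_options or {}))


model = load_sketch_model({"beam_size": 5, "initial_prompt": None})
first = model.transcribe("a.wav")                # uses the load-time defaults
second = model.transcribe("b.wav", beam_size=1)  # overrides one option for this call
```

With this shape the model is loaded once, and a caller can vary options between calls without reloading.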

PiusLucky commented 8 months ago

Well, whisperX supplies default asr_options. Keeping the parameter at the top level is not a bad idea, since it also contains other core options not directly related to transcription. Here are the full default options as of 02/2024:


    default_asr_options = {
        "beam_size": 5,
        "best_of": 5,
        "patience": 1,
        "length_penalty": 1,
        "repetition_penalty": 1,
        "no_repeat_ngram_size": 0,
        "temperatures": [0.0, 0.2, 0.4, 0.6, 0.8, 1.0],
        "compression_ratio_threshold": 2.4,
        "log_prob_threshold": -1.0,
        "no_speech_threshold": 0.6,
        "condition_on_previous_text": False,
        "prompt_reset_on_temperature": 0.5,
        "initial_prompt": None,
        "prefix": None,
        "suppress_blank": True,
        "suppress_tokens": [-1],
        "without_timestamps": True,
        "max_initial_timestamp": 0.0,
        "word_timestamps": False,
        "prepend_punctuations": "\"'“¿([{-",
        "append_punctuations": "\"'.。,,!!??::”)]}、",
        "suppress_numerals": False,
        "max_new_tokens": None,
        "clip_timestamps": None,
        "hallucination_silence_threshold": None,
    }
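
For Python usage, you normally only need to change a few of these. The sketch below assumes that load_model merges a user-supplied asr_options dict onto the defaults listed above. `merge_asr_options` is a hypothetical helper, not part of whisperX, and `DEFAULTS` copies only a subset of the keys to keep the example short. The helper rejects unknown keys so that a typo does not go unnoticed.

```python
from typing import Any, Dict

# Subset of the defaults quoted above, copied here for illustration only.
DEFAULTS: Dict[str, Any] = {
    "beam_size": 5,
    "best_of": 5,
    "temperatures": [0.0, 0.2, 0.4, 0.6, 0.8, 1.0],
    "condition_on_previous_text": False,
    "initial_prompt": None,
    "suppress_numerals": False,
}


def merge_asr_options(overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: return DEFAULTS with overrides applied.

    Raises KeyError for any key that is not a known option name.
    """
    unknown = sorted(set(overrides) - set(DEFAULTS))
    if unknown:
        raise KeyError(f"unknown asr option(s): {', '.join(unknown)}")
    # Build a new dict so the shared DEFAULTS stay untouched.
    return {**DEFAULTS, **overrides}


options = merge_asr_options({"beam_size": 1, "initial_prompt": "Meeting notes."})

# Assumed usage, not executed here (model name and device are placeholders):
# model = whisperx.load_model("large-v2", "cuda", asr_options=options)
```

Validating the keys up front is a design choice of this sketch, not something the issue says whisperX does.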