Open CheshireCC opened 1 year ago
Well, whisperX does supply default asr_options. Having it at the top level is not a bad idea, as this contains other core options not directly related to transcription. Here are the full default options as of now (02/2024):
`{
"beam_size": 5,
"best_of": 5,
"patience": 1,
"length_penalty": 1,
"repetition_penalty": 1,
"no_repeat_ngram_size": 0,
"temperatures": [0.0, 0.2, 0.4, 0.6, 0.8, 1.0],
"compression_ratio_threshold": 2.4,
"log_prob_threshold": -1.0,
"no_speech_threshold": 0.6,
"condition_on_previous_text": False,
"prompt_reset_on_temperature": 0.5,
"initial_prompt": None,
"prefix": None,
"suppress_blank": True,
"suppress_tokens": [-1],
"without_timestamps": True,
"max_initial_timestamp": 0.0,
"word_timestamps": False,
"prepend_punctuations": "\"'“¿([{-",
"append_punctuations": "\"'.。,,!!??::”)]}、",
"suppress_numerals": False,
"max_new_tokens": None,
"clip_timestamps": None,
"hallucination_silence_threshold": None,
}`
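
For anyone following along, here is a rough sketch of how user overrides could be layered on top of defaults like these. It is plain Python and not whisperX's actual code: `DEFAULT_ASR_OPTIONS` (only a subset of the list above) and `merge_asr_options` are hypothetical names made up for illustration.

```python
from typing import Any, Dict, Optional

# Subset of the defaults listed above; the constant name is illustrative only.
DEFAULT_ASR_OPTIONS: Dict[str, Any] = {
    "beam_size": 5,
    "best_of": 5,
    "patience": 1,
    "condition_on_previous_text": False,
    "initial_prompt": None,
    "without_timestamps": True,
}


def merge_asr_options(overrides: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Return a new dict: the defaults with user overrides applied on top."""
    overrides = overrides or {}
    unknown = set(overrides) - set(DEFAULT_ASR_OPTIONS)
    if unknown:
        # Fail early on typos such as "beam_sizes".
        raise KeyError(f"unknown asr option(s): {sorted(unknown)}")
    # Build a fresh dict so the shared defaults are never mutated.
    return {**DEFAULT_ASR_OPTIONS, **overrides}


options = merge_asr_options({"beam_size": 1, "initial_prompt": "Glossary: whisperX"})
print(options["beam_size"], options["best_of"])
```

Only the keys passed in are changed; everything else keeps its default value, and the defaults themselves stay untouched for the next call.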
I don't think it's a good idea to pass the `asr_options` parameter in the `load_model` step. It should be provided to the model in the `transcribe` step, so that the user (for Python usage) can load the model without providing `asr_options` and then run `transcribe` after changing the default parameters, shouldn't it?
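
To make the suggestion concrete, here is a minimal sketch of what I mean. It is not whisperX's current API: `SketchPipeline` and `load_model_sketch` are hypothetical stand-ins, and the `asr_options` argument on `transcribe` is the proposed behavior, not something that is assumed to exist today.

```python
from typing import Any, Dict, Optional


class SketchPipeline:
    """Hypothetical stand-in for a loaded model; not whisperX's real class."""

    def __init__(self, default_options: Dict[str, Any]) -> None:
        # Copy so later changes by the caller do not leak into the pipeline.
        self.default_options = dict(default_options)

    def transcribe(
        self, audio: str, asr_options: Optional[Dict[str, Any]] = None
    ) -> Dict[str, Any]:
        # Per-call overrides are merged over the load-time defaults
        # without mutating them, so each call starts from a clean state.
        effective = {**self.default_options, **(asr_options or {})}
        # A real implementation would decode the audio here; this sketch
        # only reports which options would have been used.
        return {"audio": audio, "options": effective}


def load_model_sketch() -> SketchPipeline:
    # Load once with defaults only; no asr_options needed at this step.
    return SketchPipeline({"beam_size": 5, "initial_prompt": None})


model = load_model_sketch()
first = model.transcribe("a.wav")
second = model.transcribe("b.wav", asr_options={"initial_prompt": "Names: CheshireCC"})
print(first["options"]["initial_prompt"], second["options"]["initial_prompt"])
```

The point is that the model is loaded once and options such as `initial_prompt` can differ per file, without reloading the model or touching its defaults.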