m-bain / whisperX

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
BSD 2-Clause "Simplified" License
12.61k stars, 1.33k forks

Model is Downloaded but not loaded jonatasgrosman--wav2vec2-large-xlsr-53-japanese #897

Open andriken opened 1 month ago

andriken commented 1 month ago

See my code below:

```python
import whisperx

device = "cuda"
compute_type = "float16"
model_dir = "D:/cu/ai-video-dubber/whisper_models/"
model = whisperx.load_model("large-v2", device, compute_type=compute_type, download_root=model_dir)
batch_size = 8
print("Model loaded...")


def transcribe(vocals_file_path):
    audio = whisperx.load_audio(vocals_file_path)
    result = model.transcribe(audio, batch_size=batch_size, task="translate", language="ja")
    model_a, metadata = whisperx.load_align_model(language_code=result["language"], device=device)
    result = whisperx.align(result["segments"], model_a, metadata, audio, device, return_char_alignments=False)
    print(result["segments"])
    return result


transcribe("studentvideo.mp4")
```

It downloaded the model to `C:\Users\Andriken\.cache\huggingface\hub\models--jonatasgrosman--wav2vec2-large-xlsr-53-japanese`, but the segment results I get look like this:

"{'segments': [{'start': 1.548, 'end': 1.748, 'text': ' Did you clean it?Yes.', 'words': [{'word': 'D'}, {'word': 'i'}, {'word': 'd'}, {'word': 'y'}, {'word': 'o'}, {'word': 'u'}, {'word': 'c', 'start': 1.548, 'end': 1.668, 'score': 0.833}, {'word': 'l'}, {'word': 'e'}, {'word': 'a', 'start': 1.668, 'end': 1.748, 'score': 0.75}, {'word': 'n'}, {'word': 'i'}, {'word': 't'}, {'word': '?'}, {'word': 'Y'}, {'word': 'e'}, {'word': 's'}, {'word': '.'}]},"

When I specify the language as "en", I get proper words in the word timestamps instead of letters. The letters issue occurs only when I specify the language as "ja".
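To make the symptom easier to compare between the "ja" and "en" runs, a small check like the one below could count how many entries in a segment's `words` list are single characters and how many carry timestamps. This is only a sketch: `summarize_words` is a hypothetical helper, not part of whisperX, and the segment literal is copied from the output above.

```python
def summarize_words(segment):
    """Count single-character and timestamped entries in a segment's 'words' list."""
    words = segment.get("words", [])
    # Entries whose 'word' field is exactly one character long
    single_char = sum(1 for w in words if len(w.get("word", "")) == 1)
    # Entries that carry both a start and an end time
    timed = sum(1 for w in words if "start" in w and "end" in w)
    return {"total": len(words), "single_char": single_char, "timed": timed}


# Segment copied from the output posted above (language="ja", task="translate")
segment = {
    "start": 1.548,
    "end": 1.748,
    "text": " Did you clean it?Yes.",
    "words": [
        {"word": "D"}, {"word": "i"}, {"word": "d"},
        {"word": "y"}, {"word": "o"}, {"word": "u"},
        {"word": "c", "start": 1.548, "end": 1.668, "score": 0.833},
        {"word": "l"}, {"word": "e"},
        {"word": "a", "start": 1.668, "end": 1.748, "score": 0.75},
        {"word": "n"}, {"word": "i"}, {"word": "t"}, {"word": "?"},
        {"word": "Y"}, {"word": "e"}, {"word": "s"}, {"word": "."},
    ],
}

print(summarize_words(segment))
# {'total': 18, 'single_char': 18, 'timed': 2}
```

For this segment every entry is a single character and only 2 of the 18 have timestamps. Running the same check on the "en" output would show whether that run yields multi-character words with timestamps, as described above.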