linto-ai / whisper-timestamped

Multilingual Automatic Speech Recognition with word-level timestamps and confidence
GNU Affero General Public License v3.0

German Audios translated instead of transcribed #133

Closed by palibvb 11 months ago

palibvb commented 12 months ago

My German audio files are being translated instead of transcribed, which is not intended (the translation itself is perfect).

This is my function:

    def transcribe_file(file_path):
        output = model.transcribe(file_path, language="de", task="transcribe", word_timestamps=True)
        return json.dumps(output)

The issue occurs on a T4 GPU as well as locally.

I've attached the audio file and its JSON output.

https://github.com/linto-ai/whisper-timestamped/assets/83012718/4184bfbc-17ba-4368-907b-66d4a1e1f526

audio_output.json

Jeronymous commented 12 months ago

Thank you. Some details are missing to run the function: which model are you using? Also, you don't seem to be using whisper-timestamped here, just regular Whisper decoding.
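For comparison, here is a minimal sketch of what the same function could look like when it goes through whisper-timestamped rather than calling `model.transcribe` directly. The model name `"medium"` and the CPU device are assumptions (the issue does not say which model was loaded), and the extra `model_name` and `language` parameters are added only for illustration.

```python
import json


def transcribe_file(file_path, model_name="medium", language="de"):
    """Transcribe one file with whisper-timestamped and return the result as JSON text."""
    # Imported lazily so this module can be loaded without the package installed.
    import whisper_timestamped as whisper

    # Assumption: "medium" is a multilingual checkpoint, not necessarily the one from the issue.
    model = whisper.load_model(model_name, device="cpu")
    audio = whisper.load_audio(file_path)

    # task="transcribe" keeps the spoken language; task="translate" would produce English.
    result = whisper.transcribe(model, audio, language=language, task="transcribe")
    return json.dumps(result, ensure_ascii=False)
```

Passing `language` and `task` explicitly, as the original function already does, should rule out automatic language detection as the cause.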

Jeronymous commented 12 months ago

At the end of your JSON output I see this: "language": "en"

I suspect that you are using an English-only Whisper model (like "medium.en"). Otherwise, there is a bug in openai-whisper... (or you are confused about which function is called on your audio)
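Both suspicions can be checked quickly. The sketch below assumes the JSON output has a top-level "language" field, as in the attached file, and relies on the openai-whisper naming convention where English-only checkpoints end in ".en" (tiny.en, base.en, small.en, medium.en). The helper names are hypothetical.

```python
import json


def is_english_only(model_name):
    """Return True for English-only checkpoint names such as "medium.en"."""
    return model_name.endswith(".en")


def reported_language(output_json):
    """Return the top-level "language" field of a transcription result, or None."""
    return json.loads(output_json).get("language")


def check_run(model_name, output_json, expected_language):
    """Collect human-readable warnings about a likely model or language mismatch."""
    warnings = []
    if is_english_only(model_name) and expected_language != "en":
        warnings.append(f"{model_name} is English-only; use a multilingual model")
    found = reported_language(output_json)
    if found != expected_language:
        warnings.append(f"output reports language {found!r}, expected {expected_language!r}")
    return warnings


if __name__ == "__main__":
    sample = json.dumps({"text": "...", "language": "en"})
    for warning in check_run("medium.en", sample, "de"):
        print(warning)
```

A loaded openai-whisper model also exposes an `is_multilingual` property, which answers the first question without relying on the name.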