nyadla-sys / whisper.tflite

Optimized OpenAI's Whisper TFLite Port for Efficient Offline Inference on Edge Devices
MIT License
134 stars · 29 forks

Good work #6

Open michoael opened 11 months ago

michoael commented 11 months ago

I've been waiting for this project for a long time, thank you bro. I changed the code to transcribe a wav file chosen from storage instead of from the recorder, but when I try to transcribe I only get one sentence, not the complete audio. My wav file works fine with Vosk, so what am I doing wrong?

nyadla-sys commented 11 months ago

Can you share the wav file here?

michoael commented 11 months ago

I'm sorry, I don't understand how to upload a file here. But I converted the wav file from mp4 with ffmpeg using this command, and I pad it with an extra 10 seconds because short files need to be lengthened before they can be played back. All of this worked fine with the Vosk library:

```java
String[] c = { "-y", "-i", nameFile, "-acodec", "pcm_s16le", "-ar", "16000",
               "-ac", "1", "-af", "apad=pad_dur=10s", lastL.getPath() };
```

The file is about 4 minutes long. Sometimes I get only the first sentence, and other times I get 5 sentences (without any change to the code).

If sharing the wav file is important, please tell me how I can send it.

nyadla-sys commented 11 months ago

Could you please upload it to Google Drive and share the link?

michoael commented 11 months ago

If the link redirects to the wrong page, copy it and paste it into your browser to download the file directly.

nyadla-sys commented 11 months ago

I got the file.

nyadla-sys commented 11 months ago

I got the file and I am working on it.

nyadla-sys commented 11 months ago

I see the output "become our student and get access to effective and free educational materials". I think the file has multiple voices; let me try it with the original PyTorch OpenAI Whisper.

nyadla-sys commented 11 months ago

I tried your file with the original OpenAI Whisper model on Google Colab and I see the same output as above; I guess it may be due to speaker diarisation.

```python
!pip install transformers
from transformers import pipeline

pipe = pipeline(task="automatic-speech-recognition", model="openai/whisper-tiny")
pipe("/content/testwo.wav")
```

Output:

{'text': ' become our student and get access to effective and free educational materials.'}

nyadla-sys commented 11 months ago

By the way, right now the model is restricted to a 30-second audio clip as input; the rest of the file is ignored. To make it work for a big file, we need to split the audio content into 30 s chunks and feed each chunk to the Whisper model.
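The chunking described above can be sketched in a few lines. This is a minimal illustration, not code from this repo: it assumes the audio has already been decoded to a flat array of 16 kHz mono PCM samples, and the function name is hypothetical.

```python
# Hypothetical sketch: split a 16 kHz mono sample array into fixed 30 s chunks.
SAMPLE_RATE = 16000
CHUNK_SECONDS = 30

def split_into_chunks(samples, sample_rate=SAMPLE_RATE, chunk_seconds=CHUNK_SECONDS):
    """Return fixed-length chunks; the last chunk may be shorter."""
    chunk_len = sample_rate * chunk_seconds
    return [samples[i:i + chunk_len] for i in range(0, len(samples), chunk_len)]

# Example: 70 s of silence splits into 30 s + 30 s + 10 s chunks.
chunks = split_into_chunks([0] * (SAMPLE_RATE * 70))
print([len(c) // SAMPLE_RATE for c in chunks])  # -> [30, 30, 10]
```

Each chunk would then be transcribed independently and the text segments concatenated.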

michoael commented 11 months ago

If you try it again 3 or 4 times, you can get:

Become our student and get access to effective and free educational materials. Where are you studying and what's your major? I am studying at Beijing University. I major in civil law. Why did you choose Beijing University?

michoael commented 11 months ago

You said: "I think it has multiple voices and let me try with original pytorch openai whisper".

But sometimes I get this result:

Become our student and get access to effective and free educational materials. Where are you studying and what's your major? I am studying at Beijing University. I major in civil law. Why did you choose Beijing University?

And this result is from your model!

nyadla-sys commented 11 months ago

That is the expected result. If you need more to be transcribed, you need to split the audio file into 30 s chunks and feed each one as input to the model; then you should get the full audio text.

michoael commented 11 months ago

Hmm, but how can I cut the audio into 30-second pieces without cutting a speaker off mid-sentence?

michoael commented 11 months ago

And there is no way to change that? I'd like to use a long file directly, because I want to add timestamps too.
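Timestamps can still work with chunked transcription: per-chunk timestamps just need to be shifted by the chunk's offset into the full file. A hedged sketch, assuming fixed 30 s chunks and a hypothetical `(start, end, text)` segment format:

```python
# Hypothetical sketch: shift per-chunk segment timestamps to global file time.
CHUNK_SECONDS = 30

def globalize_timestamps(chunk_index, segments, chunk_seconds=CHUNK_SECONDS):
    """Shift each (start, end, text) segment by the chunk's offset in seconds."""
    offset = chunk_index * chunk_seconds
    return [(start + offset, end + offset, text) for start, end, text in segments]

# Segments from the third chunk (index 2) start 60 s into the file.
print(globalize_timestamps(2, [(0.0, 1.5, "hello"), (2.0, 4.0, "world")]))
# -> [(60.0, 61.5, 'hello'), (62.0, 64.0, 'world')]
```

With variable-length chunks (e.g. split on silence), the offset would instead be the cumulative duration of all preceding chunks.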

nyadla-sys commented 11 months ago

I need to work on supporting long files.

michoael commented 11 months ago

Okay, I will wait for you, and I hope it works like Vosk. Thank you bro.

michoael commented 11 months ago

If you can, tell me what you will work on, the Java code or the models (I want to learn and understand).

lrq3000 commented 10 months ago

I think, nyadla, you mean the challenge here is where to split, because we don't want to split in the middle of a word, right?
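One common way to avoid splitting mid-word is to search backwards from the 30 s hard limit for the quietest stretch of audio, which is likely a pause between words. A minimal energy-based sketch (not from this repo; window size and lookback are arbitrary assumptions):

```python
# Hypothetical sketch: pick a split point near a pause instead of mid-word.
SAMPLE_RATE = 16000

def find_split_point(samples, hard_limit, window=1600, lookback_s=3):
    """Search backwards from `hard_limit` for the lowest-energy window
    (a likely pause) within the last `lookback_s` seconds."""
    best_start, best_energy = hard_limit, float("inf")
    for start in range(max(0, hard_limit - SAMPLE_RATE * lookback_s),
                       hard_limit - window + 1, window):
        energy = sum(abs(s) for s in samples[start:start + window]) / window
        if energy < best_energy:
            best_energy, best_start = energy, start
    return best_start

# Example: constant "speech" with a quiet gap at 4.0-4.2 s; the split lands there.
samples = [1000] * (SAMPLE_RATE * 5)
samples[SAMPLE_RATE * 4 : SAMPLE_RATE * 4 + 3200] = [0] * 3200
print(find_split_point(samples, SAMPLE_RATE * 5) / SAMPLE_RATE)  # -> 4.0
```

Proper voice-activity detection (e.g. WebRTC VAD) would be more robust, but a simple energy minimum already avoids most mid-word cuts.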