Open michoael opened 1 year ago
Can you share the WAV file here?
I'm sorry, I don't understand how to upload a file here. I converted the file from MP4 to WAV with ffmpeg using this command, and I added 10 more seconds of padding so that a short file becomes longer and can still be listened to. All of this worked fine with the Vosk library.
`String[] c = { "-y", "-i", nameFile, "-acodec", "pcm_s16le", "-ar", "16000", "-ac", "1", "-af", "apad=pad_dur=10s", lastL.getPath() };`
The long file is 4 minutes long.
Sometimes I get only the first sentence, and other times I get 5 sentences (without any change to the code).
If sharing the WAV file is important, please tell me how I can send it.
Could you please upload it to Google Drive and share the link?
If it redirects to the wrong link, copy and paste it to download the file directly.
I got the file.
I got the file and I am working on it.
I see the output below: "become our student and get access to effective and free educational materials". I think the file has multiple voices, so let me try it with the original PyTorch OpenAI Whisper.
I tried your file with the original OpenAI Whisper model on Google Colab and I see the same output as above. I guess it may be due to speaker diarization.
`!pip install transformers`
`from transformers import pipeline`
`pipe = pipeline(task="automatic-speech-recognition", model="openai/whisper-tiny")`
`pipe("/content/testwo.wav")`
{'text': ' become our student and get access to effective and free educational materials.'}
By the way, right now the model only takes a 30-second audio clip as input; the rest of the file is ignored. To make it work for a long file, we need to split the audio into 30 s chunks and feed them to the Whisper model one at a time.
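For reference, the 30 s chunking described above could be sketched roughly like this in Python. This is only an illustration, not this project's actual code: `split_fixed` and its parameters are hypothetical names, and it assumes the audio has already been decoded into a sequence of 16 kHz mono PCM samples (the format produced by the ffmpeg command earlier in the thread).

```python
from typing import List, Sequence, Tuple

SAMPLE_RATE = 16_000  # assumption: matches the "-ar 16000" ffmpeg flag used in this thread
CHUNK_SECONDS = 30    # the 30-second input window mentioned above


def split_fixed(samples: Sequence[int],
                sample_rate: int = SAMPLE_RATE,
                chunk_seconds: int = CHUNK_SECONDS) -> List[Tuple[float, Sequence[int]]]:
    """Return (start_time_in_seconds, chunk) pairs that together cover all samples."""
    chunk_len = sample_rate * chunk_seconds
    chunks = []
    for start in range(0, len(samples), chunk_len):
        # The last chunk may be shorter than chunk_len.
        chunks.append((start / sample_rate, samples[start:start + chunk_len]))
    return chunks
```

Each chunk would then be passed to the model separately, and the chunk's start time added to any timestamps the model returns for it. Note that a fixed cut like this can land in the middle of a word.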
If you try it again 3 or 4 times, you can get:
Become our student and get access to effective and free educational materials. Where are you studying and what's your major? I am studying at Beijing University. I major in civil law. Why did you choose Beijing University?
You said: "and I think it has multiple voices and let me try with original pytorch openai whisper"
but sometimes I can get this result:
Become our student and get access to effective and free educational materials. Where are you studying and what's your major? I am studying at Beijing University. I major in civil law. Why did you choose Beijing University?
This result is from your model!
That is the expected result. If you need more of the audio transcribed, you need to split the audio file into 30 s chunks and feed each one to the model; then you should get the full text.
Hmm, but how can I cut the audio into 30-second pieces without cutting off the speaker mid-sentence?
And is there no way to change it? I hope to use the long file directly because I want to add timestamps too.
I need to work on supporting long files.
Okay, I will wait for you, and I hope it works like Vosk. Thank you, bro.
Can you tell me what you will work on, the Java code or the models? (I want to learn and understand.)
I think, nyadla, you mean the challenge here is where to split, because we don't want to split in the middle of a word, right?
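One common way to avoid cutting in the middle of a word is to look for the quietest spot shortly before the 30 s limit and cut there. A minimal sketch of that idea in Python follows. It is not what this project does: `frame_energy`, `find_cut_points`, the 5 s search window, and the 20 ms frame size are all illustrative assumptions, and mean absolute amplitude is only a crude stand-in for real voice activity detection.

```python
from typing import List, Sequence


def frame_energy(samples: Sequence[int], start: int, frame_len: int) -> float:
    """Mean absolute amplitude of one frame (a crude loudness measure)."""
    frame = samples[start:start + frame_len]
    return sum(abs(s) for s in frame) / max(len(frame), 1)


def find_cut_points(samples: Sequence[int],
                    sample_rate: int = 16_000,
                    max_chunk_seconds: float = 30.0,
                    search_seconds: float = 5.0,
                    frame_seconds: float = 0.02) -> List[int]:
    """Return sample indices to cut at, so that no chunk exceeds max_chunk_seconds."""
    max_len = int(sample_rate * max_chunk_seconds)
    search_len = int(sample_rate * search_seconds)
    frame_len = max(int(sample_rate * frame_seconds), 1)
    cuts: List[int] = []
    start = 0
    while len(samples) - start > max_len:
        hard_limit = start + max_len
        # Only look at the tail of the current chunk, never before its start.
        window_start = max(hard_limit - search_len, start + frame_len)
        candidates = range(window_start, hard_limit - frame_len + 1, frame_len)
        # Pick the quietest frame; fall back to a hard cut if there are no candidates.
        best = min(candidates,
                   key=lambda i: frame_energy(samples, i, frame_len),
                   default=None)
        cut = hard_limit if best is None else best + frame_len // 2
        cuts.append(cut)
        start = cut
    return cuts
```

Because the cut positions are sample indices, dividing them by the sample rate gives the start time of each chunk, which could be used to shift per-chunk timestamps back onto the original file's timeline.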
I've been waiting for this project for a long time, thank you bro. I changed something to transcribe a WAV file that is chosen from storage instead of coming from the recorder, but when I try to transcribe it I get only one sentence and not the complete audio. My WAV file worked fine with Vosk; what am I doing wrong?