Closed noahgift closed 1 year ago
I think your wget is grabbing the html page rather than the wav file itself :) Note that if you don't pass any input argument, the download will happen automatically from the hub.
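A quick way to confirm this failure mode is to look at the first bytes of the downloaded file: a real RIFF/WAVE file starts with the ASCII bytes `RIFF`, while an HTML page starts with `<!DOCTYPE` or `<html>`. A minimal sketch (the file created here simulates a wget that saved the web page instead of the wav):

```shell
# Simulate the failure: a "wav" that is actually an HTML page from the hub website.
printf '<!DOCTYPE html><html><body>not audio</body></html>' > samples_jfk.wav

# A valid wav would print "RIFF" here; an HTML page prints "<!DO".
head -c 4 samples_jfk.wav
```

On a correctly downloaded file, `head -c 4` prints `RIFF` (or `file samples_jfk.wav` reports "WAVE audio").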
Thanks, this is what I get for living in the shell and not bothering to open the links I download :)
Verified this flag does work:
cargo run --features cuda --example whisper -- --task transcribe --input ../samples_jfk.wav
I manually went to the webpage, then scp'd it:
scp -i ~/Downloads/llmops.pem /Users/noahgift/Downloads/samples_jfk.wav ubuntu@54.81.160.62:~/
➜ ~ du -sh /Users/noahgift/Downloads/samples_jfk.wav
344K    /Users/noahgift/Downloads/samples_jfk.wav
ubuntu@ip-172-31-21-63:~/candle$ cargo run --features cuda --example whisper -- --task transcribe --input ../samples_jfk.wav
Finished dev [unoptimized + debuginfo] target(s) in 0.14s
Running `target/debug/examples/whisper --task transcribe --input ../samples_jfk.wav`
loaded wav data: Header { audio_format: 1, channel_count: 1, sampling_rate: 16000, bytes_per_second: 32000, bytes_per_sample: 2, bits_per_sample: 16 }
pcm data loaded 176000
loaded mel: [1, 80, 3000]
0.0s -- 30.0s: And so my fellow Americans ask not what your country can do for you ask what you can do for your country
The reason I asked is that a previous file I had used with the Python whisper.py seems to act wonky.
ubuntu@ip-172-31-21-63:~/candle$ cargo run --features cuda --example whisper -- --task transcribe --input ../four-score.wav
Finished dev [unoptimized + debuginfo] target(s) in 0.14s
Running `target/debug/examples/whisper --task transcribe --input ../four-score.wav`
loaded wav data: Header { audio_format: 1, channel_count: 2, sampling_rate: 16000, bytes_per_second: 64000, bytes_per_sample: 4, bits_per_sample: 16 }
pcm data loaded 636224
loaded mel: [1, 80, 6000]
no speech detected, skipping 3000 DecodingResult { tokens: [50257, 50358, 50362, 314, 1101, 8066, 467, 329, 257, 1178, 286, 262, 986, 50256], text: " I'm gonna go for a few of the...", avg_logprob: -1.7409689713691172, no_speech_prob: 0.6462810635566711, temperature: 0.0, compression_ratio: NaN }
30.0s -- 60.0s: I'm going to be sure that I'm going to be sure that I'm going to be sure that I'm going to be sure that I'm going to be sure that I'm going to be sure that I'm going to be sure that I'm going to be sure that I'm going to be sure that I'm going to be sure that I'm going to be sure that I'm going to be sure that I'm going to be sure that I'm going to be sure that I'm going to be sure that I'm going to be sure that I'm going to be sure that I'm going to be sure that I'm going to be sure that I'm going to be sure that I'm going to be sure that I'm going to be sure that I'm going to be sure that I'm going to be sure that I'm going to be sure that I'm going to be sure that I'm going to be sure that I'm going to be sure that I'm going to be sure that I'm going to be sure that I'm going to be sure that I'm going to be sure that
I wasn't sure if I needed to set some other defaults and was digging into the clap code here.
No worries on my end, I can dig into settings and flags and see if I can figure out what is going on.
Agreed that the whisper output is pretty disappointing on your example. The whisper setup is pretty involved, and tweaking the flags may help, though in that case I didn't get anywhere. I would just suggest increasing the model size, as it usually helps.
cargo run --example whisper --profile=release-with-debug -- --input ~/Downloads/four-score.wav --model medium.en --task transcribe
loaded wav data: Header { audio_format: 1, channel_count: 2, sampling_rate: 16000, bytes_per_second: 64000, bytes_per_sample: 4, bits_per_sample: 16 }
pcm data loaded 636224
loaded mel: [1, 80, 6000]
0.0s -- 30.0s: Fast forward and seven years ago our fathers bore forth on this continent a new nation conceived of liberty and dedicated to the proposition that all men are created equal.
30.0s -- 60.0s: We are engaged in a great civil war. Testing whether that mission or any mission
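One difference visible in the two headers above: the JFK clip is mono (channel_count: 1) while four-score.wav is stereo (channel_count: 2). The channel count is a little-endian u16 at byte offset 22 of a canonical wav header, so it can be checked directly from the shell. A sketch, using a minimal hand-built header in place of the real file:

```shell
# Write the first 24 bytes of a canonical wav header: "RIFF", chunk size,
# "WAVEfmt ", fmt-chunk size (16), audio_format (1 = PCM), channel_count (2).
printf 'RIFF\044\000\000\000WAVEfmt \020\000\000\000\001\000\002\000' > hdr.wav

# Read the single byte at offset 22 as an unsigned int: 2 means stereo.
od -An -tu1 -j22 -N1 hdr.wav
```

If stereo input turns out to be part of the problem, a generic ffmpeg downmix such as `ffmpeg -i four-score.wav -ac 1 -ar 16000 mono.wav` would produce a mono 16 kHz file; that command is a standard ffmpeg invocation, not something tried in this thread.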
Besides this, it would be great if you have the openai whisper version at hand to give it a shot; if it performs well, I can have a look at trying to understand the discrepancies.
Awesome! Thanks, this was very helpful. I will leave the ticket open, do some tests, and report back. I have done quite a bit of work on Python MLOps GPU GitHub Codespaces, so it's pretty easy to go back and forth between Python and Rust and test things out.
Also, the new ssh-remote workflow is not horrible on AWS, so I can test on those as well.
Closing this one as no recent activity, hopefully it's all sorted out.
Hi,
I have been able to get the CUDA, cuDNN, and CPU Whisper examples to work fine, except when I try to pass in an audio file. Are CLI inputs supported yet?