linto-ai / whisper-timestamped

Multilingual Automatic Speech Recognition with word-level timestamps and confidence
GNU Affero General Public License v3.0
2.01k stars 156 forks

Large-v3 #134

Closed takefy-dev closed 11 months ago

takefy-dev commented 12 months ago

image

Jeronymous commented 12 months ago

Can you give more details about the command line you run? (and if possible share the input audio as well)

Also what does it give when you run the CLI with --versions?

Thanks

takefy-dev commented 12 months ago

Hey, I run it from Python like the example: whisper.transcribe("preview.mp4")

The only difference is that I select the large-v3 model. It only does this with large-v3, and it happens with all videos.

I have the latest version, since I downloaded it today.
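
For readers who want to reproduce this, here is a minimal sketch of the call described above. It assumes the `load_audio`, `load_model`, and `transcribe` functions shown in the whisper-timestamped README, and that the installed openai-whisper release recognizes the `large-v3` model name. `transcribe_file` and `word_rows` are hypothetical helper names, not part of the library.

```python
def transcribe_file(path, model_name="large-v3", device="cpu"):
    """Transcribe one file; a sketch following the README usage, not the reporter's exact code."""
    # Imported lazily so this snippet can be loaded even when the package is missing.
    import whisper_timestamped as whisper

    audio = whisper.load_audio(path)
    model = whisper.load_model(model_name, device=device)
    return whisper.transcribe(model, audio)


def word_rows(result):
    """Flatten a transcription result into (start, end, text, confidence) tuples."""
    rows = []
    for segment in result.get("segments", []):
        for word in segment.get("words", []):
            rows.append(
                (word["start"], word["end"], word["text"], word.get("confidence"))
            )
    return rows
```

Calling `word_rows(transcribe_file("preview.mp4"))` and pasting the first few rows as text makes a bad output easier to share than a screenshot.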

Jeronymous commented 12 months ago

OK thanks, I'll check what's happening with large-v3 which was released recently and not tested with whisper-timestamped yet.

Jeronymous commented 12 months ago

Can you please just give me the output of:

import whisper, whisper_timestamped
print(whisper.__version__)
print(whisper_timestamped.__version__)
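
If one of the imports above fails, the same information can usually be read from the installed package metadata instead. The following is only a sketch: it assumes the packages were installed with pip under the distribution names `openai-whisper` and `whisper-timestamped`, and `installed_version` is a hypothetical helper name.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Distribution names are assumptions; adjust them if the packages were installed differently.
for name in ("openai-whisper", "whisper-timestamped", "torch"):
    print(name, installed_version(name) or "not installed")
```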

takefy-dev commented 12 months ago

> OK thanks, I'll check what's happening with large-v3 which was released recently and not tested with whisper-timestamped yet.

Ok thanks

image

takefy-dev commented 12 months ago

Also, I'm curious to know which GPU (on EC2) you would recommend for running 50-minute videos.

takefy-dev commented 12 months ago

Hey, any news? It is doing this with every large model.

Jeronymous commented 12 months ago

I updated things to improve the support of the new model (large-v3). Can you retry with the latest version? Maybe it solves this issue.

If it does not, is it possible for you to share the audio (and the exact command/code you launch)? Because I could not reproduce this issue.

takefy-dev commented 12 months ago

Yeah, I'll let you know, Jérôme.

On 13 Nov 2023 at 10:07, Jérôme Louradour wrote:

darnn commented 12 months ago

For the record, with a fresh install of Whisper-TS here, it seemed to work on the one-minute file I tried it on. My GPU's not powerful enough to run large at all, so I used the CPU, if it matters. V1, V2 and V3 were about the same, both in time and in output. Some things are better in 3, some worse, but I guess that doesn't have anything to do with Whisper-TS one way or the other.
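
The CPU fallback described above can be automated. This is a rough sketch, assuming PyTorch is the backend and taking the roughly 10 GB VRAM figure that the Whisper README lists for the large models as the default threshold. `pick_device` and `free_vram_gb` are hypothetical helper names.

```python
def pick_device(free_gb, required_gb=10.0):
    """Return "cuda" when enough GPU memory is free, otherwise "cpu"."""
    return "cuda" if free_gb >= required_gb else "cpu"


def free_vram_gb():
    """Free memory on the current CUDA device in GiB, or 0.0 when no GPU is usable."""
    try:
        import torch
    except ImportError:
        return 0.0
    if not torch.cuda.is_available():
        return 0.0
    free_bytes, _total_bytes = torch.cuda.mem_get_info()
    return free_bytes / 1024**3


# required_gb defaults to an approximate figure; tune it for the model you load.
device = pick_device(free_vram_gb())
print(device)
```

The resulting string can be passed as the `device` argument when loading the model.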

takefy-dev commented 12 months ago

It seems to work now on an EC2 instance. I will test with a T4 GPU.

On 13 Nov 2023 at 11:58, darnn wrote: