tomchang25 / whisper-auto-transcribe

Auto transcribe tool based on whisper
MIT License

whisper-auto-transcribe


Easily generate free subtitles for your video




About The Project

Features:

Unique feature:

Future feature:

The tool is based on Whisper, an open-source speech recognition model developed by OpenAI.

For more details, see the OpenAI Whisper project.


How to use

Installation

  1. Install Python 3 and Git

  2. Clone the repo

    # Change the current directory to your home folder
    # You can clone into any other location except "Program Files" and "Program Files (x86)"
    cd ~
    
    # Stable version
    git clone https://github.com/tomchang25/whisper-auto-transcribe.git
    cd whisper-auto-transcribe
  3. Run webui.bat from the cloned folder

  4. Check the console for errors. The last lines of output should look like this:

    Launching Web UI with arguments:
    Running on local URL:  http://127.0.0.1:7860
  5. Open your browser and go to http://127.0.0.1:7860
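If you prefer to confirm from a script that the web UI is answering (for example before opening the browser), a minimal sketch using only the Python standard library is shown below. The function name `webui_is_up` is hypothetical and not part of this project; it assumes the default address `http://127.0.0.1:7860` printed in the console output above.

```python
import urllib.error
import urllib.request

# Default address printed by webui.bat (assumption: you did not change the port).
DEFAULT_URL = "http://127.0.0.1:7860"


def webui_is_up(url: str = DEFAULT_URL, timeout: float = 3.0) -> bool:
    """Return True if an HTTP server answers at `url`, False otherwise (hypothetical helper)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            # urlopen raises HTTPError for 4xx/5xx, so reaching here means a 2xx/3xx answer.
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status: the UI is not reachable.
        return False
```

Calling `webui_is_up()` while webui.bat is running should return `True`; `False` usually means the launch failed, so check the console output for errors.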

(Optional) Command-line interface

  1. Run enable_venv.bat to activate the project's virtual environment.

  2. You can now use the CLI.

    # Get help messages
    python .\cli.py -h
    
    # A simple example
    python .\cli.py .\mp4\1min.mp4 --output .\tmp\123456.srt -lang ja --task translate --model large
    
    # A batch example
    python .\cli.py .\mp4 --output .\batch\ --model small --model medium
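The batch example passes a whole folder to cli.py. If you want per-file control instead (for example, to skip some files or stop on the first failure), a small wrapper can call cli.py once per file. This is a sketch, not part of the project: `build_cli_command`, `transcribe_folder` and the `MEDIA_SUFFIXES` list are hypothetical, it uses only the flags shown in the examples above (`--output`, `-lang`, `--task`, `--model`), and it assumes it is run from the repository root with the virtual environment active.

```python
import subprocess
import sys
from pathlib import Path
from typing import List, Optional

# Extensions to pick up; this list is an assumption, not taken from cli.py.
MEDIA_SUFFIXES = {".mp4", ".mkv", ".mp3", ".wav"}


def build_cli_command(media: Path, output_dir: Path, model: str = "small",
                      lang: Optional[str] = None, task: Optional[str] = None) -> List[str]:
    """Build the argument list for one cli.py run (hypothetical helper)."""
    output = output_dir / (media.stem + ".srt")
    # sys.executable is the Python of the active virtual environment.
    cmd = [sys.executable, "cli.py", str(media), "--output", str(output), "--model", model]
    if lang is not None:
        cmd += ["-lang", lang]
    if task is not None:
        cmd += ["--task", task]
    return cmd


def transcribe_folder(folder: Path, output_dir: Path, model: str = "small") -> None:
    """Run cli.py once per media file in `folder` (hypothetical helper)."""
    output_dir.mkdir(parents=True, exist_ok=True)
    for media in sorted(folder.iterdir()):
        if media.suffix.lower() in MEDIA_SUFFIXES:
            # check=True raises CalledProcessError on the first file cli.py fails on.
            subprocess.run(build_cli_command(media, output_dir, model), check=True)
```

For example, `transcribe_folder(Path("mp4"), Path("batch"), model="small")` writes one `.srt` per video into `batch`, using a single model per run.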

(Optional) GPU acceleration (CUDA 11.3)

  1. Install CUDA 11.3
  2. Install cuDNN
  3. Uninstall the CPU version of PyTorch
    pip uninstall torch torchvision torchaudio
  4. Reinstall the GPU version of PyTorch
    # on Windows
    python -m pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1+cu113 -f https://download.pytorch.org/whl/torch_stable.html
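To verify that the reinstalled PyTorch can actually see your GPU, you can run a short check inside the project's virtual environment. This sketch uses the standard PyTorch calls `torch.cuda.is_available()`, `torch.version.cuda` and `torch.cuda.get_device_name()`; the function name `describe_torch_device` is hypothetical.

```python
import importlib.util


def describe_torch_device() -> str:
    """Report whether the installed PyTorch build can use a CUDA GPU (hypothetical helper)."""
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed in this environment."
    import torch

    if not torch.cuda.is_available():
        # torch.version.cuda is None for CPU-only builds.
        return (f"PyTorch {torch.__version__} (CUDA build: {torch.version.cuda}); "
                "no usable GPU found.")
    return (f"PyTorch {torch.__version__} with CUDA {torch.version.cuda}; "
            f"GPU 0 is {torch.cuda.get_device_name(0)}.")


print(describe_torch_device())
```

With the CUDA 11.3 wheels installed, the version string typically ends in `+cu113`. If the check reports no usable GPU, the CPU build may still be installed, or the driver/CUDA setup may be incomplete.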


Demo

Heavy Metal (each subtitle is the tool's transcription; the actual lyrics follow in parentheses)

404
0:53:33.590 --> 0:53:38.190
From the depths of hellish silence, bastard spells, explosive violence
(From the depths of hell in silence, Cast their spells, explosive violence)

405
0:53:38.670 --> 0:53:43.190
Russian minds have my protected, glorious mission undetected
(Russian night time flight perfected, Flawless vision, undetected)

406
0:53:44.190 --> 0:53:48.190
Put down in all the flames, I'm going strong, I'm half-moon's number one
(Pushing on and on, their planes are going strong, Air Force number one)

407
0:53:49.110 --> 0:53:53.030
Talking with the moon, looking for the truth, I'm moon's number one
(Somewhere down below they're looking for the foe, Bomber's on the run)

408
0:53:53.870 --> 0:53:58.190
You can hide, you can move, just to write, learn to expect, learn to think dark
(You can't hide, you can't move, just abide, Their attack's been proved (raiders in the dark))

409
0:53:59.110 --> 0:54:03.190
Silence is the night, the witch is in the fight, never miss the mark
(Silent through the night the witches join the fight. Never miss their mark)

410
0:54:04.150 --> 0:54:08.090
Canvas, wings of death, the pattern is your fate
(Canvas wings of death, Prepare to meet your fate)

411
0:54:09.190 --> 0:54:13.030
Night on the regiment, 188
(Night Bomber Regiment, 588)

412
0:54:14.190 --> 0:54:19.090
Undetected, unexpected, wings of glory, tell the story
(Undetected, unexpected, Wings of glory, Tell their story)

413
0:54:19.530 --> 0:54:24.110
Deviation, deviation, undetected, stealth, perfected
(Aviation, deviation, Undetected, Stealth perfected)

414
0:54:24.330 --> 0:54:28.150
Silence in ground, retreated to the sound, helpless in the air
(Foes are losing ground, retreating to the sound, Death is in the air)

415
0:54:29.130 --> 0:54:33.150
Suddenly appears, the world in your face, mindful, the witch is there
(Suddenly appears, confirming all your fears, Strike from witches lair)

416
0:54:33.830 --> 0:54:36.850
Let it fall, come around, I don't sound so, we're about to drown
(Target found, come around, barrels sound, From the battleground)

417
0:54:37.210 --> 0:54:41.210
Lashes, standing high, the old genie awaits, the beaten at the gates
(Rodina awaits, defeat them at the gates, Live to fight and fly)

418
0:54:41.790 --> 0:54:43.430
Just to fight and fly
()

419
0:54:44.250 --> 0:54:48.190
Canvas, wings of death, the pattern is your fate
(Canvas wings of death, Prepare to meet your fate)

420
0:54:49.270 --> 0:54:53.070
Night on the regiment, 188
(Night Bomber Regiment, 588)

421
0:54:54.190 --> 0:54:59.110
Undetected, unexpected, wings of glory, tell the story
(Undetected, unexpected, Wings of glory, Tell their story)

422
0:54:59.470 --> 0:55:04.110
Deviation, deviation, undetected, stealth, perfected
(Aviation, deviation, Undetected, Stealth perfected)

423
0:55:24.140 --> 0:55:27.410
Beneath the starlight of the heavens
(Beneath the starlight of the heavens)

424
0:55:29.200 --> 0:55:31.720
Unlikely heroes in disguise
(Unlikely heroes in the skies)

425
0:55:31.720 --> 0:55:34.040
Canvas, wings of death, the witch is gonna die
(Canvas wings of death, Prepare to meet your fate)

426
0:55:34.660 --> 0:55:37.320
Stay in fear, humble horizon
(As they appear on the horizon)

427
0:55:39.540 --> 0:55:43.460
Win when wisdom, and the night witch has come
(The wind will whisper when the Night Witches come)

428
0:55:44.460 --> 0:55:48.560
Undetected, unexpected, wings of glory, tell the story
(Undetected, unexpected, Wings of glory, Tell their story)

429
0:55:49.480 --> 0:55:53.540
Deviation, deviation, undetected, stealth, perfected
(Aviation, deviation, undetected, Stealth perfected)

430
0:55:54.340 --> 0:55:58.140
From the depths of hell in silence, lost in spells, explosive violence
(From the depths of hell in silence, Cast their spells, explosive violence)

431
0:55:59.260 --> 0:56:04.220
Russian beta, but perfected, bonus mission, undetected
(Russian night time flight perfected, Flawless vision, undetected)

English

0
0:00:00,0 --> 0:00:10,0
 The most popular is the Yashino Nakama, which stands on the shore of the Makurazaki City in Kagoshima Prefecture.

1
0:00:11,0 --> 0:00:22,0
 Makurazaki City used to be called the Typhoon Ginza, and the typhoon was approaching it frequently.

2
0:00:22,0 --> 0:00:27,0
 On Sunday, the Typhoon Ginza approached the Makurazaki City.

3
0:00:28,0 --> 0:00:41,0
 One of the four trees was named Yasshi on SNS, and there were many supportive comments.

4
0:00:42,0 --> 0:00:44,0
 Yasshi, do your best!

5
0:00:45,0 --> 0:00:47,0
 Yasshi, run away quickly!

6
0:00:47,0 --> 0:00:51,0
 Run away? If you have to, take off your roots and run away?

7
0:00:51,0 --> 0:01:17,0
 There are also voices asking to sell Yasshi goods.
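Both demos follow the SubRip (SRT) block layout: an index line, a `start --> end` line, one or more text lines, then a blank line. Their timestamps are looser than standard SRT (`HH:MM:SS,mmm`): single-digit hours, `.` or `,` before the fraction, and one to three fractional digits. If you want to post-process such output (for example, to check cue timing), a minimal parser that accepts both variants could look like this. It is a sketch, not the project's own code; `Cue`, `to_seconds` and `parse_srt` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import List

# H:MM:SS, then "," or ".", then 1-3 fractional digits,
# e.g. "0:53:33.590" or "0:00:11,0" as in the demos above.
TIME_RE = re.compile(r"(\d+):(\d{2}):(\d{2})[.,](\d{1,3})")


@dataclass
class Cue:
    index: int
    start: float  # seconds
    end: float    # seconds
    text: str


def to_seconds(stamp: str) -> float:
    """Convert one timestamp to seconds."""
    m = TIME_RE.fullmatch(stamp.strip())
    if m is None:
        raise ValueError(f"unrecognised timestamp: {stamp!r}")
    hours, minutes, seconds, fraction = m.groups()
    # "5" means 0.5 s and "590" means 0.590 s, so scale by the digit count.
    return (int(hours) * 3600 + int(minutes) * 60 + int(seconds)
            + int(fraction) / 10 ** len(fraction))


def parse_srt(text: str) -> List[Cue]:
    """Split SRT text into cues; blocks with fewer than two lines are skipped."""
    cues = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.strip().splitlines()
        if len(lines) < 2:
            continue
        start, end = lines[1].split("-->")
        body = "\n".join(line.strip() for line in lines[2:])
        cues.append(Cue(int(lines[0]), to_seconds(start), to_seconds(end), body))
    return cues
```

Converting to seconds makes it straightforward to spot overlapping cues or unusually long ones, such as the 26-second cue 7 in the English demo.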


Limitation

Currently, this project has the following limitation:

  1. GPU acceleration only works in a CUDA environment (NVIDIA GPUs).

If you use GPU acceleration, make sure your GPU has enough VRAM for the model you choose. Recommended values:

Precision (1 = lowest)  Whisper model  Required VRAM  *Time used  Performance
1                       tiny           ~1 GB          ~1/20       ~Disaster
2                       base           ~1 GB          ~1/10       ~YouTube
3                       small          ~2 GB          ~1/8        -
4                       medium         ~5 GB          ~1/5        -
5                       large          ~10 GB         ~1/2        ~Sonix.ai

*Time used is relative to the video/audio duration, measured on 10 minutes of English audio with GPU acceleration.
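The VRAM and time figures in the table can be turned into a quick rule of thumb: pick the most accurate model that fits your GPU, and multiply the media length by its time ratio. The sketch below copies the table's approximate values; `largest_model_for` and `estimated_minutes` are hypothetical helpers, and real memory use and speed will vary with the GPU and the audio.

```python
from typing import Optional

# Approximate values from the table: (model, VRAM in GB, time used as a fraction
# of the media duration). Ordered from least to most accurate.
MODELS = [
    ("tiny", 1, 1 / 20),
    ("base", 1, 1 / 10),
    ("small", 2, 1 / 8),
    ("medium", 5, 1 / 5),
    ("large", 10, 1 / 2),
]


def largest_model_for(vram_gb: float) -> Optional[str]:
    """Return the most accurate model whose approximate VRAM need fits, or None."""
    fitting = [name for name, need, _ in MODELS if need <= vram_gb]
    return fitting[-1] if fitting else None


def estimated_minutes(model: str, media_minutes: float) -> float:
    """Rough GPU processing time, scaled from the table's time-used ratio."""
    ratios = {name: ratio for name, _, ratio in MODELS}
    return media_minutes * ratios[model]
```

For example, a GPU with 6 GB of VRAM fits `medium`, and a 60-minute video with `large` would take roughly 30 minutes by this estimate. Leaving some VRAM headroom is safer than matching the table exactly.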


Contact

Report Bugs: https://github.com/tomchang25/whisper-auto-transcribe/issues

Project Link: https://github.com/tomchang25/whisper-auto-transcribe

My Twitter: https://twitter.com/Greysuki

My Gmail: tomchang25@gmail.com


License

The code and the model weights of Whisper are released under the MIT License.

This project is distributed under the MIT License. Please refer to LICENSE.txt for more information.


Acknowledgments
