zh-plus / openlrc

Transcribe and translate voice into LRC file using Whisper and LLMs (GPT, Claude, et,al). 使用whisper和LLM(GPT,Claude等)来转录、翻译你的音频为字幕文件。
https://zh-plus.github.io/openlrc/
MIT License
449 stars 32 forks source link
auto-subtitle faster-whisper lyrics lyrics-generator openai-api openlrc python speech-to-text subtitle-translation transcribe voice-to-text whisper

Open-Lyrics

PyPI PyPI - License Downloads GitHub Workflow Status (with event)

Open-Lyrics is a Python library that transcribes voice files using faster-whisper, and translates/polishes the resulting text into .lrc files in the desired language using LLM, e.g. OpenAI-GPT, Anthropic-Claude.

Key Features:

New 🚨

Installation ⚙️

  1. Please install CUDA 11.x and cuDNN 8 for CUDA 11 first according to https://opennmt.net/CTranslate2/installation.html to enable faster-whisper.

    faster-whisper also needs cuBLAS for CUDA 11 installed.

    For Windows Users (click to expand) (For Windows Users only) Windows user can Download the libraries from Purfview's repository: Purfview's [whisper-standalone-win](https://github.com/Purfview/whisper-standalone-win) provides the required NVIDIA libraries for Windows in a [single archive](https://github.com/Purfview/whisper-standalone-win/releases/tag/libs). Decompress the archive and place the libraries in a directory included in the `PATH`.
  2. Add LLM API keys, you can either:

  3. Install latest fast-whisper from source:

    pip install "faster-whisper @ https://github.com/SYSTRAN/faster-whisper/archive/d57c5b40b06e59ec44240d93485a95799548af50.tar.gz"
  4. Install ffmpeg and add bin directory to your PATH.

  5. This project can be installed from PyPI:

    pip install openlrc

    or install directly from GitHub:

    pip install git+https://github.com/zh-plus/openlrc
  6. Install PyTorch:

    pip install --force-reinstall torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124

Usage 🐍

GUI

[!NOTE] We are migrating the GUI from streamlit to Gradio. The GUI is still under development.

openlrc gui

Python code

from openlrc import LRCer

if __name__ == '__main__':
    lrcer = LRCer()

    # Single file
    lrcer.run('./data/test.mp3',
              target_lang='zh-cn')  # Generate translated ./data/test.lrc with default translate prompt.

    # Multiple files
    lrcer.run(['./data/test1.mp3', './data/test2.mp3'], target_lang='zh-cn')
    # Note we run the transcription sequentially, but run the translation concurrently for each file.

    # Path can contain video
    lrcer.run(['./data/test_audio.mp3', './data/test_video.mp4'], target_lang='zh-cn')
    # Generate translated ./data/test_audio.lrc and ./data/test_video.srt

    # Use glossary to improve translation
    lrcer = LRCer(glossary='./data/aoe4-glossary.yaml')

    # To skip translation process
    lrcer.run('./data/test.mp3', target_lang='en', skip_trans=True)

    # Change asr_options or vad_options, check openlrc.defaults for details
    vad_options = {"threshold": 0.1}
    lrcer = LRCer(vad_options=vad_options)
    lrcer.run('./data/test.mp3', target_lang='zh-cn')

    # Enhance the audio using noise suppression (consume more time).
    lrcer.run('./data/test.mp3', target_lang='zh-cn', noise_suppress=True)

    # Change the LLM model for translation
    lrcer = LRCer(chatbot_model='claude-3-sonnet-20240229')
    lrcer.run('./data/test.mp3', target_lang='zh-cn')

    # Clear temp folder after processing done
    lrcer.run('./data/test.mp3', target_lang='zh-cn', clear_temp=True)

    # Change base_url
    lrcer = LRCer(base_url_config={'openai': 'https://api.g4f.icu/v1',
                                   'anthropic': 'https://example/api'})

    # Route model to arbitrary Chatbot SDK
    lrcer = LRCer(chatbot_model='openai: claude-3-sonnet-20240229',
                  base_url_config={'openai': 'https://api.g4f.icu/v1/'})

    # Bilingual subtitle
    lrcer.run('./data/test.mp3', target_lang='zh-cn', bilingual_sub=True)

Check more details in Documentation.

Glossary

Add glossary to improve domain specific translation. For example aoe4-glossary.yaml:

{
  "aoe4": "帝国时代4",
  "feudal": "封建时代",
  "2TC": "双TC",
  "English": "英格兰文明",
  "scout": "侦察兵"
}
lrcer = LRCer(glossary='./data/aoe4-glossary.yaml')
lrcer.run('./data/test.mp3', target_lang='zh-cn')

or directly use dictionary to add glossary:

lrcer = LRCer(glossary={"aoe4": "帝国时代4", "feudal": "封建时代"})
lrcer.run('./data/test.mp3', target_lang='zh-cn')

Pricing 💰

pricing data from OpenAI and Anthropic

Model Name Pricing for 1M Tokens
(Input/Output) (USD)
Cost for 1 Hour Audio
(USD)
gpt-3.5-turbo-0125 0.5, 1.5 0.01
gpt-3.5-turbo 0.5, 1.5 0.01
gpt-4-0125-preview 10, 30 0.5
gpt-4-turbo-preview 10, 30 0.5
gpt-4o 5, 15 0.25
claude-3-haiku-20240307 0.25, 1.25 0.015
claude-3-sonnet-20240229 3, 15 0.2
claude-3-opus-20240229 15, 75 1
claude-3-5-sonnet-20240620 3, 15 0.2
gemini-1.5-flash 0.175, 2.1 0.01
gemini-1.0-pro 0.5, 1.5 0.01
gemini-1.5-pro 1.75, 21 0.1

Note the cost is estimated based on the token count of the input and output text. The actual cost may vary due to the language and audio speed.

Recommended translation model

For english audio, we recommend using gpt-3.5-turbo or gemini-1.5-flash.

For non-english audio, we recommend using claude-3-5-sonnet-20240620.

How it works

To maintain context between translation segments, the process is sequential for each audio file.

Todo

Credits

Star History

Star History Chart