Open-Lyrics is a Python library that transcribes voice files using
faster-whisper, and translates/polishes the resulting text
into .lrc
files in the desired language using LLM,
e.g. OpenAI-GPT, Anthropic-Claude.
lrcer = LRCer(base_url_config={'openai': 'https://api.chatanywhere.tech',
'anthropic': 'https://example/api'})
lrcer.run('./data/test.mp3', target_lang='zh-cn', bilingual_sub=True)
chatbot_model
to
provider: model_name
together with base_url_config:
lrcer = LRCer(chatbot_model='openai: claude-3-haiku-20240307',
base_url_config={'openai': 'https://api.g4f.icu/v1/'})
gemini-1.5-flash
:
lrcer = LRCer(chatbot_model='gemini-1.5-flash')
pip install "faster-whisper @ https://github.com/SYSTRAN/faster-whisper/archive/d57c5b40b06e59ec44240d93485a95799548af50.tar.gz"
Please install CUDA 11.x and cuDNN 8 for CUDA 11 first according
to https://opennmt.net/CTranslate2/installation.html to enable faster-whisper
.
faster-whisper
also needs cuBLAS for CUDA 11 installed.
Add LLM API keys, you can either:
OPENAI_API_KEY
.ANTHROPIC_API_KEY
.GOOGLE_API_KEY
.Install latest fast-whisper from source:
pip install "faster-whisper @ https://github.com/SYSTRAN/faster-whisper/archive/d57c5b40b06e59ec44240d93485a95799548af50.tar.gz"
Install ffmpeg and add bin
directory
to your PATH
.
This project can be installed from PyPI:
pip install openlrc
or install directly from GitHub:
pip install git+https://github.com/zh-plus/openlrc
Install PyTorch:
pip install --force-reinstall torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
[!NOTE] We are migrating the GUI from streamlit to Gradio. The GUI is still under development.
openlrc gui
from openlrc import LRCer
if __name__ == '__main__':
lrcer = LRCer()
# Single file
lrcer.run('./data/test.mp3',
target_lang='zh-cn') # Generate translated ./data/test.lrc with default translate prompt.
# Multiple files
lrcer.run(['./data/test1.mp3', './data/test2.mp3'], target_lang='zh-cn')
# Note we run the transcription sequentially, but run the translation concurrently for each file.
# Path can contain video
lrcer.run(['./data/test_audio.mp3', './data/test_video.mp4'], target_lang='zh-cn')
# Generate translated ./data/test_audio.lrc and ./data/test_video.srt
# Use glossary to improve translation
lrcer = LRCer(glossary='./data/aoe4-glossary.yaml')
# To skip translation process
lrcer.run('./data/test.mp3', target_lang='en', skip_trans=True)
# Change asr_options or vad_options, check openlrc.defaults for details
vad_options = {"threshold": 0.1}
lrcer = LRCer(vad_options=vad_options)
lrcer.run('./data/test.mp3', target_lang='zh-cn')
# Enhance the audio using noise suppression (consume more time).
lrcer.run('./data/test.mp3', target_lang='zh-cn', noise_suppress=True)
# Change the LLM model for translation
lrcer = LRCer(chatbot_model='claude-3-sonnet-20240229')
lrcer.run('./data/test.mp3', target_lang='zh-cn')
# Clear temp folder after processing done
lrcer.run('./data/test.mp3', target_lang='zh-cn', clear_temp=True)
# Change base_url
lrcer = LRCer(base_url_config={'openai': 'https://api.g4f.icu/v1',
'anthropic': 'https://example/api'})
# Route model to arbitrary Chatbot SDK
lrcer = LRCer(chatbot_model='openai: claude-3-sonnet-20240229',
base_url_config={'openai': 'https://api.g4f.icu/v1/'})
# Bilingual subtitle
lrcer.run('./data/test.mp3', target_lang='zh-cn', bilingual_sub=True)
Check more details in Documentation.
Add glossary to improve domain specific translation. For example aoe4-glossary.yaml
:
{
"aoe4": "帝国时代4",
"feudal": "封建时代",
"2TC": "双TC",
"English": "英格兰文明",
"scout": "侦察兵"
}
lrcer = LRCer(glossary='./data/aoe4-glossary.yaml')
lrcer.run('./data/test.mp3', target_lang='zh-cn')
or directly use dictionary to add glossary:
lrcer = LRCer(glossary={"aoe4": "帝国时代4", "feudal": "封建时代"})
lrcer.run('./data/test.mp3', target_lang='zh-cn')
pricing data from OpenAI and Anthropic
Model Name | Pricing for 1M Tokens (Input/Output) (USD) |
Cost for 1 Hour Audio (USD) |
---|---|---|
gpt-3.5-turbo-0125 |
0.5, 1.5 | 0.01 |
gpt-3.5-turbo |
0.5, 1.5 | 0.01 |
gpt-4-0125-preview |
10, 30 | 0.5 |
gpt-4-turbo-preview |
10, 30 | 0.5 |
gpt-4o |
5, 15 | 0.25 |
claude-3-haiku-20240307 |
0.25, 1.25 | 0.015 |
claude-3-sonnet-20240229 |
3, 15 | 0.2 |
claude-3-opus-20240229 |
15, 75 | 1 |
claude-3-5-sonnet-20240620 |
3, 15 | 0.2 |
gemini-1.5-flash |
0.175, 2.1 | 0.01 |
gemini-1.0-pro |
0.5, 1.5 | 0.01 |
gemini-1.5-pro |
1.75, 21 | 0.1 |
Note the cost is estimated based on the token count of the input and output text. The actual cost may vary due to the language and audio speed.
For english audio, we recommend using gpt-3.5-turbo
or gemini-1.5-flash
.
For non-english audio, we recommend using claude-3-5-sonnet-20240620
.
To maintain context between translation segments, the process is sequential for each audio file.