Anime Translation Initiative

AI-assisted transcription and translation.
Everything works offline.

LOAD THE FUNCTIONS

source snippets/enviromentvariables.sh #YOU MUST EDIT THIS ONE
source snippets/functions.sh
source snippets/opus.sh
source snippets/timeformat.sh

Workflow

Model Usage

Get audio from the video file for Whisper to use:

useWhisper

The -tr flag activates translation into English; without it, Whisper transcribes into Japanese.
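
For reference, a sketch of what these two steps might run under the hood, assuming whisper.cpp's main binary (filenames and model paths are placeholders, not necessarily what the repo uses):

# extract 16 kHz mono WAV, the input format whisper.cpp expects
ffmpeg -i episode.mkv -ar 16000 -ac 1 -c:a pcm_s16le audio.wav
# transcribe Japanese to VTT; add -tr to translate into English instead
./main -m models/ggml-large.bin -l ja -f audio.wav -ovtt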

Efficient VTT Creation and Editing

I use subed:
git clone https://github.com/sachac/subed
Add the Subed section from configAdd.el to your Emacs config.el.
Optionally, add this extra:
git clone https://gist.github.com/mooseyboots/d9a183795e5704d3f517878703407184
Add the Subed Extra section from configAdd.el to your Emacs config.el.

AutoSync the Subs

This ffsubsync script first autosyncs the Japanese captions with the Japanese audio, then uses those timestamps to sync the English captions to the Japanese captions.
Since the Japanese captions only need to be phonetically close, a smaller, faster model (ggml-small.bin) can be used to produce them.
That is the reason behind the names: whisper_small vs whisper_large refer to the model that produced the file.

make installffsubsync
autosync
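
Under the hood, autosync presumably chains two ffsubsync passes along these lines (filenames here are hypothetical):

# 1. sync the phonetically-close Japanese captions against the audio
ffsubsync episode.mkv -i whisper_small.srt -o synced_ja.srt
# 2. use the synced Japanese captions as the reference for the English ones
ffsubsync synced_ja.srt -i whisper_large.srt -o synced_en.srt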

Other Utils

Conversion to .srt

vttToSrt subs.vtt
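
If you'd rather not use the helper, plain ffmpeg can do the same conversion:

ffmpeg -i subs.vtt subs.srt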

Export final .mp4 with subtitles

exportSubs
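
For reference, a sketch of the kind of muxing a helper like exportSubs performs, assuming ffmpeg and placeholder filenames (MP4 needs the mov_text subtitle codec):

ffmpeg -i episode.mkv -i synced_en.srt -c copy -c:s mov_text final.mp4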

To format a given time as milliseconds or as a timestamp:

#timeformat.sh has these two convenience functions:
milliformat "2.3" #2 minutes 3 seconds
stampformat "3.2.1" #3 hours 2 minutes 1 second
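
For illustration, a minimal re-implementation of the first conversion, assuming "minutes.seconds" input and millisecond output (the repo's actual function may differ):

# hypothetical sketch: "M.S" -> milliseconds
milliformat_sketch() {
  local min=${1%%.*} sec=${1#*.}
  echo $(( (min * 60 + sec) * 1000 ))
}
milliformat_sketch "2.3" # prints 123000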

Grammar and Spelling Checking: LanguageTool

Install the full version of LanguageTool:

make installlanguagetool

Activate it

languagetool

Add the LanguageTool section from configAdd.el to your Emacs config.el.
To use it from Emacs:

(langtool-check)
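
Outside Emacs you can also query the server directly; a sketch assuming the languagetool helper starts LanguageTool's HTTP server on its default port 8081:

curl -s -d "language=en-US" --data-urlencode "text=This are a test." http://localhost:8081/v2/check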

Local Text Translation

Your FROM-TO model is either here or here.
For example, to get the models I use:

make opusInstallExample

Edit PATH_TO_SUBS/Opus-MT/services.json appropriately, then:

make installopus

To activate:

#opus.sh has convenience functions
Opus-MT

To use:

t "text to translate"

Get Event Timestamps

Scene Timestamps

Visual scene-change timestamps:

make installSceneTimestamps

sceneTimestamps
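
For reference, scene-change timestamps can also be pulled with plain ffmpeg; a sketch where the 0.4 threshold is an assumption you would tune:

ffmpeg -i episode.mkv -vf "select='gt(scene,0.4)',showinfo" -f null - 2>&1 | grep -o 'pts_time:[0-9.]*'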

VAD: Speech Timestamps

VAD stands for Voice Activity Detection.
It gives you the speech timestamps, i.e. the spans where a human voice is detected.
First install torch, then:

speechTimestamps
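
A sketch of what such a VAD pass might look like, assuming Silero VAD fetched via torch.hub (the repo's actual implementation may differ):

python - <<'EOF'
import torch
# download and load the pretrained Silero VAD model plus its helpers
model, utils = torch.hub.load('snakers4/silero-vad', 'silero_vad')
get_speech_timestamps, _, read_audio, *_ = utils
wav = read_audio('audio.wav', sampling_rate=16000)  # 16 kHz mono input
print(get_speech_timestamps(wav, model, sampling_rate=16000))
EOF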

Translate the Speaker's Stream

You'll need to press Ctrl-C to stop recording, after which the temporary recording will be translated.

streamtranslate
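
A rough sketch of the record-then-translate idea, assuming a PulseAudio source and whisper.cpp (source names and paths are placeholders):

ffmpeg -f pulse -i default -ar 16000 -ac 1 /tmp/stream.wav # Ctrl-C stops recording
./main -m models/ggml-large.bin -l ja -tr -f /tmp/stream.wav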

If you use sway, you can put this in your sway config and get an easy keybinding to translate whatever you are hearing:

bindsym $mod+Shift+return exec alacritty -e bash /home/$USER/files/code/anime_translation/snippets/streamtranslate.sh

Dependencies