readbeyond / aeneas

aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment)
http://www.readbeyond.it/aeneas/
GNU Affero General Public License v3.0

Word level align #280

Closed (yasntrk closed this issue 2 years ago)

yasntrk commented 2 years ago

Hi,

Is there any way to use aeneas for word-level alignment? When I tried it in Google Colab with the following command:

!python -m aeneas.tools.execute_task --multilevel "chunk"{i}".wav" "chunk"{i}".txt" "task_language=tur|os_task_file_format=json|is_text_type=plain" "chunk"{i}".json"

or

--presets-word, it says:

"No module named --multilevel"

SaadBazaz commented 2 years ago

--multilevel and other similar options should appear at the end of the command.

Here's an example:

python -m aeneas.tools.execute_task ./final_result.wav ./transcription2.txt "task_language=eng|os_task_file_format=json|is_text_type=plain" map.json --presets-word

Notice how --presets-word is at the end.

You can run python -m aeneas.tools.execute_task --help for more help with the syntax.
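
If the command is generated in a loop (for example over chunk files in a notebook), the ordering rule can be sketched in Python. This is only an illustration: the helper name build_command and the chunk0 file names are hypothetical, and --presets-word is the switch as written in the original question. The sketch just builds the argument list so that options always come after the four positional arguments.

```python
import sys


def build_command(audio, text, config, output, options=()):
    """Build the argv list for aeneas.tools.execute_task.

    Hypothetical helper: positional arguments first, options last.
    """
    positional = [audio, text, config, output]
    return [sys.executable, "-m", "aeneas.tools.execute_task", *positional, *options]


cmd = build_command(
    "chunk0.wav",   # hypothetical audio file
    "chunk0.txt",   # hypothetical text file
    "task_language=tur|os_task_file_format=json|is_text_type=plain",
    "chunk0.json",  # hypothetical output file
    options=["--presets-word"],
)

# To actually run it (requires aeneas to be installed):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list also avoids the shell quoting issues that can come up when file names are spliced into a command string.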

SaadBazaz commented 2 years ago

Also, for word-level alignment, suppose I had a text file like the following (let's say it's in transcription.txt):

I WOKE UP THIS MORNING I FIXED MY BED THEN I WENT TO THE BATHROOM I BRUSHED MY TEETH AND THEN I CAME OUTSIDE I TURNED ON THE HOT WATER I HAD DRUNK THIS I HAD TOASTED EGGS FOR BREAKFAST THEN I WENT OUTSIDE IN THE BOOTH I WAITED FOR THE BOOTS THE BUSS ARRIVED THEN I SAID IN THE BUS AND CAME TO A UNIVERSITY I AM A UNIVERSITY RIGHT NOW I AM HUNGRY I PRAYED THREE TIMES TO DAY IT'S MIDNIGHT I AM LYING THAT'S IT I WANT TO GO HOME

I ran the following command in the terminal to convert it into a file with one word per line.

sed 's/ /\n/g' ./transcription.txt > transcription2.txt

(basically it just replaces each " " with "\n", so every word becomes its own text fragment)

Feeding this file to aeneas gave me word-level alignments.
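
Where sed is not available, or its handling of "\n" differs, the same conversion can be sketched in Python. The function name one_word_per_line is hypothetical, and the file names are the ones from the example above. Unlike the sed command, this version collapses runs of whitespace, so it never produces empty lines.

```python
def one_word_per_line(text):
    """Return text with every whitespace-separated word on its own line."""
    # str.split() with no arguments splits on any run of whitespace
    # and drops leading/trailing whitespace, so no empty lines appear.
    return "\n".join(text.split()) + "\n"


sample = "I WOKE UP THIS MORNING"
converted = one_word_per_line(sample)

# To convert the file from the example above:
# from pathlib import Path
# source = Path("transcription.txt").read_text(encoding="utf-8")
# Path("transcription2.txt").write_text(one_word_per_line(source), encoding="utf-8")
```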

yasntrk commented 2 years ago

Thank you for your answer, sir. It worked.