readbeyond / aeneas

aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment)
http://www.readbeyond.it/aeneas/
GNU Affero General Public License v3.0

plain text alignment #242

Closed: Tortoise17 closed this issue 4 years ago.

Tortoise17 commented 4 years ago

I have plain text as a single paragraph without punctuation. Is it possible to align it in fragments with this tool? I tried, and it treats the whole paragraph as a single fragment.

sidvud98 commented 4 years ago

No, there is no built-in way to do that, but I have written a script myself. I wanted to make .srt subtitles for an audiobook in MP3 format. All I had was a text file (which I converted from .mobi using Calibre) with each paragraph of the eBook on its own line. For convenience, I wanted each subtitle to show a fragment of 8 words. I wrote some code that takes each paragraph and splits it into 8-word chunks, one chunk per line of a new parsed text file. After that, just set is_text_type=plain and pass the new parsed text file and the MP3 audio file to aeneas to get clean .srt files.

custom_parser.py

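# Writes each paragraph (one per input line) as fragments of n words, one fragment per output line.
path = 'The_ebook_converted_from_mobi_or_epub_using_calibre.txt'
n = 8  # number of words in each fragment

# 'w' instead of 'a+' so that re-running the script does not append duplicate fragments;
# both files are closed automatically by the with-statement (assumes UTF-8 text).
with open(path, 'r', encoding='utf-8') as f, open('new_parsed.txt', 'w', encoding='utf-8') as output:
    for x in f:
        # str.split() splits on whitespace; nltk's word_tokenize is an alternative tokenizer
        words = x.split()
        # Empty lines produce no words and are skipped; a paragraph with <= n words
        # becomes a single fragment, and the last chunk may hold fewer than n words.
        for start in range(0, len(words), n):
            output.write(' '.join(words[start:start + n]) + '\n')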
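For the last step described above (running aeneas with is_text_type=plain on the parsed file), a minimal sketch using the aeneas Python Task API might look like this. It assumes aeneas is installed and the audiobook is in English (task_language=eng); build_config and align are hypothetical helper names, not part of aeneas.

```python
import os


def build_config(language: str = "eng", output_format: str = "srt") -> str:
    """Return an aeneas configuration string for a plain-text input,
    where each line of the text file is aligned as one fragment."""
    return "|".join([
        f"task_language={language}",
        "is_text_type=plain",
        f"os_task_file_format={output_format}",
    ])


def align(audio_path: str, text_path: str, srt_path: str, language: str = "eng") -> None:
    """Align audio_path with text_path and write the sync map to srt_path."""
    # Imported here so that build_config() can be used without aeneas installed.
    from aeneas.executetask import ExecuteTask
    from aeneas.task import Task

    task = Task(config_string=build_config(language, "srt"))
    # The Task attributes expect absolute paths.
    task.audio_file_path_absolute = os.path.abspath(audio_path)
    task.text_file_path_absolute = os.path.abspath(text_path)
    task.sync_map_file_path_absolute = os.path.abspath(srt_path)
    ExecuteTask(task).execute()  # compute the alignment
    task.output_sync_map_file()  # write the .srt file
```

For example, align("audiobook.mp3", "new_parsed.txt", "audiobook.srt") (placeholder file names) would produce one subtitle per line of new_parsed.txt. The same configuration string can also be passed to the command-line tool, python -m aeneas.tools.execute_task.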
pettarin commented 4 years ago

@Tortoise17 aeneas assumes that you have already fragmented your text, one way or another. If you start with unsplit, unfragmented text, you need to write your own logic to split it up, or use an NLP library like nltk, as suggested above.

sidvud98 commented 4 years ago

@pettarin What if we want the text to be fragmented based on the pauses in the speech rather than a predefined, generic rule? That kind of fragmentation makes more sense. I think such a feature should be added.

pettarin commented 4 years ago

I plan to include fragmentation by speech/non-speech in aeneas v2. For the time being, you can do it yourself: first run a VAD (voice activity detector; the one included in aeneas or another one of your choice), then align the text at word-level granularity, and finally try to "match" the word begin/end timings with the speech/non-speech intervals found by the VAD.
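The matching step described above could look roughly like the following sketch. It assumes you already have word timings (for instance from an aeneas run on a text file with one word per line) and a list of non-speech intervals from a VAD, both expressed as (begin, end) times in seconds. The Word class, group_by_pauses, to_fragments, and the min_pause/tolerance thresholds are hypothetical and would need tuning on real audio. A fragment boundary is placed between two consecutive words whenever a long enough pause overlaps the slightly widened gap between them.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Word:
    text: str
    begin: float  # seconds
    end: float    # seconds


def group_by_pauses(words: List[Word], pauses: List[Tuple[float, float]],
                    min_pause: float = 0.3, tolerance: float = 0.1) -> List[List[Word]]:
    """Split word-level timings into groups at VAD non-speech intervals."""
    # Ignore short pauses (e.g. between words inside a phrase).
    long_pauses = [(b, e) for b, e in pauses if e - b >= min_pause]
    groups: List[List[Word]] = []
    current: List[Word] = []
    for i, word in enumerate(words):
        current.append(word)
        if i + 1 == len(words):
            break
        # Widen the gap between this word and the next by `tolerance` on each side.
        lo, hi = word.end - tolerance, words[i + 1].begin + tolerance
        # Close the current group if any long pause intersects [lo, hi].
        if any(b <= hi and e >= lo for b, e in long_pauses):
            groups.append(current)
            current = []
    if current:
        groups.append(current)
    return groups


def to_fragments(groups: List[List[Word]]) -> List[Tuple[float, float, str]]:
    """Turn word groups into (begin, end, text) fragments."""
    return [(g[0].begin, g[-1].end, " ".join(w.text for w in g)) for g in groups]
```

Widening the gap by tolerance matters because a forced aligner often assigns silence to neighboring words, so a word's end and the next word's begin may coincide instead of exactly bracketing the pause.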

But again, an ASR-based system will probably generate better results anyway.