readbeyond / aeneas

aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment)
http://www.readbeyond.it/aeneas/
GNU Affero General Public License v3.0
2.45k stars 218 forks

Both the C extension and the pure Python code failed. (Wrong arguments? Input too big?) #213

Closed Lakshyasukhralia closed 5 years ago

Lakshyasukhralia commented 5 years ago

Getting the following error:

[ERRO] An unexpected error occurred while executing the task:
[ERRO] Unexpected error while executing task : Both the C extension and the pure Python code failed. (Wrong arguments? Input too big?)

Input command:

python -m aeneas.tools.execute_task venom.mp3 lyrics.txt "task_language=en|is_text_type=plain|os_task_file_format=srt" subs.srt

venom.mp3:
Type = .mp3
Length = 4:07 min

lyrics.txt format:
Encoding = UTF-8
Lines = 116
Content =
I got a song filled with shit for the strong-willed
When the world gives you a raw deal
Set you off till you scream, "Piss off! Screw you!"
When it talks to you like you don't belong
Or tells you you're in the wrong field
When's something's in your mitochondrial
'Cause it latched on to you, like-

Knock knock, let the devil in
Malevolent as I've ever been, head is spinnin'
This medicine's screamin', "L-l-l-let us in!"
L-l-lick like a salad bowl, Edgar Allen Poe
Bedridden, shoulda been dead a long time ago
Liquid Tylenol, gelatins, think my skeleton's meltin'
Wicked, I get all high when I think I've smelled the scent
Of elephant manure-hell, I meant Kahlúa

Log: tmpbjahhmav.log

Any help would be appreciated. Thanks for your amazing work with this software!

**Note:** I tried using mplain too, but got the same result.

readbeyond commented 5 years ago

Thank you for providing the log, which tells where the problem is:

[DEBU] 2018-09-05 09:51:20.132199 ESPEAKTTSWrapper: Calling with text 'I got a song filled with shit for the strong-willed'
[DEBU] 2018-09-05 09:51:20.136576 ESPEAKTTSWrapper: Passing text via stdin...
[WARN] 2018-09-05 09:51:20.137574 ESPEAKTTSWrapper: An unexpected error occurred while calling TTS engine via subprocess
[WARN] 2018-09-05 09:51:20.137574 ESPEAKTTSWrapper: 'charmap' codec can't encode character '\ufeff' in position 0: character maps to

Your input text file contains a byte order mark character (aka BOM: https://en.wikipedia.org/wiki/Byte_order_mark ), as you can see from the attached screenshot, which shows my text editor, Vim, rendering a symbol in light blue: that is precisely the BOM character.

![2018-09-06-23 07 45](https://user-images.githubusercontent.com/5318774/45185905-0e1eba00-b22c-11e8-85bc-a6835561f328.png)

The BOM character is useless for files encoded in UTF-8, but unfortunately many applications (especially on Windows) still add it at the beginning of a file, even though it poses problems to e.g. the Python codecs library that handles the encoding of text files, as in the above error.

To solve this, you need to remove the BOM character from your input text file. To do so, you can open the input text file with a text editor that allows you to save the file as "UTF-8 without BOM". There are many free text editors that can do that; one of the easiest to use on Windows is Notepad++ ( https://notepad-plus-plus.org/download/ ).

Once you have saved your input file as UTF-8 without BOM, aeneas should be able to process it.

HTH,

Alberto Pettarin

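
As an alternative to re-saving the file in a text editor, the BOM can also be removed with a few lines of Python. The following is only a minimal sketch using the standard library: `strip_utf8_bom` is a hypothetical helper name (not part of aeneas), and it assumes the file is UTF-8 encoded and small enough to read into memory.

```python
import codecs
import sys
from pathlib import Path


def strip_utf8_bom(path):
    """Rewrite `path` in place without a leading UTF-8 BOM.

    Returns True if a BOM was found and removed, False otherwise.
    """
    file_path = Path(path)
    data = file_path.read_bytes()
    # codecs.BOM_UTF8 is the byte sequence EF BB BF, i.e. U+FEFF in UTF-8
    if data.startswith(codecs.BOM_UTF8):
        file_path.write_bytes(data[len(codecs.BOM_UTF8):])
        return True
    return False


if __name__ == "__main__" and len(sys.argv) == 2:
    # Example: python strip_bom.py lyrics.txt
    removed = strip_utf8_bom(sys.argv[1])
    print("BOM removed" if removed else "No BOM found")
```

Working on raw bytes leaves the rest of the file untouched, so line endings and other characters are preserved exactly.
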
Lakshyasukhralia commented 5 years ago

@readbeyond Thanks a lot. Worked like a charm!

ArvinderKumar commented 5 years ago

I am also stuck with the same error. I have attached the audio and text downloaded from the aeneas tutorial. text.txt

I have tried several other audio and text files, but all have a similar issue. Could you please check and help? I don't have the BOM issue at my end.

ArvinderKumar commented 5 years ago

p001.zip

ArvinderKumar commented 5 years ago

I have installed Python 3.6. FFprobe and FFmpeg are installed correctly.

readbeyond commented 5 years ago

What platform are you working on?

Please note that there are known installation issues / limitations when running aeneas on e.g. Windows. Only Linux 64 bit is officially supported. See: https://github.com/readbeyond/aeneas/#supported-platforms

In particular, if you get the "Both the C extension and the pure Python code failed." error, it is likely you do not have aeneas installed correctly. What is the output of:

$ python -m aeneas.diagnostics

?

If you only need to process a few audio/text pairs, you can use the aeneas Web application: https://www.aeneasweb.org/

Best regards,

Alberto Pettarin

ArvinderKumar commented 5 years ago

Thanks Alberto for a quick reply. Here's the result of diagnostics:

D:\test>python -m aeneas.diagnostics
[INFO] ffprobe        OK
[INFO] ffmpeg         OK
[ERRO] espeak         ERROR
[INFO]   Please make sure you have espeak installed correctly
[INFO]   and that its path is in your PATH environment variable
[INFO]   You might also want to check that the espeak-data directory
[INFO]   is set up correctly, for example, it has the correct permissions

BTW, I create read-aloud e-books, where I map a human-recorded voice to its text and use the Audacity tool to get the audio timings, which is a completely manual and time-consuming process. I want to automate this manual process: if I can get exact timings for audio of different human voices that I can map to the text, that would be of great help. I found that your aeneas library can do wonders for this.

Regards,
Arvinder

pettarin commented 5 years ago

The diagnostics clearly show that:

  1. aeneas seems to be installed correctly
  2. espeak is not in your PATH

You should add the directory containing the espeak.exe executable to your PATH environment variable.

See: https://github.com/readbeyond/aeneas/blob/master/wiki/INSTALL.md#windows
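
After changing PATH, one quick way to confirm that the espeak executable is now discoverable is to ask Python where it finds it. This is only a sketch using the standard library: `find_on_path` is a hypothetical helper name (not part of aeneas), and it checks discoverability only, not whether the espeak-data directory is set up correctly.

```python
import shutil


def find_on_path(name="espeak"):
    """Return the full path of `name` if it is found via PATH, else None."""
    # On Windows, shutil.which also tries the extensions listed in PATHEXT,
    # so "espeak" will match espeak.exe.
    return shutil.which(name)


if __name__ == "__main__":
    location = find_on_path("espeak")
    if location is None:
        print("espeak not found: add the directory containing it to PATH")
    else:
        print("espeak found at", location)
```

Remember to open a new command prompt after editing PATH, since already-open prompts keep the old value.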

How did you install aeneas? Again, please note that it is not officially supported on Windows. The installer package and directions were provided by Daniel Bair / SIL.

Best regards,

Alberto Pettarin