readbeyond / aeneas

aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment)
http://www.readbeyond.it/aeneas/
GNU Affero General Public License v3.0

Is there something wrong with my settings? #181

mr2coder closed this issue 6 years ago

mr2coder commented 7 years ago

[image] Why did this happen? In f000013, f000014, ... the output has the same begin and end. My code is below. Thanks.

    # coding=utf-8

    from aeneas.exacttiming import TimeValue
    from aeneas.executetask import ExecuteTask
    from aeneas.language import Language
    from aeneas.syncmap import SyncMapFormat
    from aeneas.task import Task
    from aeneas.task import TaskConfiguration
    from aeneas.textfile import TextFileFormat
    import aeneas.globalconstants as gc

    # create Task object
    config = TaskConfiguration()
    config[gc.PPN_TASK_LANGUAGE] = Language.ENG
    config[gc.PPN_TASK_IS_TEXT_FILE_FORMAT] = TextFileFormat.PLAIN
    config[gc.PPN_TASK_OS_FILE_FORMAT] = SyncMapFormat.JSON
    task = Task()
    task.configuration = config
    task.audio_file_path_absolute = u"audio.wav"
    task.text_file_path_absolute = u"plain.txt"

    # process Task
    ExecuteTask(task).execute()

    # print produced sync map
    print(task.sync_map)

readbeyond commented 7 years ago

It is hard to tell without knowing the kind of audio and the text you are trying to synchronize.

In general, zero-length fragments might happen, and the reason is either:

  1. the corresponding text fragment has zero length (empty string)

or

  2. the text and the speech are significantly different (e.g., the audio contains a whole sentence more than the text, or vice versa)

or

  3. towards the beginning/end, the audio contains a head or tail made of music or other non-speech sound.

In cases 1 or 2 the solution is to edit your text appropriately. In case 3 you can specify the length of the head/tail to be ignored for alignment purposes.
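For case 3, a minimal sketch of how a head/tail length could be added to a task config string. The helper name `build_config_string` and the values 2.5 s and 1.0 s are made up for illustration. The parameter names `is_audio_file_head_length` and `is_audio_file_tail_length` are given as I recall them from the aeneas documentation, so please verify them against your installed version.

```python
def build_config_string(language, head_seconds=None, tail_seconds=None):
    """Return a task config string: key=value pairs joined by '|'.

    Hypothetical helper, not part of aeneas. The head/tail parameter
    names are assumed from the aeneas docs; check them for your version.
    """
    params = [
        ("task_language", language),
        ("is_text_type", "plain"),
        ("os_task_file_format", "json"),
    ]
    # Only emit head/tail parameters when a length was actually given.
    if head_seconds is not None:
        params.append(("is_audio_file_head_length", "%.3f" % head_seconds))
    if tail_seconds is not None:
        params.append(("is_audio_file_tail_length", "%.3f" % tail_seconds))
    return "|".join("%s=%s" % (key, value) for key, value in params)


# Example: ignore 2.5 s of intro music and 1.0 s of outro (made-up values).
config_string = build_config_string("eng", head_seconds=2.5, tail_seconds=1.0)
print(config_string)
```

The resulting string can be passed wherever a task config string is accepted; measure the real head/tail durations of your own audio first.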

Also note that there is a parameter that you can add to your task config string ( "task_adjust_boundary_no_zero=True" ) or object ( config[gc.PPN_TASK_ADJUST_BOUNDARY_NO_ZERO] = True ), that will add 0.001 s to zero-length fragments. This is useful if you must ensure that no fragment has zero length (e.g., it is not allowed by the EPUB SMIL specification), but of course it will not solve any "logic" issue like 1-2-3 above.

Finally, it might be that the algorithmic approach of aeneas is not suitable for your particular audio/text or language.

Please check if your case is 1, 2, or 3. If not, and you really need help, please send me privately your audio/text files via email, and I will have a look at them.
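A quick way to check for case 1 is to scan the text file for empty lines. The sketch below assumes the plain text format with one fragment per line; `find_empty_fragments` is a hypothetical helper using only the standard library, not an aeneas function.

```python
import io


def find_empty_fragments(lines):
    """Return 1-based numbers of lines that are empty or whitespace-only."""
    return [n for n, line in enumerate(lines, start=1) if not line.strip()]


# In-memory sample standing in for plain.txt (assumed: one fragment per line).
sample = io.StringIO(u"First sentence.\n\nSecond sentence.\n   \nThird sentence.\n")
empty = find_empty_fragments(sample)
print(empty)
```

With a real file, something like `with io.open("plain.txt", encoding="utf-8") as f: print(find_empty_fragments(f))` should work the same way.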

Alberto Pettarin

onsunsl commented 7 years ago

@readbeyond: I think the erroneous lines are f000013 to f000016; their begin and end times are all 45.320. I also encountered the same problem: words and audio are not aligned.

HEAD 1 0.000 0.000
f000001 0 0.000 0.200
f000002 0 0.200 0.360
f000003 0 0.360 0.560
f000004 0 0.560 0.960
f000005 0 0.960 0.960
f000006 0 0.960 1.240
f000007 0 1.240 1.440
f000008 0 1.440 1.640
f000009 0 1.640 1.760
f000010 0 1.760 1.960
f000011 0 1.960 2.080
f000012 0 2.080 2.320
f000013 0 2.320 2.360
f000014 0 2.360 2.640
f000015 0 2.640 2.920
f000016 0 2.920 3.200
f000017 0 3.200 3.440
f000018 0 3.440 3.640
f000019 0 3.640 3.920
f000020 0 3.920 3.920
f000021 0 3.920 3.920
f000022 0 3.920 3.920
f000023 0 3.920 4.120
f000024 0 4.120 4.360
f000025 0 4.360 4.400
f000026 0 4.400 4.520
f000027 0 4.520 4.760
f000028 0 4.760 5.000
f000029 0 5.000 5.200
f000030 0 5.200 5.320
TAIL 2 5.320 5.322
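To spot the zero-length fragments in a listing like this one without reading it by eye, here is a small standard-library sketch. It assumes each record is four whitespace-separated fields (identifier, type, begin, end), which is inferred from the output above rather than taken from any aeneas specification; `zero_length_fragments` is a hypothetical helper.

```python
from decimal import Decimal


def zero_length_fragments(listing):
    """Return identifiers of records whose begin time equals their end time.

    Assumes records of four whitespace-separated fields:
    identifier, type, begin, end (inferred from the printed listing).
    """
    tokens = listing.split()
    result = []
    # Walk the token stream four fields at a time.
    for i in range(0, len(tokens) - 3, 4):
        identifier, _kind, begin, end = tokens[i:i + 4]
        # Decimal avoids float rounding surprises when comparing times.
        if Decimal(begin) == Decimal(end):
            result.append(identifier)
    return result


# A few records copied from the listing above.
sample = (
    "HEAD 1 0.000 0.000 f000004 0 0.560 0.960 f000005 0 0.960 0.960 "
    "f000006 0 0.960 1.240 TAIL 2 5.320 5.322"
)
print(zero_length_fragments(sample))
```

Note that a zero-length HEAD is also reported; that simply means no head was detected or configured, so it is probably not a problem by itself.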

shairoz commented 7 years ago

same error, solved by downgrading to previous release (1.7.0)