Closed mustafaxfe closed 4 years ago
Hi,
what matters is the audio duration (in seconds) rather than the size of the file, but judging from the text, it is pretty long.
aeneas uses an SC-banded DTW algorithm, whose RAM usage is proportional to the length of the audio file and to the MFCC window parameters. See: https://github.com/readbeyond/aeneas/blob/master/wiki/HOWITWORKS.md and the docs: https://www.readbeyond.it/aeneas/docs/
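To see why long audio blows up memory, here is a rough back-of-envelope estimate of the banded DTW accumulated-cost matrix. The constants below (0.040 s MFCC shift, 60 s DTW margin, float32 cells) are assumptions for illustration, not values read from your install; check the aeneas docs for the actual defaults.

```python
# Back-of-envelope DTW band memory estimate (all constants are
# illustrative assumptions, not necessarily the aeneas defaults).
MFCC_SHIFT = 0.040   # seconds per MFCC frame (assumed)
DTW_MARGIN = 60.0    # DTW band half-width in seconds (assumed)
BYTES_PER_CELL = 4   # one float32 cost cell (assumed)

def dtw_band_ram_gb(audio_seconds):
    """Approximate RAM (GB) for a banded DTW cost matrix:
    number of MFCC frames times the band width in frames."""
    frames = audio_seconds / MFCC_SHIFT
    band_frames = 2 * DTW_MARGIN / MFCC_SHIFT
    return frames * band_frames * BYTES_PER_CELL / 1e9
```

Under these assumptions one hour of audio already needs on the order of a gigabyte for the band alone, and the figure grows linearly with duration, which is why chunking the input helps.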
In your case, I would suggest breaking the text and audio down into segments of 1 hour each (processing each should not take that long) and running aeneas on each segment separately. It should be easy to piece the timings back together in sequence; there is a tool in the aeneas package for that.
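Piecing the per-segment timings back together amounts to shifting each segment's fragment times by that segment's start offset within the full audio. A minimal sketch of that idea (this is not the bundled aeneas tool; `merge_syncmaps` and the path/offset pairs are hypothetical names, and it assumes the usual aeneas JSON output, i.e. a `"fragments"` list with `"begin"`/`"end"` as string-encoded seconds):

```python
import json

def merge_syncmaps(paths_and_offsets):
    """Merge per-segment aeneas JSON sync maps into one.

    paths_and_offsets: list of (json_path, offset_seconds) pairs,
    where offset_seconds is where that segment starts in the
    original full-length audio. Each fragment's begin/end is
    shifted by the offset; fragments are concatenated in order.
    """
    merged = []
    for path, offset in paths_and_offsets:
        with open(path) as f:
            data = json.load(f)
        for frag in data["fragments"]:
            frag["begin"] = "%.3f" % (float(frag["begin"]) + offset)
            frag["end"] = "%.3f" % (float(frag["end"]) + offset)
            merged.append(frag)
    return {"fragments": merged}
```

For example, the sync map of the second one-hour segment would be merged with an offset of 3600.0 seconds.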
Best regards,
Alberto Pettarin
On 1/24/19 12:12 AM, mustafa wrote:
I have been trying to create a dataset for my speech recognition project. I started creating text files in the aeneas format and cleaned them of special characters. But when I try to execute the Task object, it crashes in Google Colab. I also tried it on my local Ubuntu installation, and it seems to use more than 3 GB of RAM (after some time it was using 15 GB). Is it possible to reduce the memory usage of aeneas? My Python version is 3.6 and I am working in Google Colab. My Task object and its execution:
```python
from aeneas.executetask import ExecuteTask
from aeneas.task import Task

# create Task object
config_string = u"task_language=tur|is_text_type=plain|os_task_file_format=json"
task = Task(config_string=config_string)
task.audio_file_path_absolute = "Nutuk_sesli.mp3"
task.text_file_path_absolute = "nutuk_aeneas_data_all.txt"
task.sync_map_file_path_absolute = "syncmap.json"

# process Task
ExecuteTask(task).execute()

# output sync map to file
task.output_sync_map_file()
```
My audio file is approximately 700 MB, and my aeneas text file is here: https://gist.github.com/mustafaxfe/a59485497bda74c5dbb4406f0c4a3f5c
Thanks
— View this issue on GitHub: https://github.com/readbeyond/aeneas/issues/224