readbeyond / aeneas

aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment)
http://www.readbeyond.it/aeneas/
GNU Affero General Public License v3.0

Simple alignments break down near end of audio: please help. #295

Open davidbernat opened 1 year ago

davidbernat commented 1 year ago

I am a beginner user of aeneas (MacBook 2021, Ventura 13.0.1) with a large amount of experience in natural language processing, audio, algorithms, and software. I understand the basic principles of aeneas and forced alignment algorithms.

I recently noticed that my configuration 'runs out of room' and the alignment begins to produce errors of the same type.

Can someone familiar with the aeneas package help me debug this? I will provide clearer code as we discuss.

Here is the basic outline of my usage:

            phrases = [m["text_during"] for m in continuous[i]]
            audio = MoviePyUtilities.concat_clips_as_list([AudioFileClip(a["filename"]) for a in grouped_audio[i]], composite=True)
            tmp = tempfile.NamedTemporaryFile(suffix=".wav", delete=True)
            audio.write_audiofile(tmp.name, codec="pcm_s32le", fps=MoviePyUtilities.get_fps(audio))
            forced = ForcedAlignment.force_alignment(phrases, tmp.name)
            if forced is None: raise RuntimeError(f"Error occurred during ForcedAlignment for continuous index {i}")

Nothing particularly unique in the above: I have a collection of phrases, each about one sentence long, and I have the associated audio. I write the audio to a temporary file, and inside the force_alignment function I write the phrases to disk.

            text_file = tempfile.NamedTemporaryFile(delete=True)
            json_file = tempfile.NamedTemporaryFile(delete=True)
            with open(text_file.name, "w") as f:
                f.write("\n".join(shortened))
            args = ["aeneas", audio_filename, text_file.name,
                    "task_language=eng|os_task_file_format=json|is_text_type=plain", json_file.name]
            e = ExecuteTaskCLI()
            e.use_sys = False
            code = e.run(arguments=args, show_help=False)
            if code != 0: raise RuntimeError()
            with open(json_file.name) as f:
                results = json.load(f)

Here I execute the aeneas package using the configuration shown above. Typical results are published below. I have also tried varying the length of the phrases and the same problem persists.
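For reference, with `is_text_type=plain` aeneas reads the text file as one fragment per line, so a phrase that contains an embedded newline would end up as two fragments and shift everything after it. The sketch below shows one defensive way to write the phrases. `write_plain_text` is a hypothetical helper, not part of aeneas, and collapsing all internal whitespace is an assumption about what is acceptable for the input text.

```python
import os
import tempfile


def write_plain_text(phrases, path):
    """Write one phrase per line and return the number of lines written.

    Hypothetical helper (not part of aeneas). Internal whitespace, including
    embedded newlines, is collapsed so each phrase stays on a single line.
    Phrases that are empty after cleaning are rejected rather than dropped,
    so the line count always matches the phrase count.
    """
    cleaned = [" ".join(p.split()) for p in phrases]
    empty = [i for i, p in enumerate(cleaned) if not p]
    if empty:
        raise ValueError(f"empty phrases at indices {empty}")
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(cleaned) + "\n")
    return len(cleaned)


if __name__ == "__main__":
    # Made-up phrases for illustration only.
    phrases = ["First sentence.", "Second\nsentence, split by a newline.", "Third."]
    with tempfile.TemporaryDirectory() as tmp_dir:
        text_path = os.path.join(tmp_dir, "phrases.txt")
        n = write_plain_text(phrases, text_path)
        with open(text_path, encoding="utf-8") as f:
            lines = f.read().splitlines()
        print(n, lines)
```

Writing to explicit paths inside a `TemporaryDirectory` also avoids reopening a `NamedTemporaryFile` by name while it is still open, which works on Unix-like systems such as macOS but is not portable to Windows.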

[Screenshot 2022-12-24 at 7 25 24 AM]

You can see that the alignment for the first three phrases is roughly correct, while the fourth phrase is given essentially zero length. This is wrong. It almost appears as though the tempo of the alignment is off: the proportions of the first three phrases are correct, but each one is too long, and then aeneas simply runs out of audio.
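One way to put numbers on this is to compute, for every fragment in the JSON sync map, its duration and the seconds of audio assigned per character of text. The sketch below assumes the JSON output has a top-level `fragments` list whose entries carry `begin` and `end` as strings of seconds and a `lines` list. `fragment_stats` and the `min_duration` threshold are hypothetical, and the example data is made up rather than taken from this issue.

```python
import json


def fragment_stats(sync_map, min_duration=0.05):
    """Return one dict per fragment with duration and seconds-per-character.

    Assumes the layout {"fragments": [{"begin": "0.000", "end": "1.000",
    "lines": ["..."]}, ...]}. `min_duration` is an arbitrary threshold in
    seconds below which a fragment is flagged as collapsed.
    """
    stats = []
    for frag in sync_map["fragments"]:
        begin = float(frag["begin"])
        end = float(frag["end"])
        text = " ".join(frag.get("lines", []))
        duration = end - begin
        n_chars = len(text)
        stats.append({
            "text": text,
            "begin": begin,
            "end": end,
            "duration": duration,
            # Seconds of audio per character of text; None for empty text.
            "sec_per_char": duration / n_chars if n_chars else None,
            "collapsed": duration < min_duration,
        })
    return stats


if __name__ == "__main__":
    # Made-up example data for illustration only.
    example = json.loads("""
    {"fragments": [
        {"begin": "0.000", "end": "4.000", "lines": ["one two three four"]},
        {"begin": "4.000", "end": "8.000", "lines": ["five six seven eight"]},
        {"begin": "8.000", "end": "8.000", "lines": ["nine ten eleven"]}
    ]}
    """)
    for s in fragment_stats(example):
        print(f'{s["begin"]:7.3f} {s["end"]:7.3f} {s["duration"]:6.3f} '
              f'collapsed={s["collapsed"]} {s["text"]!r}')
```

If the early fragments show a seconds-per-character rate well above what the speaker's pace suggests and the final fragment is flagged as collapsed, that would be consistent with the behavior described here, although it does not by itself identify the cause.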

This package is very important, and its algorithm and implementation are very streamlined and an excellent baseline for many more sophisticated audio applications.

Can we debug?

changyr66 commented 1 year ago

I encountered the same issue. Did you figure out the reasons and solutions?

Oleg-A-LLIto commented 1 year ago

Same problem here. It seems weird to me that the errors accumulate instead of each overly long part just chipping off the start of the next one. After all, the start and finish times are the most important thing and what it should analyze, not the duration.

davidbernat commented 1 year ago

@Oleg-A-LLIto @changyr66 Can you post your code and data file examples? The feature of aeneas is that the underlying technology is simple bigram matched filters. It should be robust. Or at least straightforward to diagnose. Though I believe the binary is pre-compiled?

Oleg-A-LLIto commented 1 year ago

Sure, here's an example. Unfortunately, I had to change the json to txt and the mp3 to mp4 (GitHub likes it that way).

TextInitial.txt TextMarked.txt https://github.com/readbeyond/aeneas/assets/43452849/c7546f3d-3db2-4e28-9292-4a52ffd0f018