readbeyond / aeneas

aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment)
http://www.readbeyond.it/aeneas/
GNU Affero General Public License v3.0

[Feature Request] Influence on accuracy #271

Open ErfolgreichCharismatisch opened 3 years ago

ErfolgreichCharismatisch commented 3 years ago

I used the web app for aligning.

I found that 54% of the phrases in my test set were misaligned.

By misaligned I mean:

  1. Cut too early from the end
  2. Cut too late from the previous fragment
  3. Shifted altogether

I used speech recognition to check whether the aligned parts matched (aeneas -> cue list -> cut audio -> run speech recognition on each cut -> compare the result to the entries in the cue list with a similarity algorithm).
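The last step of the check described above (comparing recognized speech to the cue list entries) could look roughly like the sketch below. It assumes the audio has already been cut and transcribed, and it uses `difflib.SequenceMatcher` from the Python standard library as a stand-in, since the actual similarity algorithm is not named here. The function names and the 0.8 threshold are hypothetical.

```python
from difflib import SequenceMatcher


def normalize(text):
    # Lowercase and replace punctuation with spaces so only the words are compared.
    kept = "".join(ch.lower() if ch.isalnum() or ch.isspace() else " " for ch in text)
    return " ".join(kept.split())


def similarity(expected, recognized):
    # Ratio in [0, 1]; 1.0 means the normalized strings are identical.
    return SequenceMatcher(None, normalize(expected), normalize(recognized)).ratio()


def flag_misaligned(cue_texts, recognized_texts, threshold=0.8):
    # Return (index, score) for every fragment scoring below the threshold.
    # The threshold is an assumed value and would need tuning per data set.
    flagged = []
    for i, (expected, recognized) in enumerate(zip(cue_texts, recognized_texts)):
        score = similarity(expected, recognized)
        if score < threshold:
            flagged.append((i, round(score, 2)))
    return flagged


if __name__ == "__main__":
    cues = ["Hello, world!", "The quick brown fox"]
    heard = ["hello world", "quick brown fox jumps"]
    print(flag_misaligned(cues, heard))
```

The per-fragment score from `similarity` could also serve as a rough confidence value for deciding which fragments to review first.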

Now, more than half misaligned is discouraging.

So far I have not found many options to influence recognition; apart from language selection and input text type, the CLI parameters seem to have a rather cosmetic effect.

I would like to improve the alignment and get a confidence value for each fragment, so that I can quickly review or discard results.

The pipeline already contains everything required for a confidence parameter. Other parameters for deeper control would also be important.

What I miss most is a threshold parameter in decibels to distinguish pauses from audio; this would eliminate premature cuts for good.
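To illustrate what such a decibel threshold would mean in practice, here is a minimal sketch in plain Python. It is not aeneas code and does not describe how aeneas detects speech; `find_pauses`, its parameters, and the default values are all hypothetical. It assumes mono samples as floats in [-1.0, 1.0] and treats any stretch whose RMS level stays below the threshold for long enough as a pause.

```python
import math


def frame_dbfs(frame):
    # RMS level of one frame in dB relative to full scale.
    if not frame:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    return 20.0 * math.log10(rms) if rms > 0 else float("-inf")


def find_pauses(samples, sample_rate, threshold_db=-40.0, frame_ms=10, min_pause_ms=200):
    # Return (start_sec, end_sec) spans whose level stays below threshold_db
    # for at least min_pause_ms. All defaults are assumed values.
    frame_len = max(1, sample_rate * frame_ms // 1000)
    min_frames = max(1, min_pause_ms // frame_ms)
    n_frames = len(samples) // frame_len
    pauses, run_start = [], None
    # One extra iteration (i == n_frames) closes a quiet run at the end of the audio.
    for i in range(n_frames + 1):
        quiet = (
            i < n_frames
            and frame_dbfs(samples[i * frame_len:(i + 1) * frame_len]) < threshold_db
        )
        if quiet and run_start is None:
            run_start = i
        elif not quiet and run_start is not None:
            if i - run_start >= min_frames:
                pauses.append((run_start * frame_len / sample_rate,
                               i * frame_len / sample_rate))
            run_start = None
    return pauses


if __name__ == "__main__":
    # 0.3 s of loud signal, 0.3 s of near silence, 0.3 s of loud signal at 1 kHz.
    audio = [0.5] * 300 + [0.0005] * 300 + [0.5] * 300
    print(find_pauses(audio, sample_rate=1000))
```

With a threshold like this exposed, fragment boundaries could be snapped to the detected pauses instead of falling inside a word.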

versae commented 1 year ago

Did you find any good solution to the misalignments?

ErfolgreichCharismatisch commented 1 year ago

Yes I did. I abandoned aeneas. Not what you wanted to hear, but that's it.

versae commented 1 year ago

I see. Are you using any other solution that provides satisfactory results?

ErrorBot1122 commented 1 year ago

What other library do you use now?