Open porg opened 9 months ago
This is a short audio sample file with 4 lines:
(File extension changed from .m4a to .mp4 to comply with GitHub's allowed file extensions.)

Overall verdict: Quite good at some positions, but it still fails miserably at pauses and stretched-out words. Possibly only its trickery/estimation is better. I doubt that a real, full phonetic mapping takes place, as the failure at word pauses indicates.
Are the following use cases supported?
Goal / Desired output: Lyrics or subtitle file with word-precision timecodes
Starting point(s)