nomadkaraoke / python-lyrics-transcriber

Automatically create synchronised lyrics files in ASS and MidiCo LRC formats with word-level timestamps, using Whisper and lyrics from Genius and Spotify, using LLMs / GPT-4 to correct transcribed lyrics
MIT License

Word-precision timecodes for audio w/o lyrics in online databases, optionally providing manual transcript #6

Open porg opened 9 months ago

porg commented 9 months ago

Are the following use cases supported?

Goal / Desired output: Lyrics or subtitle file with word-precision timecodes

Starting point(s)

  1. Unpublished audio file for which no public lyrics or transcripts exist yet.
    • a) Voice only as an isolated audio track.
    • b) Voice on top of instruments, no separation.
  2. Optional: Provide a manually created transcript file (simple plain text, line by line, no timecodes) to aid the processing with reliable cues.
    • Applications:
      • Less standardized language such as a dialect.
      • Or voices with an extraordinary pronunciation or tone, either naturally from the singer for artistic purposes (think singers in Heavy Metal or Jazz, stylizations like vibrato or yodeling, or voices like Björk) or due to heavy effects such as vocoder, echo, distortion, etc.
    • Added value for lyrics file creator: No need to create the timecodes by hand.
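
One way the optional transcript in starting point 2 could be used is sketched below. This is a minimal illustration, not how this project works: it assumes some recognizer has already produced `(word, start, end)` tuples, and it uses Python's standard `difflib` to carry those timings over to the words of the manual transcript. The function names and all time values are hypothetical.

```python
from difflib import SequenceMatcher


def normalize(word):
    """Lowercase and strip punctuation so that "you!" matches "you"."""
    return "".join(ch for ch in word.lower() if ch.isalnum())


def transfer_timings(transcript_words, recognized):
    """Attach recognized (word, start, end) timings to manual transcript words.

    Transcript words without a matching recognized word keep None timings
    and would need interpolation or manual review.
    """
    ref = [normalize(w) for w in transcript_words]
    hyp = [normalize(w) for w, _, _ in recognized]
    result = [(w, None, None) for w in transcript_words]
    matcher = SequenceMatcher(a=ref, b=hyp, autojunk=False)
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            _, start, end = recognized[block.b + k]
            result[block.a + k] = (transcript_words[block.a + k], start, end)
    return result


# Hypothetical recognizer output with made-up times and two misheard words.
recognized = [
    ("word", 0.0, 0.8), ("by", 0.8, 1.1), ("word", 1.1, 2.0),
    ("I'm", 2.0, 2.3), ("singing", 2.3, 3.1), ("to", 3.1, 3.4), ("you", 3.4, 4.5),
]
transcript = "Word by word I sing to you!".split()
timed = transfer_timings(transcript, recognized)
```

The transcript text always wins in this sketch, so the output keeps the wording exactly as supplied; only the timecodes come from the recognizer.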
porg commented 9 months ago

Sample file

This is a short audio sample file with 4 lines:

Audio File

https://github.com/karaokenerds/python-lyrics-transcriber/assets/737143/042700b6-9853-4305-841a-554bc5114fd6

Lyrics

Line by line lyrics

  1. Word by word I sing to you!
  2. And word by word I'm rappin' to you!
  3. And word by word I'm reading out to you, indeed!
  4. Thanks.

Remarks on each lyrics line — What it tests for

  1. Singing: Some words are intentionally stretched quite long. Some words also contain a tonal change within.
  2. Rapping: 1980s rapping style/tempo. An algorithm which performs pure text analysis will seem quite reliable here.
  3. Speech: Containing extra long pauses between some words plus some word stretching. This exposes any trickery quite brutally.
  4. Final: Extra long pause before the line starts. And then only a single word. Any timecoding that really follows the phonetics should also get this right.

Test results of various AI lyrics detection apps

porg commented 9 months ago

Croonify

https://github.com/karaokenerds/python-lyrics-transcriber/assets/737143/83ffcf40-a362-4382-b027-dcd120cd44d6

porg commented 9 months ago

Noraebang by Gaudio Lab

Overall verdict: Quite good at some positions. But at pauses or stretchings it still fails miserably. Possibly only its trickery/estimation is better. I doubt that a real full phonetic mapping takes place, as the failure at word pauses indicates.

https://github.com/karaokenerds/python-lyrics-transcriber/assets/737143/faf04920-fa26-4e32-b5db-f43a2232bdc6

  1. The stretched words seem well in sync. A simple text/syllable estimation mapping would not get that right so well.
  2. Really exact word starts. No surprise though, as rap is inherently quite rhythmic. But there is still some intermediate emphasis / de-emphasis, and especially the word starts still seem spot on.
  3. Within the word stretching of "And word by word" it still seems quite in sync, but then when some unusual pausing occurs, it totally loses sync.
  4. Already totally lost. It is already showing line 4 "Thanks" while I still have not uttered line 3's last word "indeed".
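
For comparison, the kind of "simple text/syllable estimation mapping" suspected above can be sketched in a few lines. This is a hypothetical baseline, not the method of any of the tested apps: it spreads a line's duration over its words in proportion to their letter count, using only the text and the line's start and end time. The function name and the 0 to 10 second line span are made up.

```python
def estimate_word_times(words, line_start, line_end):
    """Spread a line's duration across its words in proportion to letter count."""
    weights = [max(1, sum(ch.isalpha() for ch in w)) for w in words]
    total = sum(weights)
    duration = line_end - line_start
    times, cursor = [], line_start
    for word, weight in zip(words, weights):
        span = duration * weight / total
        times.append((word, cursor, cursor + span))
        cursor += span
    return times


# Line 3 of the sample lyrics over a made-up 10 second span.
words = "And word by word I'm reading out to you, indeed!".split()
estimated = estimate_word_times(words, 0.0, 10.0)
```

Because each estimated word starts exactly where the previous one ends, such a baseline has no way to represent a long pause before "indeed", which matches the kind of drift seen in lines 3 and 4.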
porg commented 9 months ago

Your software: Karaokenerds Lyrics Transcriber

https://github.com/karaokenerds/python-lyrics-transcriber/assets/737143/a6934aaa-fc80-4c54-a005-d61d35e0c6c7

  1. Singing: Perfect sync despite word stretching.
    • Only flaw: The Present Simple expression "I sing to you" is transcribed as Present Continuous "I'm singing to you". Is there some grammar correction applied in some pre-processing or post-processing loop?
    • Idea: Provide a fine-tuning flag to choose whether to take the input as literally as possible or to apply some level of plausibility checks / automatic grammar fixing.
  2. Rap: Perfect sync!
  3. Reading: Really good. Gets the pauses right.
    • Little flaws: "to" and "you" start a bit prematurely.
    • "Indeed" after the long pause is made into a new line. Legit.
      • In the spirit of my proposal #6, would it be possible for the app to stick to the line wrapping as intentionally provided in the supplied unsynchronized lyrics file? E.g. the word "indeed!", being the sentence end after a pause, would still stay on the same line.
  4. Final single word: Perfect sync again.
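
The line-wrapping request under point 3 could look roughly like the following. It is only a sketch under a strong assumption: that there is exactly one timed word per whitespace-separated word of the supplied lyrics, in the same order. The function name, the dictionary layout and all time values are hypothetical and not part of this project.

```python
def regroup_by_transcript(transcript_lines, timed_words):
    """Group timed words into lines following the supplied lyrics' line breaks.

    Assumes timed_words holds exactly one (word, start, end) entry per
    whitespace-separated word of the supplied lyrics, in order.
    """
    expected = sum(len(line.split()) for line in transcript_lines)
    if expected != len(timed_words):
        raise ValueError(f"expected {expected} timed words, got {len(timed_words)}")
    lines, pos = [], 0
    for line in transcript_lines:
        count = len(line.split())
        chunk = timed_words[pos:pos + count]
        pos += count
        if chunk:  # skip empty lines in the lyrics file
            lines.append({"text": line, "start": chunk[0][1],
                          "end": chunk[-1][2], "words": chunk})
    return lines


# Lines 3 and 4 of the sample lyrics with made-up word start times,
# including long pauses before "indeed!" and before "Thanks.".
transcript_lines = ["And word by word I'm reading out to you, indeed!", "Thanks."]
starts = [0.0, 0.6, 1.4, 1.8, 3.0, 3.4, 4.6, 5.0, 5.3, 8.0, 11.0]
words = " ".join(transcript_lines).split()
timed_words = [(w, s, s + 0.3) for w, s in zip(words, starts)]
grouped = regroup_by_transcript(transcript_lines, timed_words)
```

With this grouping the pause before "indeed!" no longer triggers a line break, because the line boundaries come only from the supplied lyrics file.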