moehmeni / syncedlyrics

Get synchronized lyrics in LRC format for your music
MIT License

Add provider Youtube #28

Closed: deedy5 closed this issue 3 months ago

deedy5 commented 4 months ago

Example: https://github.com/jdepoix/youtube-transcript-api

It only makes sense to look for transcripts that were created manually and are in the right language.

from youtube_transcript_api import YouTubeTranscriptApi

# retrieve the available transcripts
transcript_list = YouTubeTranscriptApi.list_transcripts('video_id')

# or just filter for manually created transcripts  
transcript = transcript_list.find_manually_created_transcript(['de', 'en']) 
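To illustrate the selection rule described above (manual transcripts only, in a preferred language order), here is a minimal stdlib-only sketch. The `TranscriptInfo` dataclass and `pick_manual_transcript` function are hypothetical stand-ins that only mirror the `language_code` and `is_generated` attributes of youtube-transcript-api transcripts. `find_manually_created_transcript` applies a similar priority order over the language codes, but it raises an error instead of returning `None` when nothing matches.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Sequence


@dataclass(frozen=True)
class TranscriptInfo:
    """Hypothetical stand-in for a transcript entry (not the library's class)."""
    language_code: str
    is_generated: bool


def pick_manual_transcript(
    transcripts: Iterable[TranscriptInfo], language_codes: Sequence[str]
) -> Optional[TranscriptInfo]:
    """Return a manually created transcript, trying languages in priority order."""
    # Keep only manually created (not auto-generated) transcripts, keyed by language.
    manual = {t.language_code: t for t in transcripts if not t.is_generated}
    for code in language_codes:
        if code in manual:
            return manual[code]
    return None  # no manual transcript in any requested language
```

For example, if a video has an auto-generated English track plus manual English and German tracks, asking for `["de", "en"]` returns the manual German one, while a video with only auto-generated tracks yields `None`.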
moehmeni commented 4 months ago

Since we need to search first, I think this works, but not every video has subtitles enabled, and you'd get an error.

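import pytube
from youtube_transcript_api import YouTubeTranscriptApi

def print_lrc(q):
    # search YouTube and list the result titles
    search = pytube.Search(q)
    res = search.results
    for i in res:
        print(i.title)
    # raises TranscriptsDisabled if the first video has no subtitles
    transcript = YouTubeTranscriptApi.get_transcript(res[0].video_id)
    lrc = ""
    for line in transcript:
        # LRC timestamps are [mm:ss.xx], where xx is hundredths of a second
        m = int(line['start'] // 60)
        s = int(line['start'] % 60)
        ms = int(line['start'] * 100) % 100
        lrc += f"[{m:02}:{s:02}.{ms:02}] {line['text']}\n"
    print(lrc)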
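The timestamp arithmetic in `print_lrc` can be isolated into a small, testable helper. This is a sketch, not syncedlyrics code: `to_lrc_timestamp` and `transcript_to_lrc` are hypothetical names, and the entries are assumed to be dicts with `start` (seconds) and `text` keys, as in the loop above. Rounding the start time to whole hundredths once and then splitting it with `divmod` avoids float truncation such as `int(1.15 * 100)` evaluating to 114.

```python
def to_lrc_timestamp(seconds: float) -> str:
    """Format a time in seconds as an LRC tag [mm:ss.xx] (xx = hundredths)."""
    total_cs = round(seconds * 100)  # whole hundredths of a second
    minutes, rem = divmod(total_cs, 60 * 100)
    secs, cs = divmod(rem, 100)
    return f"[{minutes:02}:{secs:02}.{cs:02}]"


def transcript_to_lrc(entries) -> str:
    """Join transcript entries ({'start': float, 'text': str}) into LRC lines."""
    return "".join(
        f"{to_lrc_timestamp(e['start'])} {e['text']}\n" for e in entries
    )
```

For example, `to_lrc_timestamp(65.5)` gives `[01:05.50]`.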

Problems:

deedy5 commented 4 months ago

Sample logic:


import pytube
from youtube_transcript_api import YouTubeTranscriptApi
from youtube_transcript_api.formatters import SRTFormatter

# q: search query, lang: wanted language code (e.g. 'en')

# find video_id and titles, choose the best
search = pytube.Search(q)
videos = [(v.video_id, v.title) for v in search.results]
best_video_id = videos[0][0]  # placeholder: choose the best match using the titles

# get transcript list
transcript_list = YouTubeTranscriptApi.list_transcripts(best_video_id)

# select manually created transcripts (excellent quality)
transcripts = [t for t in transcript_list if not t.is_generated]

# choose transcript with defined lang (None if there is none; check before fetching)
transcript = next((t for t in transcripts if t.language_code == lang), None)

# fetch transcript text
result = transcript.fetch()

# translate transcript into any language you want
result_de = transcript.translate('de').fetch()

# format using SRTFormatter
formatted_transcript = SRTFormatter().format_transcript(result)
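The `best_video_id` step is left open above. One simple possibility, shown only as a hypothetical sketch (the token-overlap score and the `pick_best_video` name are assumptions, not something pytube or syncedlyrics provides), is to score each `(video_id, title)` pair by the fraction of query words that appear in the title and keep the first highest-scoring result, so YouTube's own ranking breaks ties.

```python
import re
from typing import Optional, Sequence, Tuple


def _tokens(text: str) -> set:
    """Lowercased word tokens of a string."""
    return set(re.findall(r"\w+", text.lower()))


def pick_best_video(query: str, videos: Sequence[Tuple[str, str]]) -> Optional[str]:
    """Return the video_id whose title covers the most query words, or None."""
    query_tokens = _tokens(query)
    if not videos or not query_tokens:
        return None
    # max() returns the first maximal item, so earlier search results win ties.
    best_id, _ = max(
        videos,
        key=lambda v: len(query_tokens & _tokens(v[1])) / len(query_tokens),
    )
    return best_id
```

A plain word-overlap score ignores word order and extra words such as "Official Music Video", which is usually what you want for song titles; a fuzzier matcher could be swapped in without changing the call site.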
moehmeni commented 3 months ago

The problem is neither matching on the title nor the translation: all of the results are good matches, since YouTube already filters them itself. What I mean is that some songs have their lyrics (subtitles) disabled. For example:

import pytube
from youtube_transcript_api import YouTubeTranscriptApi

def print_lrc(q):
    search = pytube.Search(q)
    res = search.results
    for i in res:
        print(i.title)
    # only the first search result is tried
    transcript = YouTubeTranscriptApi.get_transcript(res[0].video_id)
    lrc = ""
    for line in transcript:
        m = int(line['start'] // 60)
        s = int(line['start'] % 60)
        ms = int(line['start'] * 100) % 100  # hundredths of a second
        lrc += f"[{m:02}:{s:02}.{ms:02}] {line['text']}\n"
    return lrc

q = "bad guy billie eilish"
print(print_lrc(q))

This works, but:

q = "spider on the wall clan of xymox"
print(print_lrc(q))

does not, even though the video is available, and we have no way of knowing that in advance:

youtube_transcript_api._errors.TranscriptsDisabled: 
Could not retrieve a transcript for the video https://www.youtube.com/watch?v=lrHnc550Lb0! This is most likely caused by:

Subtitles are disabled for this video

If you are sure that the described cause is not responsible for this error and that a transcript should be retrievable, please create an issue at https://github.com/jdepoix/youtube-transcript-api/issues. Please add which version of youtube_transcript_api you are using and provide the information needed to replicate the error. Also make sure that there are no open issues which already describe your problem!

Another result, res[n], may have subtitles, but since the titles are similar we can't determine which one has lyrics. Also, looping over all of them is not efficient: we already have other providers, and they are more likely to have lyrics than a user-uploaded YouTube video.

deedy5 commented 3 months ago

I described the logic in a previous comment. Yes, not all videos have subtitles, but you can check all the results from pytube. And you should take only manually added subtitles (automatically generated ones are of poor quality).
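A bounded version of "check all results" might look like the sketch below. It is hypothetical: `first_lyrics` and the `fetch` callable are illustrative names, and in practice `fetch` would wrap the youtube-transcript-api calls and raise (for example `TranscriptsDisabled`) when a video has no usable manual transcript. The `max_candidates` cap limits the cost raised as a concern earlier in the thread, since syncedlyrics can fall back to its other providers.

```python
from itertools import islice
from typing import Callable, Iterable, Optional, Tuple, Type


def first_lyrics(
    video_ids: Iterable[str],
    fetch: Callable[[str], str],
    max_candidates: int = 3,
    skip: Tuple[Type[BaseException], ...] = (Exception,),
) -> Optional[str]:
    """Try up to max_candidates videos in order; return the first lyrics fetched."""
    for video_id in islice(video_ids, max_candidates):
        try:
            return fetch(video_id)
        except skip:
            # e.g. subtitles disabled or no manual transcript: try the next video
            continue
    return None  # give up and let the other providers handle the query
```

Passing a narrower `skip` tuple (such as the library's own error classes) keeps unrelated bugs from being silently swallowed.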

But in principle, since there are other providers, YouTube may not be necessary. I'm closing this issue.