yt-dlp / yt-dlp

A feature-rich command-line audio/video downloader

YouTube Music has both original and localized/translated titles in the page. How to extract the translated title? #11317

Open DWSuryo opened 20 hours ago

DWSuryo commented 20 hours ago

Hello again. I have a project that extracts (does not download) music metadata from YouTube and YouTube Music: title, alternate title, translated title, channel, duration, and views. Some parts are done, but I have a question: how do I extract the translated title on YouTube, and especially on YT Music? For context, some songs on YT Music have localized/translated titles. Here is an example.

YT Music link: https://music.youtube.com/playlist?list=OLAK5uy_lSbsoiec9atxAdI3oRuwqRD7aueQospLQ

image

If we replace the "music." part with "www.", we get the YT link https://www.youtube.com/playlist?list=OLAK5uy_lSbsoiec9atxAdI3oRuwqRD7aueQospLQ and this screenshot:

image

The two screenshots show different titles: the YT Music one shows the localized (or at least translated) title. As far as I know, YT Music support in yt-dlp is still tracked as an enhancement (see https://github.com/orgs/yt-dlp/projects/3/views/1, especially #622), so I suppose it works only partially, with some data not extracted properly. What I have noticed so far is that a music.youtube.com URL gets redirected to youtube.com when I try it. Also, YT Music can be a bit messy, as the view counts of a song show (visible in both images, but that is a separate topic). My code builds on my previous code from #10782 and on devscripts/cli_to_api.py. Here is what I have so far:

import yt_dlp
import pandas as pd
import tqdm

# Define the YouTube playlist URL
# playlist_url = "https://www.youtube.com/playlist?list=OLAK5uy_m9CTHZbTGo5EPIi7SRmM3PB0gC0KiBIro"
# playlist_url = "https://www.youtube.com/playlist?list=OLAK5uy_napmWn2Jc-y11uiS_g-A6rH8SkfYxb8yU"
playlist_url = "https://music.youtube.com/playlist?list=OLAK5uy_lSbsoiec9atxAdI3oRuwqRD7aueQospLQ"

# ydl_opts = {
#     'quiet': True,
#     'extract_flat': True  # This ensures we get the metadata only, without downloading the videos
# }

ydl_opts = {
    'extract_flat': True,
    'extractor_args': {'youtube': {'lang': ['en']}},
    'final_ext': 'm4a',
    'format': 'bestaudio',
    'fragment_retries': 10,
    'ignoreerrors': 'only_download',
    'outtmpl': {'default': '%(title)s.%(ext)s'},
    'postprocessors': [{'key': 'FFmpegExtractAudio',
                        'nopostoverwrites': False,
                        'preferredcodec': 'm4a',
                        'preferredquality': '0'},
                        {'add_chapters': True,
                        'add_infojson': 'if_exists',
                        'add_metadata': True,
                        'key': 'FFmpegMetadata'},
                        {'key': 'FFmpegConcat',
                        'only_multi_video': True,
                        'when': 'playlist'}],
    'retries': 10,
    'subtitleslangs': ['en'],
    'verbose': True,
    'quiet': True,
    'writesubtitles': True
}

with yt_dlp.YoutubeDL(ydl_opts) as ydl:
    playlist_dict = ydl.extract_info(playlist_url, download=False)

    info_list = []
    for entry in tqdm.tqdm(playlist_dict['entries']):
        video_url = entry['url']
        video_data = ydl.extract_info(video_url, download=False)
        # print(video_data)
        title = video_data.get('title', 'Unknown')
        alt_title = video_data.get('alt_title', 'Unknown')
        channel = video_data.get('channel', 'Unknown')  # was info_dict, which is never defined in this script
        artist = video_data.get('artist', video_data.get('uploader', 'Unknown'))
        album = video_data.get('album', 'Unknown')
        duration = video_data.get('duration', 'Unknown')
        format = video_data.get('format', 'Unknown')
        filesize = video_data.get('filesize', 'Unknown')

        info_list.append({
            'title': title,
            'alt_title': alt_title,
            'channel': channel,
            'artist': artist,
            'album': album,
            'duration': duration,
            'format': format,
            'filesize': filesize
        })

# Convert to DataFrame
df = pd.DataFrame(info_list)

# Save to CSV
df.to_csv('playlist_info.csv', index=False)

print("Data has been successfully extracted and saved to playlist_info.csv")
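
The per-video field collection in the loop above could also be factored into a small helper. The following is only a sketch: `summarize_entry` and `FIELDS` are hypothetical names, not part of yt-dlp, and the sample dict is made-up data standing in for what `extract_info` returns. It assumes the info dict may contain a key whose value is `None`, in which case a plain `.get(key, 'Unknown')` would return `None` rather than the default.

```python
from typing import Any, Mapping

# Fields collected by the script above (keys of the info dict it reads).
FIELDS = ("title", "alt_title", "channel", "artist",
          "album", "duration", "format", "filesize")


def summarize_entry(video_data: Mapping[str, Any], default: str = "Unknown") -> dict:
    """Build one CSV row from an info dict (hypothetical helper, not a yt-dlp API)."""
    row = {}
    for field in FIELDS:
        value = video_data.get(field)
        # Check the value rather than key membership, so an explicit None
        # is also replaced by the default.
        row[field] = default if value is None else value
    # Same fallback as the script: use the uploader when no artist is reported.
    if row["artist"] == default:
        row["artist"] = video_data.get("uploader") or default
    return row


# Made-up sample data, for illustration only.
sample = {"title": "Example Song", "channel": None,
          "uploader": "Example Channel", "duration": 215}
print(summarize_entry(sample))
```

Inside the loop this would replace the individual `.get(...)` lines with `info_list.append(summarize_entry(video_data))`.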

Here is a sample of the CSV output:

image

As we can see, the titles match the YT desktop site, which is expected because the music link is redirected. So, back to the topic: how do I extract the localized titles from YT Music?
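
To make the redirect concrete, here is a minimal sketch of the host rewrite that the warning in the verbose output describes for the playlist URL. It only illustrates the effect on the URL; `to_main_youtube` is a hypothetical helper written for this post and is not how yt-dlp implements the redirect internally.

```python
from urllib.parse import urlsplit, urlunsplit


def to_main_youtube(url: str) -> str:
    """Rewrite a music.youtube.com URL to www.youtube.com; leave other URLs unchanged."""
    parts = urlsplit(url)
    if parts.hostname == "music.youtube.com":
        # Swap only the host; path and query string (the playlist id) are kept.
        parts = parts._replace(netloc="www.youtube.com")
    return urlunsplit(parts)


music_url = "https://music.youtube.com/playlist?list=OLAK5uy_lSbsoiec9atxAdI3oRuwqRD7aueQospLQ"
print(to_main_youtube(music_url))
```

Since the playlist is then fetched from the regular YouTube site, the playlist-level metadata comes from there rather than from the YT Music page.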

Complete Verbose Output

[debug] Encodings: locale UTF-8, fs utf-8, pref UTF-8, out UTF-8 (No ANSI), error UTF-8 (No ANSI), screen UTF-8 (No ANSI)
[debug] yt-dlp version stable@2024.10.22 from yt-dlp/yt-dlp [67adeb7ba] (pip) API
[debug] params: {'extract_flat': True, 'extractor_args': {'youtube': {'lang': ['en']}}, 'final_ext': 'm4a', 'format': 'bestaudio', 'fragment_retries': 10, 'ignoreerrors': 'only_download', 'outtmpl': {'default': '%(title)s.%(ext)s'}, 'postprocessors': [{'key': 'FFmpegExtractAudio', 'nopostoverwrites': False, 'preferredcodec': 'm4a', 'preferredquality': '0'}, {'add_chapters': True, 'add_infojson': 'if_exists', 'add_metadata': True, 'key': 'FFmpegMetadata'}, {'key': 'FFmpegConcat', 'only_multi_video': True, 'when': 'playlist'}], 'retries': 10, 'subtitleslangs': ['en'], 'verbose': True, 'quiet': True, 'writesubtitles': True, 'compat_opts': set(), 'http_headers': {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.15 Safari/537.36', 'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8', 'Accept-Language': 'en-us,en;q=0.5', 'Sec-Fetch-Mode': 'navigate'}}
[debug] Python 3.10.12 (CPython x86_64 64bit) - Linux-6.1.85+-x86_64-with-glibc2.35 (OpenSSL 3.0.2 15 Mar 2022, glibc 2.35)
[debug] exe versions: ffmpeg 4.4.2 (setts), ffprobe 4.4.2
[debug] Optional libraries: Cryptodome-3.21.0, brotli-1.1.0, certifi-2024.08.30, mutagen-1.47.0, requests-2.32.3, secretstorage-3.3.1, sqlite3-3.37.2, urllib3-2.2.3, websockets-13.1
[debug] Proxy map: {'colab_language_server': '/usr/colab/bin/language_service'}
[debug] Request Handlers: urllib, requests, websockets
[debug] Loaded 1839 extractors
[youtube:tab] Extracting URL: https://music.youtube.com/playlist?list=OLAK5uy_lSbsoiec9atxAdI3oRuwqRD7aueQospLQ
WARNING: [youtube:tab] YouTube Music is not directly supported. Redirecting to https://www.youtube.com/playlist?list=OLAK5uy_lSbsoiec9atxAdI3oRuwqRD7aueQospLQ
[youtube:tab] OLAK5uy_lSbsoiec9atxAdI3oRuwqRD7aueQospLQ: Downloading webpage
[youtube:tab] OLAK5uy_lSbsoiec9atxAdI3oRuwqRD7aueQospLQ: Redownloading playlist API JSON with unavailable videos
[download] Downloading playlist: Album - Night walk
[youtube:tab] Playlist Album - Night walk: Downloading 10 items of 10
[debug] The information of all playlist entries will be held in memory
[download] Downloading item 1 of 10
[download] Downloading item 2 of 10
[download] Downloading item 3 of 10
[download] Downloading item 4 of 10
[download] Downloading item 5 of 10
[download] Downloading item 6 of 10
[download] Downloading item 7 of 10
[download] Downloading item 8 of 10
[download] Downloading item 9 of 10
[download] Downloading item 10 of 10
[download] Finished downloading playlist: Album - Night walk
  0%|          | 0/10 [00:00<?, ?it/s][youtube] Extracting URL: https://music.youtube.com/watch?v=9xfzIheFxlo
[youtube] 9xfzIheFxlo: Downloading webpage
[youtube] 9xfzIheFxlo: Downloading ios player API JSON
[youtube] 9xfzIheFxlo: Downloading mweb player API JSON
[youtube] 9xfzIheFxlo: Downloading ios music player API JSON
[debug] [youtube] Extracting signature function js_606a66b3_101
[debug] Loading youtube-sigfuncs.js_606a66b3_101 from cache
[debug] Loading youtube-nsig.606a66b3 from cache
[debug] [youtube] Decrypted nsig 6freeYigDAFYpoz => XtNyO8BoMeX2tw
[debug] Loading youtube-nsig.606a66b3 from cache
[debug] [youtube] Decrypted nsig BMFX_n3OgkcxtP3 => AWnr7OlE7BKYDw
[debug] [youtube] Extracting signature function js_606a66b3_105
[debug] Loading youtube-sigfuncs.js_606a66b3_105 from cache
[youtube] 9xfzIheFxlo: Downloading m3u8 information
[debug] Sort order given by extractor: quality, res, fps, hdr:12, source, vcodec:vp9.2, channels, acodec, lang, proto
[debug] Formats sorted by: hasvid, ie_pref, quality, res, fps, hdr:12(7), source, vcodec:vp9.2(10), channels, acodec, lang, proto, size, br, asr, vext, aext, hasaud, id
 10%|█         | 1/10 [00:02<00:22,  2.55s/it][youtube] Extracting URL: https://music.youtube.com/watch?v=fq_g7oNa5Lc
[youtube] fq_g7oNa5Lc: Downloading webpage
[youtube] fq_g7oNa5Lc: Downloading ios player API JSON
[youtube] fq_g7oNa5Lc: Downloading mweb player API JSON
[youtube] fq_g7oNa5Lc: Downloading ios music player API JSON
[debug] Loading youtube-nsig.606a66b3 from cache
[debug] [youtube] Decrypted nsig uKXBmhF19LaFPEK => kA5pqYhXdmQ_Sw
[debug] Loading youtube-nsig.606a66b3 from cache
[debug] [youtube] Decrypted nsig xNxyfWXYVMIeuGG => h9TsgLGTd0AKNQ
[youtube] fq_g7oNa5Lc: Downloading m3u8 information
[debug] Sort order given by extractor: quality, res, fps, hdr:12, source, vcodec:vp9.2, channels, acodec, lang, proto
[debug] Formats sorted by: hasvid, ie_pref, quality, res, fps, hdr:12(7), source, vcodec:vp9.2(10), channels, acodec, lang, proto, size, br, asr, vext, aext, hasaud, id
 20%|██        | 2/10 [00:05<00:19,  2.49s/it][youtube] Extracting URL: https://music.youtube.com/watch?v=fNfI-UCL8dE
[youtube] fNfI-UCL8dE: Downloading webpage
[youtube] fNfI-UCL8dE: Downloading ios player API JSON
[youtube] fNfI-UCL8dE: Downloading mweb player API JSON
[youtube] fNfI-UCL8dE: Downloading ios music player API JSON
[debug] Loading youtube-nsig.606a66b3 from cache
[debug] [youtube] Decrypted nsig YjbpybK6xCG0e1Y => i0xCRNDBnSFAzw
[debug] Loading youtube-nsig.606a66b3 from cache
[debug] [youtube] Decrypted nsig utA8bIcKRPxEenW => xjiMk4DzV0l2QQ
[youtube] fNfI-UCL8dE: Downloading m3u8 information
[debug] Sort order given by extractor: quality, res, fps, hdr:12, source, vcodec:vp9.2, channels, acodec, lang, proto
[debug] Formats sorted by: hasvid, ie_pref, quality, res, fps, hdr:12(7), source, vcodec:vp9.2(10), channels, acodec, lang, proto, size, br, asr, vext, aext, hasaud, id
 30%|███       | 3/10 [00:08<00:19,  2.82s/it][youtube] Extracting URL: https://music.youtube.com/watch?v=pUWtT8BwQZM
[youtube] pUWtT8BwQZM: Downloading webpage
[youtube] pUWtT8BwQZM: Downloading ios player API JSON
[youtube] pUWtT8BwQZM: Downloading mweb player API JSON
[youtube] pUWtT8BwQZM: Downloading ios music player API JSON
[debug] Loading youtube-nsig.606a66b3 from cache
[debug] [youtube] Decrypted nsig mnZe55dsiRI6cLC => 7tXG8ZqPXJSy2w
[debug] Loading youtube-nsig.606a66b3 from cache
[debug] [youtube] Decrypted nsig TBpVKxr7Ug6Gt-_ => DULgvUxMNTeJVQ
[youtube] pUWtT8BwQZM: Downloading m3u8 information
[debug] Sort order given by extractor: quality, res, fps, hdr:12, source, vcodec:vp9.2, channels, acodec, lang, proto
[debug] Formats sorted by: hasvid, ie_pref, quality, res, fps, hdr:12(7), source, vcodec:vp9.2(10), channels, acodec, lang, proto, size, br, asr, vext, aext, hasaud, id
 40%|████      | 4/10 [00:10<00:16,  2.68s/it][youtube] Extracting URL: https://music.youtube.com/watch?v=GvbxeOwNOng
[youtube] GvbxeOwNOng: Downloading webpage
[youtube] GvbxeOwNOng: Downloading ios player API JSON
[youtube] GvbxeOwNOng: Downloading mweb player API JSON
[youtube] GvbxeOwNOng: Downloading ios music player API JSON
[debug] Loading youtube-nsig.606a66b3 from cache
[debug] [youtube] Decrypted nsig fty3VriCXrS88sg => 5eUMukq56CBysQ
[debug] Loading youtube-nsig.606a66b3 from cache
[debug] [youtube] Decrypted nsig JIbf5zv3cSfZmJK => _uxnn_nXPTyx1w
[youtube] GvbxeOwNOng: Downloading m3u8 information
[debug] Sort order given by extractor: quality, res, fps, hdr:12, source, vcodec:vp9.2, channels, acodec, lang, proto
[debug] Formats sorted by: hasvid, ie_pref, quality, res, fps, hdr:12(7), source, vcodec:vp9.2(10), channels, acodec, lang, proto, size, br, asr, vext, aext, hasaud, id
 50%|█████     | 5/10 [00:13<00:12,  2.56s/it][youtube] Extracting URL: https://music.youtube.com/watch?v=_sQYKoory1w
[youtube] _sQYKoory1w: Downloading webpage
[youtube] _sQYKoory1w: Downloading ios player API JSON
[youtube] _sQYKoory1w: Downloading mweb player API JSON
[youtube] _sQYKoory1w: Downloading ios music player API JSON
[debug] Loading youtube-nsig.606a66b3 from cache
[debug] [youtube] Decrypted nsig _b37_QryWrky4Jr => Mi96yzhg8DCMog
[debug] Loading youtube-nsig.606a66b3 from cache
[debug] [youtube] Decrypted nsig Sa8KP6K5pSFbRb1 => iJe5gpFCYKDJRQ
[youtube] _sQYKoory1w: Downloading m3u8 information
[debug] Sort order given by extractor: quality, res, fps, hdr:12, source, vcodec:vp9.2, channels, acodec, lang, proto
[debug] Formats sorted by: hasvid, ie_pref, quality, res, fps, hdr:12(7), source, vcodec:vp9.2(10), channels, acodec, lang, proto, size, br, asr, vext, aext, hasaud, id
 60%|██████    | 6/10 [00:15<00:10,  2.56s/it][youtube] Extracting URL: https://music.youtube.com/watch?v=SMf8D620LRk
[youtube] SMf8D620LRk: Downloading webpage
[youtube] SMf8D620LRk: Downloading ios player API JSON
[youtube] SMf8D620LRk: Downloading mweb player API JSON
[youtube] SMf8D620LRk: Downloading ios music player API JSON
[debug] Loading youtube-nsig.606a66b3 from cache
[debug] [youtube] Decrypted nsig EoOhiUGb1OpRYrj => rwYHW6d8mMdrZQ
[debug] Loading youtube-nsig.606a66b3 from cache
[debug] [youtube] Decrypted nsig c7KzyB2uGPrOtKK => UasaDZOXgLdyoQ
[youtube] SMf8D620LRk: Downloading m3u8 information
[debug] Sort order given by extractor: quality, res, fps, hdr:12, source, vcodec:vp9.2, channels, acodec, lang, proto
[debug] Formats sorted by: hasvid, ie_pref, quality, res, fps, hdr:12(7), source, vcodec:vp9.2(10), channels, acodec, lang, proto, size, br, asr, vext, aext, hasaud, id
 70%|███████   | 7/10 [00:18<00:07,  2.53s/it][youtube] Extracting URL: https://music.youtube.com/watch?v=FTjitn225Mo
[youtube] FTjitn225Mo: Downloading webpage
[youtube] FTjitn225Mo: Downloading ios player API JSON
[youtube] FTjitn225Mo: Downloading mweb player API JSON
[youtube] FTjitn225Mo: Downloading ios music player API JSON
[debug] Loading youtube-nsig.606a66b3 from cache
[debug] [youtube] Decrypted nsig PfE9B28q_-3oqgZ => UkmyTc4qXg_eDQ
[debug] Loading youtube-nsig.606a66b3 from cache
[debug] [youtube] Decrypted nsig 4Tff1oYlmc_EKqE => 1uB-sLERv9LHpQ
[youtube] FTjitn225Mo: Downloading m3u8 information
[debug] Sort order given by extractor: quality, res, fps, hdr:12, source, vcodec:vp9.2, channels, acodec, lang, proto
[debug] Formats sorted by: hasvid, ie_pref, quality, res, fps, hdr:12(7), source, vcodec:vp9.2(10), channels, acodec, lang, proto, size, br, asr, vext, aext, hasaud, id
 80%|████████  | 8/10 [00:21<00:05,  2.88s/it][youtube] Extracting URL: https://music.youtube.com/watch?v=0VEWb2dcicI
[youtube] 0VEWb2dcicI: Downloading webpage
[youtube] 0VEWb2dcicI: Downloading ios player API JSON
[youtube] 0VEWb2dcicI: Downloading mweb player API JSON
[youtube] 0VEWb2dcicI: Downloading ios music player API JSON
[debug] Loading youtube-nsig.606a66b3 from cache
[debug] [youtube] Decrypted nsig KG8EuH1s3BhtQSp => xDel6aReTqiwEg
[debug] Loading youtube-nsig.606a66b3 from cache
[debug] [youtube] Decrypted nsig StXmIugZUzicux- => CZ5MqLuLsVh7Aw
[youtube] 0VEWb2dcicI: Downloading m3u8 information
[debug] Sort order given by extractor: quality, res, fps, hdr:12, source, vcodec:vp9.2, channels, acodec, lang, proto
[debug] Formats sorted by: hasvid, ie_pref, quality, res, fps, hdr:12(7), source, vcodec:vp9.2(10), channels, acodec, lang, proto, size, br, asr, vext, aext, hasaud, id
 90%|█████████ | 9/10 [00:24<00:02,  2.73s/it][youtube] Extracting URL: https://music.youtube.com/watch?v=iBgvIFBK1t8
[youtube] iBgvIFBK1t8: Downloading webpage
[youtube] iBgvIFBK1t8: Downloading ios player API JSON
[youtube] iBgvIFBK1t8: Downloading mweb player API JSON
[youtube] iBgvIFBK1t8: Downloading ios music player API JSON
[debug] Loading youtube-nsig.606a66b3 from cache
[debug] [youtube] Decrypted nsig 7OFUbtr-a9PhOOl => 6TntZDdaoDgwkA
[debug] Loading youtube-nsig.606a66b3 from cache
[debug] [youtube] Decrypted nsig OroBtdjp7OGsimN => oAKK0e7-Oc6tTA
[youtube] iBgvIFBK1t8: Downloading m3u8 information
[debug] Sort order given by extractor: quality, res, fps, hdr:12, source, vcodec:vp9.2, channels, acodec, lang, proto
[debug] Formats sorted by: hasvid, ie_pref, quality, res, fps, hdr:12(7), source, vcodec:vp9.2(10), channels, acodec, lang, proto, size, br, asr, vext, aext, hasaud, id
100%|██████████| 10/10 [00:26<00:00,  2.64s/it]Data has been successfully extracted and saved to playlist_info.csv

bashonly commented 16 hours ago

Not currently possible. See #622

DWSuryo commented 10 hours ago

Okay. I mentioned the same issue number above too. Perhaps the workaround is to scrape the page directly with a web-scraping tool? Of course, that would be outside yt-dlp, so it may be off-topic here.