[kuwo] rewrite the outdated kuwo extractor

grqz commented 1 month ago

DO NOT REMOVE OR SKIP THE ISSUE TEMPLATE

[X] I understand that I will be blocked if I intentionally remove or skip any mandatory* field

Checklist

[X] I'm reporting that yt-dlp is broken on a supported site
[X] I've verified that I have updated yt-dlp to nightly or master (update instructions)
[X] I've checked that all provided URLs are playable in a browser with the same IP and same login details
[X] I've checked that all URLs and arguments with special characters are properly quoted or escaped
[X] I've searched known issues and the bugtracker for similar issues including closed ones. DO NOT post duplicates
[X] I've read the guidelines for opening an issue
[ ] I've read about sharing account credentials and I'm willing to share it if required

Region

China

Provide a description that is worded well enough to be understood

[!NOTE] [geo-blocked] Nothing on kuwo can be accessed from outside China; requests return HTTP 500.

Example URL

http://kuwo.cn/play_detail/28115171

Description

The current extractor code is 8 years old and needs an update.

I'm not in China, but I have a VPN for debugging. I have no kuwo account, so please ask someone else if one is needed.

The verbose output below, from the latest nightly, shows that:

the _VALID_URL needs an update (the URL is currently handled by the generic extractor)
the site is geo-blocked outside China (see the response code)
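For the _VALID_URL point above, here is a minimal sketch of matching the new path. The pattern is the one proposed in the patch later in this thread; `match_song_id` is a hypothetical helper used only for illustration, not part of yt-dlp.

```python
import re

# Pattern from the proposed patch; the old extractor matched /yinyue/<id> instead.
VALID_URL = r'https?://(?:www\.)?kuwo\.cn/play_detail/(?P<id>\d+)'


def match_song_id(url):
    """Return the numeric song ID, or None if the URL is not a play_detail page."""
    mobj = re.match(VALID_URL, url)
    return mobj.group('id') if mobj else None


print(match_song_id('http://kuwo.cn/play_detail/28115171'))
print(match_song_id('http://www.kuwo.cn/yinyue/635632/'))
```

The second call uses the old-style URL from the existing test case, which this pattern intentionally does not match.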

Related info

Related PR: #7470

EDIT: there may be an easier way to extract than switching APIs. The current code fails at metadata extraction, but http://antiserver.kuwo.cn/anti.s is still usable with some parameter changes.

Since the URL path has changed, _VALID_URL should be updated.

playUrl API URL: http://www.kuwo.cn/api/v1/www/music/playUrl, query: {'mid': song_id}

I'm not sure how to request a specific format; the API only returns 128k.

Required headers in the API request:

Hm_Iuvt_cdb524f42f23cer9b268564v7y735ewrq2324 in Cookies
Secret: this header seems to be derived from the Hm_Iuvt_cdb524f42f23cer9b268564v7y735ewrq2324 value, but it has nothing to do with the query parameter (mid). I haven't tested yet whether they expire.
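As a rough illustration of the request described above, the sketch below only assembles the URL and headers and sends nothing. `build_play_url_request` is a hypothetical helper, and the `secret` argument is assumed to have been computed separately from the cookie value.

```python
from urllib.parse import urlencode

PLAY_URL_API = 'http://www.kuwo.cn/api/v1/www/music/playUrl'
COOKIE_KEY = 'Hm_Iuvt_cdb524f42f23cer9b268564v7y735ewrq2324'


def build_play_url_request(song_id, cookie_value, secret):
    """Return (url, headers) for the playUrl API; no network request is made."""
    query = urlencode({'mid': song_id})
    url = f'{PLAY_URL_API}?{query}'
    headers = {
        # The cookie and the Secret header must belong together.
        'Cookie': f'{COOKIE_KEY}={cookie_value}',
        'Secret': secret,
    }
    return url, headers


url, headers = build_play_url_request('28115171', 'cookie-value', 'secret-value')
print(url)
print(headers)
```

Whether the server needs other headers or query parameters as well is not covered here.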

EDIT: the Secret header can be computed with the function below, called as f(<unescaped Hm_Iuvt_cdb524f42f23cer9b268564v7y735ewrq2324 value>, "Hm_Iuvt_cdb524f42f23cer9b268564v7y735ewrq2324")

function f(t, e) {
  if (null == e || e.length <= 0) return null;
  for (var n = "", i = 0; i < e.length; i++) n += e.charCodeAt(i).toString();
  var o = Math.floor(n.length / 5),
    r = parseInt(
      n.charAt(o) +
        n.charAt(2 * o) +
        n.charAt(3 * o) +
        n.charAt(4 * o) +
        n.charAt(5 * o)
    ),
    c = Math.ceil(e.length / 2),
    l = Math.pow(2, 31) - 1;
  if (r < 2) return null;
  var d = Math.round(1e9 * Math.random()) % 1e8;
  for (n += d; n.length > 10; )
    n = (
      parseInt(n.substring(0, 10)) + parseInt(n.substring(10, n.length))
    ).toString();
  n = (r * n + c) % l;
  var f = "",
    h = "";
  for (i = 0; i < t.length; i++)
    (h +=
      (f = parseInt(t.charCodeAt(i) ^ Math.floor((n / l) * 255))) < 16
        ? "0" + f.toString(16)
        : f.toString(16)),
      (n = (r * n + c) % l);
  for (d = d.toString(16); d.length < 8; ) d = "0" + d;
  return (h += d);
}

Python port of the algorithm:

import random
import math

def f(t_cookie_val, e_cookie_key):
    def charAt(lst, idx):
        # JS charAt() returns '' for an out-of-range index
        try:
            return lst[idx]
        except IndexError:
            return ''

    def parseInt(value: str):
        if isinstance(value, int):
            return value
        num_str = ''
        for char in value:
            if char.isdigit():
                num_str += char
            else:
                break
        return int(num_str) if num_str else None

    def toString(value):
        if len(str(value)) > 20:
            return f'{value:.16e}'
        return str(value)

    if e_cookie_key is None or len(e_cookie_key) <= 0:
        return None
    n = ''
    for i in e_cookie_key:
        n += str(ord(i))

    o = len(n) // 5
    r = int(''.join(charAt(n, i * o) for i in range(1, 6)))
    c = math.ceil(len(e_cookie_key) / 2)  # same as Math.ceil(e.length / 2) in the JS
    l = pow(2, 31) - 1

    if r < 2:
        return None
    d = random.randint(0, int(1e8 - 1))

    n += str(d)
    while len(n) > 10:
        n = toString(parseInt(n[:10]) + parseInt(n[10:]))
    n = (r * int(n) + c) % l

    f = ''
    h = ''
    i = 0
    while i < len(t_cookie_val):
        f = parseInt(ord(t_cookie_val[i]) ^ math.floor((n / l) * 255))
        if f < 16:
            h += '0'
        h += hex(f)[2:]
        n = (r * n + c) % l
        i += 1
    d = hex(d)[2:]
    while len(d) < 8:
        d = '0' + d
    h += d
    return h

print(f('AnMG6YKJNGDkSmm2TY7xJMmw67XFiCmN', 'Hm_Iuvt_cdb524f42f23cer9b268564v7y735ewrq2324'))

Provide verbose output that clearly demonstrates the problem

[X] Run your yt-dlp command with -vU flag added (yt-dlp -vU <your command line>)
[ ] If using API, add 'verbose': True to YoutubeDL params instead
[X] Copy the WHOLE output (starting with [debug] Command-line config) and insert it below

Complete Verbose Output

[debug] Command-line config: ['-vU', 'http://kuwo.cn/play_detail/28115171']
[debug] Encodings: locale cp936, fs utf-8, pref cp936, out utf-8, error utf-8, screen utf-8
[debug] yt-dlp version nightly@2024.08.06.232802 from yt-dlp/yt-dlp-nightly-builds [a06508664] (win_exe)
[debug] Python 3.8.10 (CPython AMD64 64bit) - Windows-10-10.0.19045-SP0 (OpenSSL 1.1.1k  25 Mar 2021)
[debug] exe versions: ffmpeg N-115821-g0060a368b1-20240613 (setts), ffprobe N-115821-g0060a368b1-20240613
[debug] Optional libraries: Cryptodome-3.20.0, brotli-1.1.0, certifi-2024.07.04, curl_cffi-0.5.10, mutagen-1.47.0, requests-2.32.3, sqlite3-3.35.5, urllib3-2.2.2, websockets-12.0
[debug] Proxy map: {}
[debug] Request Handlers: urllib, requests, websockets, curl_cffi
[debug] Loaded 1830 extractors
[debug] Fetching release info: https://api.github.com/repos/yt-dlp/yt-dlp-nightly-builds/releases/latest
Latest version: nightly@2024.08.06.232802 from yt-dlp/yt-dlp-nightly-builds
yt-dlp is up to date (nightly@2024.08.06.232802 from yt-dlp/yt-dlp-nightly-builds)
[generic] Extracting URL: http://kuwo.cn/play_detail/28115171
[generic] 28115171: Downloading webpage
ERROR: [generic] Unable to download webpage: HTTP Error 500: OK (caused by <HTTPError 500: OK>)
  File "yt_dlp\extractor\common.py", line 740, in extract
  File "yt_dlp\extractor\generic.py", line 2384, in _real_extract
  File "yt_dlp\extractor\common.py", line 909, in _request_webpage

  File "yt_dlp\extractor\common.py", line 896, in _request_webpage
  File "yt_dlp\YoutubeDL.py", line 4165, in urlopen
  File "yt_dlp\networking\common.py", line 117, in send
  File "yt_dlp\networking\_helper.py", line 208, in wrapper
  File "yt_dlp\networking\common.py", line 340, in send
  File "yt_dlp\networking\_requests.py", line 365, in _send
yt_dlp.networking.exceptions.HTTPError: HTTP Error 500: OK

grqz commented 1 month ago

Unfortunately, the http://antiserver.kuwo.cn/anti.s API can no longer extract the lossless format. By capturing packets, I found an API that takes no parameters related to the music's format; it just returns the 128kbps mp3 and requires a signature. The site says that the lossless format is only playable in their client, and I don't plan to capture packets from the client.

grqz commented 1 month ago

After some testing, it turns out the geo-block can be bypassed. Even though the webpage returns HTTP/1.1 500 OK, the response still carries the Set-Cookie header, which is enough. None of the APIs are geo-blocked; we just can't extract metadata from the webpage. EDIT: metadata extraction from the webpage may be removed, in which case calculating the Secret header is unavoidable, since the metadata API needs it.
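A minimal sketch of pulling the cookie value out of a raw Set-Cookie header with the standard library. `cookie_value_from_set_cookie` is a hypothetical helper, and the attributes in the sample header (Path, Max-Age) are made up for illustration; with urllib, the header of a 500 response would presumably be read from the raised HTTPError's `headers`.

```python
from http.cookies import SimpleCookie

COOKIE_KEY = 'Hm_Iuvt_cdb524f42f23cer9b268564v7y735ewrq2324'


def cookie_value_from_set_cookie(set_cookie_header):
    """Extract the kuwo cookie value from a raw Set-Cookie header string."""
    jar = SimpleCookie()
    jar.load(set_cookie_header)
    morsel = jar.get(COOKIE_KEY)
    return morsel.value if morsel is not None else None


# Sample header; only the cookie name and value come from this thread.
sample = f'{COOKIE_KEY}=AnMG6YKJNGDkSmm2TY7xJMmw67XFiCmN; Path=/; Max-Age=604800'
print(cookie_value_from_set_cookie(sample))
```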

grqz commented 1 month ago

After some work:

Logs (in geo-blocked region)

```log
[debug] Command-line config: ['--exec', 'sha1sum', '--force-overwrite', '-v', '--write-info-json', '--write-subs', '--sub-langs', 'zh.*', '--write-comments', '--convert-subs', 'vtt', '-F', '--no-simulate', 'http://www.kuwo.cn/play_detail/282270399']
[debug] Encodings: locale UTF-8, fs utf-8, pref UTF-8, out utf-8, error utf-8, screen utf-8
[debug] yt-dlp version stable@2024.07.16 from yt-dlp/yt-dlp [89a161e8c] (source)
[debug] Lazy loading extractors is disabled
[debug] Git HEAD: a3bab4752
[debug] Python 3.10.12 (CPython x86_64 64bit) - Linux-5.15.153.1-microsoft-standard-WSL2-x86_64-with-glibc2.35 (OpenSSL 3.0.2 15 Mar 2022, glibc 2.35)
[debug] exe versions: ffmpeg 4.4.2 (setts), ffprobe 4.4.2
[debug] Optional libraries: certifi-2024.06.02, curl_cffi-0.7.1, requests-2.32.3, secretstorage-3.3.1, sqlite3-3.37.2, urllib3-2.2.2
[debug] Proxy map: {}
[debug] Request Handlers: urllib, requests, curl_cffi
[debug] Loaded 1829 extractors
[kuwo:song] Extracting URL: http://www.kuwo.cn/play_detail/282270399
[kuwo:song] 282270399: Downloading webpage
WARNING: [kuwo:song] Unable to download webpage: HTTP Error 500: OK
[kuwo:song] 282270399: Downloading lyrics
[kuwo:song] 282270399: Downloading JSON metadata
[kuwo:song] 282270399: Getting new_mp3 play url
[kuwo:song] 282270399: Getting mp3 play url
[kuwo:song] 282270399: Getting wma play url
[kuwo:song] 282270399: Getting aac play url
[info] 282270399: Downloading subtitles: zh-CN
[debug] Formats sorted by: hasvid, ie_pref, lang, quality, res, fps, hdr:12(7), vcodec:vp9.2(10), channels, acodec, size, br, asr, proto, vext, aext, hasaud, source, id
[kuwo:song] 282270399: Downloading recommended comments
[kuwo:song] 282270399: Downloading comments from page 1/Unknown
[kuwo:song] 282270399: Downloading comments from page 2/191
[kuwo:song] 282270399: Downloading comments from page 3/191
[kuwo:song] 282270399: Downloading comments from page 4/191
[kuwo:song] 282270399: Downloading comments from page 5/191
[kuwo:song] 282270399: Downloading comments from page 6/191
[kuwo:song] 282270399: Downloading comments from page 7/191
[kuwo:song] 282270399: Downloading comments from page 8/191
[kuwo:song] 282270399: Downloading comments from page 9/191
[kuwo:song] 282270399: Downloading comments from page 10/191
[kuwo:song] 282270399: Downloading comments from page 11/191
[kuwo:song] 282270399: Downloading comments from page 12/191
[kuwo:song] 282270399: Downloading comments from page 13/191
[kuwo:song] 282270399: Downloading comments from page 14/191
[kuwo:song] 282270399: Downloading comments from page 15/191
[kuwo:song] 282270399: Downloading comments from page 16/191
[kuwo:song] 282270399: Downloading comments from page 17/191
[kuwo:song] 282270399: Downloading comments from page 18/191
[kuwo:song] 282270399: Downloading comments from page 19/191
[kuwo:song] 282270399: Downloading comments from page 20/191
[kuwo:song] 282270399: Downloading comments from page 21/191
[kuwo:song] 282270399: Downloading comments from page 22/191
[kuwo:song] 282270399: Downloading comments from page 23/191
[kuwo:song] 282270399: Downloading comments from page 24/191
[kuwo:song] 282270399: Downloading comments from page 25/191
[kuwo:song] 282270399: Downloading comments from page 26/191
[kuwo:song] 282270399: Downloading comments from page 27/191
[kuwo:song] 282270399: Downloading comments from page 28/191
[kuwo:song] 282270399: Downloading comments from page 29/191
[kuwo:song] 282270399: Downloading comments from page 30/191
[kuwo:song] 282270399: Downloading comments from page 31/191
[kuwo:song] 282270399: Downloading comments from page 32/191
[kuwo:song] 282270399: Downloading comments from page 33/191
[kuwo:song] 282270399: Downloading comments from page 34/191
[kuwo:song] 282270399: Downloading comments from page 35/191
[kuwo:song] 282270399: Downloading comments from page 36/191
[kuwo:song] 282270399: Downloading comments from page 37/191
[kuwo:song] 282270399: Downloading comments from page 38/191
[kuwo:song] 282270399: Downloading comments from page 39/191
[kuwo:song] 282270399: Downloading comments from page 40/191
[kuwo:song] 282270399: Downloading comments from page 41/191
[kuwo:song] 282270399: Downloading comments from page 42/191
[kuwo:song] 282270399: Downloading comments from page 43/191
[kuwo:song] 282270399: Downloading comments from page 44/191
[kuwo:song] 282270399: Downloading comments from page 45/191
[kuwo:song] 282270399: Downloading comments from page 46/191
[kuwo:song] 282270399: Downloading comments from page 47/191
[kuwo:song] 282270399: Downloading comments from page 48/191
[kuwo:song] 282270399: Downloading comments from page 49/191
[kuwo:song] 282270399: Downloading comments from page 50/191
[kuwo:song] 282270399: Downloading comments from page 51/191
[kuwo:song] 282270399: Downloading comments from page 52/191
[kuwo:song] 282270399: Downloading comments from page 53/191
[kuwo:song] 282270399: Downloading comments from page 54/191
[kuwo:song] 282270399: Downloading comments from page 55/191
[kuwo:song] 282270399: Downloading comments from page 56/191
[kuwo:song] 282270399: Downloading comments from page 57/191
[kuwo:song] 282270399: Downloading comments from page 58/191
[kuwo:song] 282270399: Downloading comments from page 59/191
[kuwo:song] 282270399: Downloading comments from page 60/191
[kuwo:song] 282270399: Downloading comments from page 61/191
[kuwo:song] 282270399: Downloading comments from page 62/191
[kuwo:song] 282270399: Downloading comments from page 63/191
[kuwo:song] 282270399: Downloading comments from page 64/191
[kuwo:song] 282270399: Downloading comments from page 65/191
[kuwo:song] 282270399: Downloading comments from page 66/191
[kuwo:song] 282270399: Downloading comments from page 67/191
[kuwo:song] 282270399: Downloading comments from page 68/191
[kuwo:song] 282270399: Downloading comments from page 69/191
[kuwo:song] 282270399: Downloading comments from page 70/191
[kuwo:song] 282270399: Downloading comments from page 71/191
[kuwo:song] 282270399: Downloading comments from page 72/191
[kuwo:song] 282270399: Downloading comments from page 73/191
[kuwo:song] 282270399: Downloading comments from page 74/191
[kuwo:song] 282270399: Downloading comments from page 75/191
[kuwo:song] 282270399: Downloading comments from page 76/191
[kuwo:song] 282270399: Downloading comments from page 77/191
[kuwo:song] 282270399: Downloading comments from page 78/191
[kuwo:song] 282270399: Downloading comments from page 79/191
[kuwo:song] 282270399: Downloading comments from page 80/191
[kuwo:song] 282270399: Downloading comments from page 81/191
[kuwo:song] 282270399: Downloading comments from page 82/191
[kuwo:song] 282270399: Downloading comments from page 83/191
[kuwo:song] 282270399: Downloading comments from page 84/191
[kuwo:song] 282270399: Downloading comments from page 85/191
[kuwo:song] 282270399: Downloading comments from page 86/191
[kuwo:song] 282270399: Downloading comments from page 87/191
[kuwo:song] 282270399: Downloading comments from page 88/191
[kuwo:song] 282270399: Downloading comments from page 89/191
[kuwo:song] 282270399: Downloading comments from page 90/191
[kuwo:song] 282270399: Downloading comments from page 91/191
[kuwo:song] 282270399: Downloading comments from page 92/191
[kuwo:song] 282270399: Downloading comments from page 93/191
[kuwo:song] 282270399: Downloading comments from page 94/191
[kuwo:song] 282270399: Downloading comments from page 95/191
[kuwo:song] 282270399: Downloading comments from page 96/191
[kuwo:song] 282270399: Downloading comments from page 97/191
[kuwo:song] 282270399: Downloading comments from page 98/191
[kuwo:song] 282270399: Downloading comments from page 99/191
[kuwo:song] 282270399: Downloading comments from page 100/191
[kuwo:song] 282270399: Downloading comments from page 101/191
[kuwo:song] 282270399: Downloading comments from page 102/191
[kuwo:song] 282270399: Downloading comments from page 103/191
[kuwo:song] 282270399: Downloading comments from page 104/191
[kuwo:song] 282270399: Downloading comments from page 105/191
[kuwo:song] 282270399: Downloading comments from page 106/191
[kuwo:song] 282270399: Downloading comments from page 107/191
[kuwo:song] 282270399: Downloading comments from page 108/191
[kuwo:song] 282270399: Downloading comments from page 109/191
[kuwo:song] 282270399: Downloading comments from page 110/191
[kuwo:song] 282270399: Downloading comments from page 111/191
[kuwo:song] 282270399: Downloading comments from page 112/191
[kuwo:song] 282270399: Downloading comments from page 113/191
[kuwo:song] 282270399: Downloading comments from page 114/191
[kuwo:song] 282270399: Downloading comments from page 115/191
[kuwo:song] 282270399: Downloading comments from page 116/191
[kuwo:song] 282270399: Downloading comments from page 117/191
[kuwo:song] 282270399: Downloading comments from page 118/191
[kuwo:song] 282270399: Downloading comments from page 119/191
[kuwo:song] 282270399: Downloading comments from page 120/191
[kuwo:song] 282270399: Downloading comments from page 121/191
[kuwo:song] 282270399: Downloading comments from page 122/191
[kuwo:song] 282270399: Downloading comments from page 123/191
[kuwo:song] 282270399: Downloading comments from page 124/191
[kuwo:song] 282270399: Downloading comments from page 125/191
[kuwo:song] 282270399: Downloading comments from page 126/191
[kuwo:song] 282270399: Downloading comments from page 127/191
[kuwo:song] 282270399: Downloading comments from page 128/191
[kuwo:song] 282270399: Downloading comments from page 129/191
[kuwo:song] 282270399: Downloading comments from page 130/191
[kuwo:song] 282270399: Downloading comments from page 131/191
[kuwo:song] 282270399: Downloading comments from page 132/191
[kuwo:song] 282270399: Downloading comments from page 133/191
[kuwo:song] 282270399: Downloading comments from page 134/191
[kuwo:song] 282270399: Downloading comments from page 135/191
[kuwo:song] 282270399: Downloading comments from page 136/191
[kuwo:song] 282270399: Downloading comments from page 137/191
[kuwo:song] 282270399: Downloading comments from page 138/191
[kuwo:song] 282270399: Downloading comments from page 139/191
[kuwo:song] 282270399: Downloading comments from page 140/191
[kuwo:song] 282270399: Downloading comments from page 141/191
[kuwo:song] 282270399: Downloading comments from page 142/191
[kuwo:song] 282270399: Downloading comments from page 143/191
[kuwo:song] 282270399: Downloading comments from page 144/191
[kuwo:song] 282270399: Downloading comments from page 145/191
[kuwo:song] 282270399: Downloading comments from page 146/191
[kuwo:song] 282270399: Downloading comments from page 147/191
[kuwo:song] 282270399: Downloading comments from page 148/191
[kuwo:song] 282270399: Downloading comments from page 149/191
[kuwo:song] 282270399: Downloading comments from page 150/191
[kuwo:song] 282270399: Downloading comments from page 151/191
[kuwo:song] 282270399: Downloading comments from page 152/191
[kuwo:song] 282270399: Downloading comments from page 153/191
[kuwo:song] 282270399: Downloading comments from page 154/191
[kuwo:song] 282270399: Downloading comments from page 155/191
[kuwo:song] 282270399: Downloading comments from page 156/191
[kuwo:song] 282270399: Downloading comments from page 157/191
[kuwo:song] 282270399: Downloading comments from page 158/191
[kuwo:song] 282270399: Downloading comments from page 159/191
[kuwo:song] 282270399: Downloading comments from page 160/191
[kuwo:song] 282270399: Downloading comments from page 161/191
[kuwo:song] 282270399: Downloading comments from page 162/191
[kuwo:song] 282270399: Downloading comments from page 163/191
[kuwo:song] 282270399: Downloading comments from page 164/191
[kuwo:song] 282270399: Downloading comments from page 165/191
[kuwo:song] 282270399: Downloading comments from page 166/191
[kuwo:song] 282270399: Downloading comments from page 167/191
[kuwo:song] 282270399: Downloading comments from page 168/191
[kuwo:song] 282270399: Downloading comments from page 169/191
[kuwo:song] 282270399: Downloading comments from page 170/191
[kuwo:song] 282270399: Downloading comments from page 171/191
[kuwo:song] 282270399: Downloading comments from page 172/191
[kuwo:song] 282270399: Downloading comments from page 173/191
[kuwo:song] 282270399: Downloading comments from page 174/191
[kuwo:song] 282270399: Downloading comments from page 175/191
[kuwo:song] 282270399: Downloading comments from page 176/191
[kuwo:song] 282270399: Downloading comments from page 177/191
[kuwo:song] 282270399: Downloading comments from page 178/191
[kuwo:song] 282270399: Downloading comments from page 179/191
[kuwo:song] 282270399: Downloading comments from page 180/191
[kuwo:song] 282270399: Downloading comments from page 181/191
[kuwo:song] 282270399: Downloading comments from page 182/191
[kuwo:song] 282270399: Downloading comments from page 183/191
[kuwo:song] 282270399: Downloading comments from page 184/191
[kuwo:song] 282270399: Downloading comments from page 185/191
[kuwo:song] 282270399: Downloading comments from page 186/191
[kuwo:song] 282270399: Downloading comments from page 187/191
[kuwo:song] 282270399: Downloading comments from page 188/191
[kuwo:song] 282270399: Downloading comments from page 189/191
[kuwo:song] 282270399: Downloading comments from page 190/191
[kuwo:song] 282270399: Downloading comments from page 191/191
[kuwo:song] Extracted 5700 comments
[info] Available formats for 282270399:
ID      EXT RESOLUTION │   FILESIZE  TBR PROTO │ VCODEC     ACODEC  ABR MORE INFO
─────────────────────────────────────────────────────────────────────────────────────────────────────────
aac     aac audio only │ ~972.66KiB  48k http  │ audio only aac     48k
wma     wma audio only │ ~  1.90MiB  96k http  │ audio only wma     96k
mp3     mp3 audio only │ ~  2.53MiB 128k http  │ audio only mp3    128k
new_mp3 mp3 audio only │ ~  2.53MiB 128k https │ audio only mp3    128k mp3 extracted from the latest API
[debug] Default format spec: bestvideo*+bestaudio/best
[info] 282270399: Downloading 1 format(s): new_mp3
[info] Writing video subtitles to: 一笑江湖(DJ弹鼓版) [282270399].zh-CN.lrc
[info] Writing video metadata as JSON to: 一笑江湖(DJ弹鼓版) [282270399].info.json
[SubtitlesConvertor] Converting subtitles
[debug] ffmpeg command line: ffmpeg -y -loglevel repeat+info -i 'file:一笑江湖(DJ弹鼓版) [282270399].zh-CN.lrc' -f webvtt -movflags +faststart 'file:一笑江湖(DJ弹鼓版) [282270399].zh-CN.vtt'
Deleting original file 一笑江湖(DJ弹鼓版) [282270399].zh-CN.lrc (pass -k to keep)
Deleting existing file 一笑江湖(DJ弹鼓版) [282270399].mp3
[debug] Invoking http downloader on "https://ll-sycdn.kuwo.cn/50cefe1c20be88337330ea4951b6a1b1/66b8bfbb/resource/n1/21/79/3447204738.mp3"
[download] Destination: 一笑江湖(DJ弹鼓版) [282270399].mp3
[download] 100% of 2.53MiB in 00:00:03 at 710.27KiB/s
[Exec] Executing command: sha1sum '/home/user/yt-dlp_dev/yt-dlp/一笑江湖(DJ弹鼓版) [282270399].mp3'
fa782498ebef2a55a5ca9a1ec44dd98b8cf4d1d2 /home/user/yt-dlp_dev/yt-dlp/一笑江湖(DJ弹鼓版) [282270399].mp3
```

Explanation of the changes:

added support for mp3 from the new API
removed the ape format, as it can no longer be extracted
fixed lyrics extraction
added a comments downloader
added more metadata fields
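For the lyrics fix listed above, here is a minimal sketch of turning the lyric list into LRC text. It follows the timestamp formatting used in the patch; `lrclist_to_lrc` is a hypothetical standalone helper, and the `time` / `lineLyric` field names are taken from the patch rather than from any API documentation.

```python
def lrclist_to_lrc(lrclist):
    """Render [{'time': seconds, 'lineLyric': text}, ...] as LRC text."""
    lines = []
    for entry in lrclist:
        seconds = float(entry['time'])
        minutes = int(seconds // 60)
        # LRC timestamps look like [mm:ss.xx]
        lines.append(f'[{minutes:02}:{seconds % 60:05.2f}]{entry["lineLyric"]}\n')
    return ''.join(lines)


print(lrclist_to_lrc([
    {'time': '0.0', 'lineLyric': 'first line'},
    {'time': '65.5', 'lineLyric': 'second line'},
]))
```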

diff patch: 1 file changed, 252 insertions(+), 63 deletions(-)

```diff
diff --git a/yt_dlp/extractor/kuwo.py b/yt_dlp/extractor/kuwo.py
index 80b6b55f1..785fe2204 100644
--- a/yt_dlp/extractor/kuwo.py
+++ b/yt_dlp/extractor/kuwo.py
@@ -1,3 +1,6 @@
+import itertools
+import math
+import random
 import re
 import urllib.parse

@@ -8,56 +11,221 @@
     clean_html,
     get_element_by_id,
     remove_start,
+    traverse_obj,
+    url_or_none,
 )


 class KuwoBaseIE(InfoExtractor):
     _FORMATS = [
-        {'format': 'ape', 'ext': 'ape', 'preference': 100},
-        {'format': 'mp3-320', 'ext': 'mp3', 'br': '320kmp3', 'abr': 320, 'preference': 80},
-        {'format': 'mp3-192', 'ext': 'mp3', 'br': '192kmp3', 'abr': 192, 'preference': 70},
-        {'format': 'mp3-128', 'ext': 'mp3', 'br': '128kmp3', 'abr': 128, 'preference': 60},
-        {'format': 'wma', 'ext': 'wma', 'preference': 20},
-        {'format': 'aac', 'ext': 'aac', 'abr': 48, 'preference': 10},
+        # {'format': 'ape_old', 'acodec': 'ape', 'quality': 100}, # broken
+        {'format': 'new_mp3', 'acodec': 'mp3', 'api':True, 'abr': 128, 'quality': 40},
+        {'format': 'mp3', 'acodec': 'mp3', 'abr': 128, 'quality': 30},
+        {'format': 'wma', 'acodec': 'wma', 'abr': 96, 'quality': 20},
+        {'format': 'aac', 'acodec': 'aac', 'abr': 48, 'quality': 10},
     ]

     def _get_formats(self, song_id, tolerate_ip_deny=False):
         formats = []
+        initial_cookie_val = self._get_cookies('http://www.kuwo.cn/play_detail/').get('Hm_Iuvt_cdb524f42f23cer9b268564v7y735ewrq2324').value
+        header_secret = self._calc_secret(initial_cookie_val, 'Hm_Iuvt_cdb524f42f23cer9b268564v7y735ewrq2324')
         for file_format in self._FORMATS:
-            query = {
-                'format': file_format['ext'],
-                'br': file_format.get('br', ''),
-                'rid': f'MUSIC_{song_id}',
-                'type': 'convert_url',
-                'response': 'url',
-            }
-
-            song_url = self._download_webpage(
-                'http://antiserver.kuwo.cn/anti.s',
-                song_id, note='Download {} url info'.format(file_format['format']),
-                query=query, headers=self.geo_verification_headers(),
-            )
-
-            if song_url == 'IPDeny' and not tolerate_ip_deny:
-                raise ExtractorError('This song is blocked in this region', expected=True)
-
-            if song_url.startswith(('http://', 'https://')):
+            if file_format.get('api'):
+                song_dict = self._download_json(
+                    'http://www.kuwo.cn/api/v1/www/music/playUrl',
+                    query={
+                        'mid': song_id,
+                        'type': 'music',
+                    }, headers={
+                        **self.geo_verification_headers(),
+                        'Secret': header_secret,
+                    },note=f'Getting {file_format["format"]} play url', video_id=song_id)
+
+                song_url = url_or_none(traverse_obj(song_dict, ('data', 'url')))
+            else:
+                query = {
+                    'format': file_format['acodec'],
+                    'rid': song_id,
+                    'type': 'convert_url',
+                    'response': 'url',
+                }
+
+                song_url = url_or_none(self._download_webpage(
+                    'http://antiserver.kuwo.cn/anti.s',
+                    song_id, note=f'Getting {file_format["format"]} play url',
+                    query=query, headers=self.geo_verification_headers(),
+                ))
+
+                if song_url == 'IPDeny' and not tolerate_ip_deny:
+                    raise ExtractorError('This song is blocked in this region', expected=True)
+
+            self._set_cookie('.kuwo.cn', 'Hm_Iuvt_cdb524f42f23cer9b268564v7y735ewrq2324', initial_cookie_val)
+
+            if song_url:
                 formats.append({
                     'url': song_url,
-                    'format_id': file_format['format'],
                     'format': file_format['format'],
-                    'quality': file_format['preference'],
+                    'format_id': file_format['format'],
+                    'format_note': 'mp3 extracted from the latest API' if file_format.get('api') else None,
+                    'quality': file_format['quality'],
                     'abr': file_format.get('abr'),
+                    'acodec': file_format.get('acodec'),
+                    'vcodec': 'none',
                 })

         return formats

+    def _get_comments(self, song_id):
+        """This API is very slow! It may take several seconds to respond one request!"""
+        hot_comments = self._download_json(
+            'https://comment.kuwo.cn/com.s', song_id,
+            'Downloading recommended comments',
+            'Failed to download recommended comments',
+            fatal=False, query={
+                'type': 'get_rec_comment',
+                'f': 'web',
+                'page': 1,
+                'rows': 5,
+                'digest': 15,
+                'sid': song_id,
+                'uid': 0,
+            }).get('rows')
+        if not hot_comments:
+            return
+        for hot_comment in hot_comments:
+            yield {
+                'author': hot_comment.get('u_name'),
+                'author_id': hot_comment.get('u_id'),
+                'author_thumbnail': hot_comment.get('u_pic'),
+                'text': hot_comment.get('msg'),
+                'is_pinned': True,
+            }
+
+        total_pages = 'Unknown'
+        for page_num in itertools.count(1):
+            comments = self._download_json(
+                'https://comment.kuwo.cn/com.s', song_id,
+                f'Downloading comments from page {page_num}/{total_pages}',
+                f'Failed to download comments from page {page_num}/{total_pages}',
+                fatal=False, query={
+                    'type': 'get_comment',
+                    'f': 'web',
+                    'page': page_num,
+                    'rows': 30,
+                    'digest': 15,
+                    'sid': song_id,
+                    'uid': 0,
+                })
+            if not comments:
+                return
+            if page_num == 1:
+                total_pages = comments.get('totalPage')
+            for comments_row in comments.get('rows'):
+                yield {
+                    'author': comments_row.get('u_name'),
+                    'author_id': comments_row.get('u_id'),
+                    'author_thumbnail': comments_row.get('u_pic'),
+                    'text': comments_row.get('msg'),
+                }
+            if page_num + 1 > total_pages:
+                return
+
+    def _get_metadata(self, song_id):
+        metadata = {}
+        initial_cookie_val = self._get_cookies('http://www.kuwo.cn/play_detail/').get('Hm_Iuvt_cdb524f42f23cer9b268564v7y735ewrq2324').value
+        header_secret = self._calc_secret(initial_cookie_val, 'Hm_Iuvt_cdb524f42f23cer9b268564v7y735ewrq2324')
+        metadata = self._download_json(
+            'http://www.kuwo.cn/api/www/music/musicInfo', song_id,
+            query={'mid': song_id}, headers={
+                **self.geo_verification_headers(),
+                'Secret': header_secret,
+            }, fatal=False)
+        self._set_cookie('.kuwo.cn', 'Hm_Iuvt_cdb524f42f23cer9b268564v7y735ewrq2324', initial_cookie_val)
+        return metadata
+
+    def _get_subtitles(self, song_id):
+        lyrics_list = traverse_obj(self._download_json(
+            'http://www.kuwo.cn/openapi/v1/www/lyric/getlyric', song_id,
+            'Downloading lyrics', 'Failed to download lyrics', fatal=False,
+            query={'musicId': song_id}), ('data', 'lrclist'))
+        zh_subtitles = []
+        if lyrics_list:
+            lrc = ''
+            for line_lyric in lyrics_list:
+                seconds = float(line_lyric['time'])
+                minutes = int(seconds // 60)
+                seconds_remainder = seconds % 60
+                time = f'{minutes:02}:{seconds_remainder:05.2f}'
+                lrc += f'[{time}]{line_lyric["lineLyric"]}\n'
+            zh_subtitles.append({'ext': 'lrc', 'data': lrc})
+        return {'zh-CN': zh_subtitles}
+
+    def _calc_secret(self, t_cookie_val, e_cookie_key):
+        """Calculates `Secret` header for several API.
+        From https://h5s.kuwo.cn/www/kw-www/ca3a6c0.js function f(t, e)"""
+        def charAt(lst, idx):
+            try:
+                return lst[idx]
+            except BaseException:
+                return ''
+
+        def parseInt(value: str):
+            if isinstance(value, int):
+                return value
+            num_str = ''
+            for char in value:
+                if char.isdigit():
+                    num_str += char
+                else:
+                    break
+            return int(num_str) if num_str else None
+
+        def toString(value):
+            if len(str(value)) > 20:
+                return f'{value:.16e}'
+            return str(value)
+
+        if e_cookie_key is None or len(e_cookie_key) <= 0:
+            return None
+        n = ''
+        for i in e_cookie_key:
+            n += str(ord(i))
+
+        o = len(n) // 5
+        r = int(''.join(charAt(n, i * o) for i in range(1, 6)))
+        c = len(e_cookie_key) // 2 + 1
+        l = pow(2, 31) - 1
+
+        if r < 2:
+            return None
+        d = random.randint(0, int(1e8 - 1))
+
+        n += str(d)
+        while len(n) > 10:
+            n = toString(parseInt(n[:10]) + parseInt(n[10:]))
+        n = (r * int(n) + c) % l
+
+        f = ''
+        h = ''
+        i = 0
+        while i < len(t_cookie_val):
+            f = parseInt(ord(t_cookie_val[i]) ^ math.floor((n / l) * 255))
+            if f < 16:
+                h += '0'
+            h += hex(f)[2:]
+            n = (r * n + c) % l
+            i += 1
+        d = hex(d)[2:]
+        while len(d) < 8:
+            d = '0' + d
+        h += d
+        return h
+

 class KuwoIE(KuwoBaseIE):
-    _WORKING = False
     IE_NAME = 'kuwo:song'
     IE_DESC = '酷我音乐'
-    _VALID_URL = r'https?://(?:www\.)?kuwo\.cn/yinyue/(?P<id>\d+)'
+    _VALID_URL = r'https?://(?:www\.)?kuwo\.cn/play_detail/(?P<id>\d+)'
     _TESTS = [{
         'url': 'http://www.kuwo.cn/yinyue/635632/',
         'info_dict': {
@@ -88,48 +256,69 @@ class KuwoIE(KuwoBaseIE):
     }]

     def _real_extract(self, url):
+        headers = self.geo_verification_headers()
         song_id = self._match_id(url)
-        webpage, urlh = self._download_webpage_handle(
-            url, song_id, note='Download song detail info',
-            errnote='Unable to get song detail info')
-        if song_id not in urlh.url or '对不起，该歌曲由于版权问题已被下线，将返回网站首页' in webpage:
-            raise ExtractorError('this song has been offline because of copyright issues', expected=True)
-
-        song_name = self._html_search_regex(
-            r'<p[^>]+id="lrcName">([^<]+)</p>', webpage, 'song name')
-        singer_name = remove_start(self._html_search_regex(
-            r'<a[^>]+href="http://www\.kuwo\.cn/artist/content\?name=([^"]+)">',
-            webpage, 'singer name', fatal=False), '歌手')
-        lrc_content = clean_html(get_element_by_id('lrcContent', webpage))
-        if lrc_content == '暂无':  # indicates no lyrics
-            lrc_content = None
-
-        formats = self._get_formats(song_id)
-
-        album_id = self._html_search_regex(
-            r'<a[^>]+href="http://www\.kuwo\.cn/album/(\d+)/"',
-            webpage, 'album id', fatal=False)
-        publish_time = None
-        if album_id is not None:
-            album_info_page = self._download_webpage(
-                f'http://www.kuwo.cn/album/{album_id}/', song_id,
-                note='Download album detail info',
-                errnote='Unable to get album detail info')
+        _ = self._download_webpage(url, song_id, headers=headers, fatal=False)  # get cookies
+        if not self._get_cookies('http://www.kuwo.cn/play_detail/').get('Hm_Iuvt_cdb524f42f23cer9b268564v7y735ewrq2324'):
+            raise ExtractorError('Failed to get cookies from the webpage!', video_id=song_id)
+        subtitles = self._get_subtitles(song_id)
+        metadata = self._get_metadata(song_id)
+        # comments = self._get_comments(song_id)
+        # if metadata.get('msg') != 'success' and webpage:
+        # self.report_warning('metadata API failed, falling back to webpage', song_id)
+        # # window.__NUXT__.data[0].songinfo
+        # self._search_nextjs_data()
+        # self._search_json(
+        # r']+id=[\'"]__NUXT__[\'"][^>]*>', webpage, 'next.js data',
+        # song_id, end_pattern='')
+        # song_name = self._html_search_regex(
+        # r''',\s*?name\s*?:\s*?(['"])(?P<name>.*?)\1''',
+        # webpage, 'song name', group='name')
+
+        # singer_name = self._html_search_regex(
+        # r''',\s*?artist\s*?:\s*?(['"])(?P<artist>.*?)\1''',
+        # webpage, 'artist', fatal=False, group='artist')
+
+        # album_name = self._html_search_regex(
+        # r''',\s*?album\s*?:\s*?(['"])(?P<album>.*?)\1''',
+        # webpage, 'album', fatal=False, group='album')
+
+        # release_date = self._html_search_regex(
+        # r''',\s*?releaseDate\s*?:\s*?(['"])(?P<releaseDate>.*?)\1''',
+        # webpage, 'release date', fatal=False, group='releaseDate')
+        # duration = None
+        # track_num = None
+        # thumbnail = None
+        # else:
+        song_info = metadata['data']
+
+        song_name = song_info.get('name')
+        singer_name = song_info.get('artist')
+        album_name = song_info.get('album')
+        release_date = song_info.get('releaseDate')
+        duration = song_info.get('duration')
+        track_num = song_info.get('track')
+        thumbnail = song_info.get('pic')
+
+        if release_date is not None:
+            release_date = release_date.replace('-', '')

-            publish_time = self._html_search_regex(
-                r'发行时间：(\d{4}-\d{2}-\d{2})', album_info_page,
-                'publish time', fatal=False)
-            if publish_time:
-                publish_time = publish_time.replace('-', '')
+        formats = self._get_formats(song_id)

         return {
+            'formats': formats,
             'id': song_id,
             'title': song_name,
-            'creator': singer_name,
-            'upload_date': publish_time,
-            'description': lrc_content,
-            'formats': formats,
+            'thumbnail': thumbnail,
+            'release_date': release_date,
+            'subtitles': subtitles,
+            'duration': duration,
+            'track': song_name,
+            'track_number': track_num,
+            'artists': [singer_name],
+            'album': album_name,
+            '__post_extractor': self.extract_comments(song_id),
         }
```
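The paging loop in the comments downloader can be sketched on its own, without yt-dlp. This is only an illustration under stated assumptions: `fetch_page` is a hypothetical callable standing in for the comment API request, and the `rows` / `totalPage` keys are the ones used in the patch. Unlike the patch, the sketch stops cleanly when the page count is missing instead of comparing a number against a placeholder.

```python
import itertools


def iter_comments(fetch_page):
    """Yield comment rows page by page.

    `fetch_page(page_num)` is a hypothetical callable returning a dict with
    'rows' and 'totalPage' keys, or None on failure.
    """
    total_pages = None
    for page_num in itertools.count(1):
        page = fetch_page(page_num)
        if not page:
            return
        if total_pages is None:
            total_pages = page.get('totalPage')
        yield from page.get('rows') or []
        # Stop at the last page, or when the total is unknown.
        if total_pages is None or page_num >= total_pages:
            return


# Fake two-page response used instead of a real request
fake_pages = {
    1: {'rows': ['a', 'b'], 'totalPage': 2},
    2: {'rows': ['c'], 'totalPage': 2},
}
print(list(iter_comments(fake_pages.get)))
```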

I haven't tested whether the other kuwo IEs still work. I'll probably open a PR later.
