yt-dlp / yt-dlp

A feature-rich command-line audio/video downloader

[kuwo] rewrite the outdated kuwo extractor #10688

Open grqz opened 1 month ago

grqz commented 1 month ago

Region

China

Provide a description that is worded well enough to be understood

[!NOTE] [geo-blocked] Nothing on kuwo is accessible from outside China; requests return HTTP 500.

Example URL

http://kuwo.cn/play_detail/28115171

Description

The current code is 8 years old and needs an update.

I am not in China, but I have a VPN that I can use for debugging. I have no kuwo account, so please ask someone else if one turns out to be necessary.

The verbose output below, from the latest nightly, shows that:

  1. _VALID_URL needs an update (the URL is currently handled by GenericIE)
  2. the site is geo-blocked outside China (see the response code)
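
A minimal sketch of a pattern for the new URL path, assuming only play_detail/<digits> URLs need to match. The names `PLAY_DETAIL_RE` and `song_id_from_url` are illustrative, not yt-dlp API; the named group `id` follows the convention yt-dlp extractors use.

```python
import re

# Assumed pattern for the new URL layout (play_detail instead of yinyue).
PLAY_DETAIL_RE = re.compile(
    r'https?://(?:www\.)?kuwo\.cn/play_detail/(?P<id>\d+)')


def song_id_from_url(url):
    """Return the numeric song ID, or None if the URL is not a play_detail page."""
    match = PLAY_DETAIL_RE.match(url)
    return match.group('id') if match else None


print(song_id_from_url('http://kuwo.cn/play_detail/28115171'))
```

The old yinyue path deliberately does not match here, so such URLs would need their own handling if they are still meant to be supported.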

Related info

Related to PR #7470

EDIT: there may be an easier way to extract than switching to a different API. The current code fails at metadata extraction; http://antiserver.kuwo.cn/anti.s is still usable with some parameter changes.

Since the URL path has changed, _VALID_URL should be updated.

playUrl API URL: http://www.kuwo.cn/api/v1/www/music/playUrl, query: {'mid': song_id}

I'm not sure how to request a specific format; the API only returns 128k.
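
As a rough sketch of how the request described above could be assembled (no network access involved): the endpoint and the mid parameter come from this issue, while `build_play_url_request` and its return shape are illustrative assumptions. The Secret value is taken as an input here and has to be computed separately.

```python
from urllib.parse import urlencode

PLAY_URL_API = 'http://www.kuwo.cn/api/v1/www/music/playUrl'


def build_play_url_request(song_id, secret):
    """Return (url, headers) for a playUrl request; nothing is sent."""
    url = f'{PLAY_URL_API}?{urlencode({"mid": song_id})}'
    # Only the Secret header is set; any further required headers are not covered.
    headers = {'Secret': secret}
    return url, headers


url, headers = build_play_url_request('28115171', 'placeholder-secret')
print(url)
```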

required headers in API request:

EDIT: the Secret header can be computed with the function below, called as f(<unescaped Hm_Iuvt_cdb524f42f23cer9b268564v7y735ewrq2324 value>, "Hm_Iuvt_cdb524f42f23cer9b268564v7y735ewrq2324")

function f(t, e) {
  if (null == e || e.length <= 0) return null;
  for (var n = "", i = 0; i < e.length; i++) n += e.charCodeAt(i).toString();
  var o = Math.floor(n.length / 5),
    r = parseInt(
      n.charAt(o) +
        n.charAt(2 * o) +
        n.charAt(3 * o) +
        n.charAt(4 * o) +
        n.charAt(5 * o)
    ),
    c = Math.ceil(e.length / 2),
    l = Math.pow(2, 31) - 1;
  if (r < 2) return null;
  var d = Math.round(1e9 * Math.random()) % 1e8;
  for (n += d; n.length > 10; )
    n = (
      parseInt(n.substring(0, 10)) + parseInt(n.substring(10, n.length))
    ).toString();
  n = (r * n + c) % l;
  var f = "",
    h = "";
  for (i = 0; i < t.length; i++)
    (h +=
      (f = parseInt(t.charCodeAt(i) ^ Math.floor((n / l) * 255))) < 16
        ? "0" + f.toString(16)
        : f.toString(16)),
      (n = (r * n + c) % l);
  for (d = d.toString(16); d.length < 8; ) d = "0" + d;
  return (h += d);
}

Python port of the algorithm:

import random
import math

def f(t_cookie_val, e_cookie_key):
    def charAt(lst, idx):
        # Mirror JS String.charAt: an out-of-range index yields an empty string.
        try:
            return lst[idx]
        except IndexError:
            return ''

    def parseInt(value: str):
        if isinstance(value, int):
            return value
        num_str = ''
        for char in value:
            if char.isdigit():
                num_str += char
            else:
                break
        return int(num_str) if num_str else None

    def toString(value):
        if len(str(value)) > 20:
            return f'{value:.16e}'
        return str(value)

    if e_cookie_key is None or len(e_cookie_key) <= 0:
        return None
    n = ''
    for i in e_cookie_key:
        n += str(ord(i))

    o = len(n) // 5
    r = int(''.join(charAt(n, i * o) for i in range(1, 6)))
    c = math.ceil(len(e_cookie_key) / 2)  # Math.ceil(e.length / 2) in the JS version
    l = pow(2, 31) - 1

    if r < 2:
        return None
    d = random.randint(0, int(1e8 - 1))

    n += str(d)
    while len(n) > 10:
        n = toString(parseInt(n[:10]) + parseInt(n[10:]))
    n = (r * int(n) + c) % l

    f = ''
    h = ''
    i = 0
    while i < len(t_cookie_val):
        f = parseInt(ord(t_cookie_val[i]) ^ math.floor((n / l) * 255))
        if f < 16:
            h += '0'
        h += hex(f)[2:]
        n = (r * n + c) % l
        i += 1
    d = hex(d)[2:]
    while len(d) < 8:
        d = '0' + d
    h += d
    return h

print(f('AnMG6YKJNGDkSmm2TY7xJMmw67XFiCmN', 'Hm_Iuvt_cdb524f42f23cer9b268564v7y735ewrq2324'))
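
Reading the algorithm above, the returned string appears to have a fixed layout: two hex digits per character of the cookie value, followed by the random number d as eight zero-padded hex digits. A small sketch that splits a value along those lines, assuming every character of the cookie value has a code below 256; `split_secret` is an illustrative helper, not part of the site's code.

```python
def split_secret(secret):
    """Split a Secret value into (XOR-encoded cookie bytes, random seed d)."""
    # The last 8 hex digits are d; everything before is 2 hex digits per char.
    payload_hex, seed_hex = secret[:-8], secret[-8:]
    return bytes.fromhex(payload_hex), int(seed_hex, 16)


payload, seed = split_secret('0a1b' + '0000002a')
print(len(payload), seed)
```

Under that reading, a 32-character cookie value would give a 72-character Secret.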

Provide verbose output that clearly demonstrates the problem

Complete Verbose Output

[debug] Command-line config: ['-vU', 'http://kuwo.cn/play_detail/28115171']
[debug] Encodings: locale cp936, fs utf-8, pref cp936, out utf-8, error utf-8, screen utf-8
[debug] yt-dlp version nightly@2024.08.06.232802 from yt-dlp/yt-dlp-nightly-builds [a06508664] (win_exe)
[debug] Python 3.8.10 (CPython AMD64 64bit) - Windows-10-10.0.19045-SP0 (OpenSSL 1.1.1k  25 Mar 2021)
[debug] exe versions: ffmpeg N-115821-g0060a368b1-20240613 (setts), ffprobe N-115821-g0060a368b1-20240613
[debug] Optional libraries: Cryptodome-3.20.0, brotli-1.1.0, certifi-2024.07.04, curl_cffi-0.5.10, mutagen-1.47.0, requests-2.32.3, sqlite3-3.35.5, urllib3-2.2.2, websockets-12.0
[debug] Proxy map: {}
[debug] Request Handlers: urllib, requests, websockets, curl_cffi
[debug] Loaded 1830 extractors
[debug] Fetching release info: https://api.github.com/repos/yt-dlp/yt-dlp-nightly-builds/releases/latest
Latest version: nightly@2024.08.06.232802 from yt-dlp/yt-dlp-nightly-builds
yt-dlp is up to date (nightly@2024.08.06.232802 from yt-dlp/yt-dlp-nightly-builds)
[generic] Extracting URL: http://kuwo.cn/play_detail/28115171
[generic] 28115171: Downloading webpage
ERROR: [generic] Unable to download webpage: HTTP Error 500: OK (caused by <HTTPError 500: OK>)
  File "yt_dlp\extractor\common.py", line 740, in extract
  File "yt_dlp\extractor\generic.py", line 2384, in _real_extract
  File "yt_dlp\extractor\common.py", line 909, in _request_webpage

  File "yt_dlp\extractor\common.py", line 896, in _request_webpage
  File "yt_dlp\YoutubeDL.py", line 4165, in urlopen
  File "yt_dlp\networking\common.py", line 117, in send
  File "yt_dlp\networking\_helper.py", line 208, in wrapper
  File "yt_dlp\networking\common.py", line 340, in send
  File "yt_dlp\networking\_requests.py", line 365, in _send
yt_dlp.networking.exceptions.HTTPError: HTTP Error 500: OK

grqz commented 1 month ago

Unfortunately, the http://antiserver.kuwo.cn/anti.s API can no longer extract the lossless format. By capturing packets, I found an API that takes no parameters related to the audio format; it only returns the 128kbps mp3 and requires a signature. The site says that the lossless format is only playable in their client. I don't plan to capture packets from the client.

grqz commented 1 month ago

After some testing, it turns out the geo-block can be bypassed. Even though the webpage returns HTTP/1.1 500 OK, the response still carries the Set-Cookie header, which is enough. None of the APIs are geo-blocked; the only limitation is that metadata can't be extracted from the webpage. EDIT: metadata extraction from the webpage may be removed, in which case the Secret header calculation becomes unavoidable, since the metadata API requires it.
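
To illustrate the point about Set-Cookie, here is a hedged sketch that pulls the cookie value out of a raw Set-Cookie header string and unescapes it. `cookie_value_from_header` is an illustrative helper and the header used in the example is made up; with urllib, the headers of a 500 response should be reachable through the `headers` attribute of the raised `HTTPError`, but fetching is left out here.

```python
from http.cookies import SimpleCookie
from urllib.parse import unquote

COOKIE_NAME = 'Hm_Iuvt_cdb524f42f23cer9b268564v7y735ewrq2324'


def cookie_value_from_header(set_cookie_header, name=COOKIE_NAME):
    """Return the unescaped value of `name` from a Set-Cookie header, or None."""
    jar = SimpleCookie()
    jar.load(set_cookie_header)
    morsel = jar.get(name)
    # The value is percent-decoded because the Secret function expects it unescaped.
    return unquote(morsel.value) if morsel is not None else None


# Made-up header value, for illustration only.
print(cookie_value_from_header(f'{COOKIE_NAME}=abc%3D; Path=/'))
```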

grqz commented 1 month ago

After some work:

Logs (in geo-blocked region)

```log
[debug] Command-line config: ['--exec', 'sha1sum', '--force-overwrite', '-v', '--write-info-json', '--write-subs', '--sub-langs', 'zh.*', '--write-comments', '--convert-subs', 'vtt', '-F', '--no-simulate', 'http://www.kuwo.cn/play_detail/282270399']
[debug] Encodings: locale UTF-8, fs utf-8, pref UTF-8, out utf-8, error utf-8, screen utf-8
[debug] yt-dlp version stable@2024.07.16 from yt-dlp/yt-dlp [89a161e8c] (source)
[debug] Lazy loading extractors is disabled
[debug] Git HEAD: a3bab4752
[debug] Python 3.10.12 (CPython x86_64 64bit) - Linux-5.15.153.1-microsoft-standard-WSL2-x86_64-with-glibc2.35 (OpenSSL 3.0.2 15 Mar 2022, glibc 2.35)
[debug] exe versions: ffmpeg 4.4.2 (setts), ffprobe 4.4.2
[debug] Optional libraries: certifi-2024.06.02, curl_cffi-0.7.1, requests-2.32.3, secretstorage-3.3.1, sqlite3-3.37.2, urllib3-2.2.2
[debug] Proxy map: {}
[debug] Request Handlers: urllib, requests, curl_cffi
[debug] Loaded 1829 extractors
[kuwo:song] Extracting URL: http://www.kuwo.cn/play_detail/282270399
[kuwo:song] 282270399: Downloading webpage
WARNING: [kuwo:song] Unable to download webpage: HTTP Error 500: OK
[kuwo:song] 282270399: Downloading lyrics
[kuwo:song] 282270399: Downloading JSON metadata
[kuwo:song] 282270399: Getting new_mp3 play url
[kuwo:song] 282270399: Getting mp3 play url
[kuwo:song] 282270399: Getting wma play url
[kuwo:song] 282270399: Getting aac play url
[info] 282270399: Downloading subtitles: zh-CN
[debug] Formats sorted by: hasvid, ie_pref, lang, quality, res, fps, hdr:12(7), vcodec:vp9.2(10), channels, acodec, size, br, asr, proto, vext, aext, hasaud, source, id
[kuwo:song] 282270399: Downloading recommended comments
[kuwo:song] 282270399: Downloading comments from page 1/Unknown
[kuwo:song] 282270399: Downloading comments from page 2/191
[kuwo:song] 282270399: Downloading comments from page 3/191
[kuwo:song] 282270399: Downloading comments from page 4/191
[kuwo:song] 282270399: Downloading comments from page 5/191
[kuwo:song] 282270399: Downloading comments from page 6/191
[kuwo:song] 282270399: Downloading comments from page 7/191
[kuwo:song] 282270399: Downloading comments from page 8/191
[kuwo:song] 282270399: Downloading comments from page 9/191
[kuwo:song] 282270399: Downloading comments from page 10/191
[kuwo:song] 282270399: Downloading comments from page 11/191
[kuwo:song] 282270399: Downloading comments from page 12/191
[kuwo:song] 282270399: Downloading comments from page 13/191
[kuwo:song] 282270399: Downloading comments from page 14/191
[kuwo:song] 282270399: Downloading comments from page 15/191
[kuwo:song] 282270399: Downloading comments from page 16/191
[kuwo:song] 282270399: Downloading comments from page 17/191
[kuwo:song] 282270399: Downloading comments from page 18/191
[kuwo:song] 282270399: Downloading comments from page 19/191
[kuwo:song] 282270399: Downloading comments from page 20/191
[kuwo:song] 282270399: Downloading comments from page 21/191
[kuwo:song] 282270399: Downloading comments from page 22/191
[kuwo:song] 282270399: Downloading comments from page 23/191
[kuwo:song] 282270399: Downloading comments from page 24/191
[kuwo:song] 282270399: Downloading comments from page 25/191
[kuwo:song] 282270399: Downloading comments from page 26/191
[kuwo:song] 282270399: Downloading comments from page 27/191
[kuwo:song] 282270399: Downloading comments from page 28/191
[kuwo:song] 282270399: Downloading comments from page 29/191
[kuwo:song] 282270399: Downloading comments from page 30/191
[kuwo:song] 282270399: Downloading comments from page 31/191
[kuwo:song] 282270399: Downloading comments from page 32/191
[kuwo:song] 282270399: Downloading comments from page 33/191
[kuwo:song] 282270399: Downloading comments from page 34/191
[kuwo:song] 282270399: Downloading comments from page 35/191
[kuwo:song] 282270399: Downloading comments from page 36/191
[kuwo:song] 282270399: Downloading comments from page 37/191
[kuwo:song] 282270399: Downloading comments from page 38/191
[kuwo:song] 282270399: Downloading comments from page 39/191
[kuwo:song] 282270399: Downloading comments from page 40/191
[kuwo:song] 282270399: Downloading comments from page 41/191
[kuwo:song] 282270399: Downloading comments from page 42/191
[kuwo:song] 282270399: Downloading comments from page 43/191
[kuwo:song] 282270399: Downloading comments from page 44/191
[kuwo:song] 282270399: Downloading comments from page 45/191
[kuwo:song] 282270399: Downloading comments from page 46/191
[kuwo:song] 282270399: Downloading comments from page 47/191
[kuwo:song] 282270399: Downloading comments from page 48/191
[kuwo:song] 282270399: Downloading comments from page 49/191
[kuwo:song] 282270399: Downloading comments from page 50/191
[kuwo:song] 282270399: Downloading comments from page 51/191
[kuwo:song] 282270399: Downloading comments from page 52/191
[kuwo:song] 282270399: Downloading comments from page 53/191
[kuwo:song] 282270399: Downloading comments from page 54/191
[kuwo:song] 282270399: Downloading comments from page 55/191
[kuwo:song] 282270399: Downloading comments from page 56/191
[kuwo:song] 282270399: Downloading comments from page 57/191
[kuwo:song] 282270399: Downloading comments from page 58/191
[kuwo:song] 282270399: Downloading comments from page 59/191
[kuwo:song] 282270399: Downloading comments from page 60/191
[kuwo:song] 282270399: Downloading comments from page 61/191
[kuwo:song] 282270399: Downloading comments from page 62/191
[kuwo:song] 282270399: Downloading comments from page 63/191
[kuwo:song] 282270399: Downloading comments from page 64/191
[kuwo:song] 282270399: Downloading comments from page 65/191
[kuwo:song] 282270399: Downloading comments from page 66/191
[kuwo:song] 282270399: Downloading comments from page 67/191
[kuwo:song] 282270399: Downloading comments from page 68/191
[kuwo:song] 282270399: Downloading comments from page 69/191
[kuwo:song] 282270399: Downloading comments from page 70/191
[kuwo:song] 282270399: Downloading comments from page 71/191
[kuwo:song] 282270399: Downloading comments from page 72/191
[kuwo:song] 282270399: Downloading comments from page 73/191
[kuwo:song] 282270399: Downloading comments from page 74/191
[kuwo:song] 282270399: Downloading comments from page 75/191
[kuwo:song] 282270399: Downloading comments from page 76/191
[kuwo:song] 282270399: Downloading comments from page 77/191
[kuwo:song] 282270399: Downloading comments from page 78/191
[kuwo:song] 282270399: Downloading comments from page 79/191
[kuwo:song] 282270399: Downloading comments from page 80/191
[kuwo:song] 282270399: Downloading comments from page 81/191
[kuwo:song] 282270399: Downloading comments from page 82/191
[kuwo:song] 282270399: Downloading comments from page 83/191
[kuwo:song] 282270399: Downloading comments from page 84/191
[kuwo:song] 282270399: Downloading comments from page 85/191
[kuwo:song] 282270399: Downloading comments from page 86/191
[kuwo:song] 282270399: Downloading comments from page 87/191
[kuwo:song] 282270399: Downloading comments from page 88/191
[kuwo:song] 282270399: Downloading comments from page 89/191
[kuwo:song] 282270399: Downloading comments from page 90/191
[kuwo:song] 282270399: Downloading comments from page 91/191
[kuwo:song] 282270399: Downloading comments from page 92/191
[kuwo:song] 282270399: Downloading comments from page 93/191
[kuwo:song] 282270399: Downloading comments from page 94/191
[kuwo:song] 282270399: Downloading comments from page 95/191
[kuwo:song] 282270399: Downloading comments from page 96/191
[kuwo:song] 282270399: Downloading comments from page 97/191
[kuwo:song] 282270399: Downloading comments from page 98/191
[kuwo:song] 282270399: Downloading comments from page 99/191
[kuwo:song] 282270399: Downloading comments from page 100/191
[kuwo:song] 282270399: Downloading comments from page 101/191
[kuwo:song] 282270399: Downloading comments from page 102/191
[kuwo:song] 282270399: Downloading comments from page 103/191
[kuwo:song] 282270399: Downloading comments from page 104/191
[kuwo:song] 282270399: Downloading comments from page 105/191
[kuwo:song] 282270399: Downloading comments from page 106/191
[kuwo:song] 282270399: Downloading comments from page 107/191
[kuwo:song] 282270399: Downloading comments from page 108/191
[kuwo:song] 282270399: Downloading comments from page 109/191
[kuwo:song] 282270399: Downloading comments from page 110/191
[kuwo:song] 282270399: Downloading comments from page 111/191
[kuwo:song] 282270399: Downloading comments from page 112/191
[kuwo:song] 282270399: Downloading comments from page 113/191
[kuwo:song] 282270399: Downloading comments from page 114/191
[kuwo:song] 282270399: Downloading comments from page 115/191
[kuwo:song] 282270399: Downloading comments from page 116/191
[kuwo:song] 282270399: Downloading comments from page 117/191
[kuwo:song] 282270399: Downloading comments from page 118/191
[kuwo:song] 282270399: Downloading comments from page 119/191
[kuwo:song] 282270399: Downloading comments from page 120/191
[kuwo:song] 282270399: Downloading comments from page 121/191
[kuwo:song] 282270399: Downloading comments from page 122/191
[kuwo:song] 282270399: Downloading comments from page 123/191
[kuwo:song] 282270399: Downloading comments from page 124/191
[kuwo:song] 282270399: Downloading comments from page 125/191
[kuwo:song] 282270399: Downloading comments from page 126/191
[kuwo:song] 282270399: Downloading comments from page 127/191
[kuwo:song] 282270399: Downloading comments from page 128/191
[kuwo:song] 282270399: Downloading comments from page 129/191
[kuwo:song] 282270399: Downloading comments from page 130/191
[kuwo:song] 282270399: Downloading comments from page 131/191
[kuwo:song] 282270399: Downloading comments from page 132/191
[kuwo:song] 282270399: Downloading comments from page 133/191
[kuwo:song] 282270399: Downloading comments from page 134/191
[kuwo:song] 282270399: Downloading comments from page 135/191
[kuwo:song] 282270399: Downloading comments from page 136/191
[kuwo:song] 282270399: Downloading comments from page 137/191
[kuwo:song] 282270399: Downloading comments from page 138/191
[kuwo:song] 282270399: Downloading comments from page 139/191
[kuwo:song] 282270399: Downloading comments from page 140/191
[kuwo:song] 282270399: Downloading comments from page 141/191
[kuwo:song] 282270399: Downloading comments from page 142/191
[kuwo:song] 282270399: Downloading comments from page 143/191
[kuwo:song] 282270399: Downloading comments from page 144/191
[kuwo:song] 282270399: Downloading comments from page 145/191
[kuwo:song] 282270399: Downloading comments from page 146/191
[kuwo:song] 282270399: Downloading comments from page 147/191
[kuwo:song] 282270399: Downloading comments from page 148/191
[kuwo:song] 282270399: Downloading comments from page 149/191
[kuwo:song] 282270399: Downloading comments from page 150/191
[kuwo:song] 282270399: Downloading comments from page 151/191
[kuwo:song] 282270399: Downloading comments from page 152/191
[kuwo:song] 282270399: Downloading comments from page 153/191
[kuwo:song] 282270399: Downloading comments from page 154/191
[kuwo:song] 282270399: Downloading comments from page 155/191
[kuwo:song] 282270399: Downloading comments from page 156/191
[kuwo:song] 282270399: Downloading comments from page 157/191
[kuwo:song] 282270399: Downloading comments from page 158/191
[kuwo:song] 282270399: Downloading comments from page 159/191
[kuwo:song] 282270399: Downloading comments from page 160/191
[kuwo:song] 282270399: Downloading comments from page 161/191
[kuwo:song] 282270399: Downloading comments from page 162/191
[kuwo:song] 282270399: Downloading comments from page 163/191
[kuwo:song] 282270399: Downloading comments from page 164/191
[kuwo:song] 282270399: Downloading comments from page 165/191
[kuwo:song] 282270399: Downloading comments from page 166/191
[kuwo:song] 282270399: Downloading comments from page 167/191
[kuwo:song] 282270399: Downloading comments from page 168/191
[kuwo:song] 282270399: Downloading comments from page 169/191
[kuwo:song] 282270399: Downloading comments from page 170/191
[kuwo:song] 282270399: Downloading comments from page 171/191
[kuwo:song] 282270399: Downloading comments from page 172/191
[kuwo:song] 282270399: Downloading comments from page 173/191
[kuwo:song] 282270399: Downloading comments from page 174/191
[kuwo:song] 282270399: Downloading comments from page 175/191
[kuwo:song] 282270399: Downloading comments from page 176/191
[kuwo:song] 282270399: Downloading comments from page 177/191
[kuwo:song] 282270399: Downloading comments from page 178/191
[kuwo:song] 282270399: Downloading comments from page 179/191
[kuwo:song] 282270399: Downloading comments from page 180/191
[kuwo:song] 282270399: Downloading comments from page 181/191
[kuwo:song] 282270399: Downloading comments from page 182/191
[kuwo:song] 282270399: Downloading comments from page 183/191
[kuwo:song] 282270399: Downloading comments from page 184/191
[kuwo:song] 282270399: Downloading comments from page 185/191
[kuwo:song] 282270399: Downloading comments from page 186/191
[kuwo:song] 282270399: Downloading comments from page 187/191
[kuwo:song] 282270399: Downloading comments from page 188/191
[kuwo:song] 282270399: Downloading comments from page 189/191
[kuwo:song] 282270399: Downloading comments from page 190/191
[kuwo:song] 282270399: Downloading comments from page 191/191
[kuwo:song] Extracted 5700 comments
[info] Available formats for 282270399:
ID      EXT RESOLUTION │   FILESIZE  TBR PROTO │ VCODEC     ACODEC  ABR MORE INFO
─────────────────────────────────────────────────────────────────────────────────────────────────────────
aac     aac audio only │ ~972.66KiB  48k http  │ audio only aac     48k
wma     wma audio only │ ~  1.90MiB  96k http  │ audio only wma     96k
mp3     mp3 audio only │ ~  2.53MiB 128k http  │ audio only mp3    128k
new_mp3 mp3 audio only │ ~  2.53MiB 128k https │ audio only mp3    128k mp3 extracted from the latest API
[debug] Default format spec: bestvideo*+bestaudio/best
[info] 282270399: Downloading 1 format(s): new_mp3
[info] Writing video subtitles to: 一笑江湖(DJ弹鼓版) [282270399].zh-CN.lrc
[info] Writing video metadata as JSON to: 一笑江湖(DJ弹鼓版) [282270399].info.json
[SubtitlesConvertor] Converting subtitles
[debug] ffmpeg command line: ffmpeg -y -loglevel repeat+info -i 'file:一笑江湖(DJ弹鼓版) [282270399].zh-CN.lrc' -f webvtt -movflags +faststart 'file:一笑江湖(DJ弹鼓版) [282270399].zh-CN.vtt'
Deleting original file 一笑江湖(DJ弹鼓版) [282270399].zh-CN.lrc (pass -k to keep)
Deleting existing file 一笑江湖(DJ弹鼓版) [282270399].mp3
[debug] Invoking http downloader on "https://ll-sycdn.kuwo.cn/50cefe1c20be88337330ea4951b6a1b1/66b8bfbb/resource/n1/21/79/3447204738.mp3"
[download] Destination: 一笑江湖(DJ弹鼓版) [282270399].mp3
[download] 100% of 2.53MiB in 00:00:03 at 710.27KiB/s
[Exec] Executing command: sha1sum '/home/user/yt-dlp_dev/yt-dlp/一笑江湖(DJ弹鼓版) [282270399].mp3'
fa782498ebef2a55a5ca9a1ec44dd98b8cf4d1d2  /home/user/yt-dlp_dev/yt-dlp/一笑江湖(DJ弹鼓版) [282270399].mp3
```

explanation:

  1. added support for the mp3 format from the new API
  2. removed the ape format, as it can no longer be extracted
  3. fixed lyrics extraction
  4. added a comments downloader
  5. added more metadata fields
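
The comments downloader in item 4 pages through results until the last reported page. A minimal sketch of that loop, decoupled from any network code: `fetch_page` is a hypothetical callable standing in for the real request, and stopping when the page count is missing is an assumption of this sketch rather than confirmed behavior of the patch.

```python
import itertools


def iter_comments(fetch_page):
    """Yield comment rows page by page until the reported last page is read.

    `fetch_page(page_num)` is a hypothetical callable returning a dict with
    'rows' and 'totalPage' keys, or None on failure.
    """
    total_pages = None
    for page_num in itertools.count(1):
        page = fetch_page(page_num)
        if not page:
            return
        if total_pages is None:
            total_pages = page.get('totalPage')
        yield from page.get('rows') or []
        # Stop when the page count is unknown or the last page was just read.
        if not isinstance(total_pages, int) or page_num >= total_pages:
            return


# Fake two-page response set, for illustration only.
fake_pages = {
    1: {'rows': ['first', 'second'], 'totalPage': 2},
    2: {'rows': ['third'], 'totalPage': 2},
}
print(list(iter_comments(fake_pages.get)))
```
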

diff patch: 1 file changed, 252 insertions(+), 63 deletions(-)

```diff
diff --git a/yt_dlp/extractor/kuwo.py b/yt_dlp/extractor/kuwo.py
index 80b6b55f1..785fe2204 100644
--- a/yt_dlp/extractor/kuwo.py
+++ b/yt_dlp/extractor/kuwo.py
@@ -1,3 +1,6 @@
+import itertools
+import math
+import random
 import re
 import urllib.parse

@@ -8,56 +11,221 @@
     clean_html,
     get_element_by_id,
     remove_start,
+    traverse_obj,
+    url_or_none,
 )


 class KuwoBaseIE(InfoExtractor):
     _FORMATS = [
-        {'format': 'ape', 'ext': 'ape', 'preference': 100},
-        {'format': 'mp3-320', 'ext': 'mp3', 'br': '320kmp3', 'abr': 320, 'preference': 80},
-        {'format': 'mp3-192', 'ext': 'mp3', 'br': '192kmp3', 'abr': 192, 'preference': 70},
-        {'format': 'mp3-128', 'ext': 'mp3', 'br': '128kmp3', 'abr': 128, 'preference': 60},
-        {'format': 'wma', 'ext': 'wma', 'preference': 20},
-        {'format': 'aac', 'ext': 'aac', 'abr': 48, 'preference': 10},
+        # {'format': 'ape_old', 'acodec': 'ape', 'quality': 100}, # broken
+        {'format': 'new_mp3', 'acodec': 'mp3', 'api':True, 'abr': 128, 'quality': 40},
+        {'format': 'mp3', 'acodec': 'mp3', 'abr': 128, 'quality': 30},
+        {'format': 'wma', 'acodec': 'wma', 'abr': 96, 'quality': 20},
+        {'format': 'aac', 'acodec': 'aac', 'abr': 48, 'quality': 10},
     ]

     def _get_formats(self, song_id, tolerate_ip_deny=False):
         formats = []
+        initial_cookie_val = self._get_cookies('http://www.kuwo.cn/play_detail/').get('Hm_Iuvt_cdb524f42f23cer9b268564v7y735ewrq2324').value
+        header_secret = self._calc_secret(initial_cookie_val, 'Hm_Iuvt_cdb524f42f23cer9b268564v7y735ewrq2324')
         for file_format in self._FORMATS:
-            query = {
-                'format': file_format['ext'],
-                'br': file_format.get('br', ''),
-                'rid': f'MUSIC_{song_id}',
-                'type': 'convert_url',
-                'response': 'url',
-            }
-
-            song_url = self._download_webpage(
-                'http://antiserver.kuwo.cn/anti.s',
-                song_id, note='Download {} url info'.format(file_format['format']),
-                query=query, headers=self.geo_verification_headers(),
-            )
-
-            if song_url == 'IPDeny' and not tolerate_ip_deny:
-                raise ExtractorError('This song is blocked in this region', expected=True)
-
-            if song_url.startswith(('http://', 'https://')):
+            if file_format.get('api'):
+                song_dict = self._download_json(
+                    'http://www.kuwo.cn/api/v1/www/music/playUrl',
+                    query={
+                        'mid': song_id,
+                        'type': 'music',
+                    }, headers={
+                        **self.geo_verification_headers(),
+                        'Secret': header_secret,
+                    },note=f'Getting {file_format["format"]} play url', video_id=song_id)
+
+                song_url = url_or_none(traverse_obj(song_dict, ('data', 'url')))
+            else:
+                query = {
+                    'format': file_format['acodec'],
+                    'rid': song_id,
+                    'type': 'convert_url',
+                    'response': 'url',
+                }
+
+                song_url = url_or_none(self._download_webpage(
+                    'http://antiserver.kuwo.cn/anti.s',
+                    song_id, note=f'Getting {file_format["format"]} play url',
+                    query=query, headers=self.geo_verification_headers(),
+                ))
+
+                if song_url == 'IPDeny' and not tolerate_ip_deny:
+                    raise ExtractorError('This song is blocked in this region', expected=True)
+
+            self._set_cookie('.kuwo.cn', 'Hm_Iuvt_cdb524f42f23cer9b268564v7y735ewrq2324', initial_cookie_val)
+
+            if song_url:
                 formats.append({
                     'url': song_url,
-                    'format_id': file_format['format'],
                     'format': file_format['format'],
-                    'quality': file_format['preference'],
+                    'format_id': file_format['format'],
+                    'format_note': 'mp3 extracted from the latest API' if file_format.get('api') else None,
+                    'quality': file_format['quality'],
                     'abr': file_format.get('abr'),
+                    'acodec': file_format.get('acodec'),
+                    'vcodec': 'none',
                 })

         return formats

+    def _get_comments(self, song_id):
+        """This API is very slow! It may take several seconds to respond one request!"""
+        hot_comments = self._download_json(
+            'https://comment.kuwo.cn/com.s', song_id,
+            'Downloading recommended comments',
+            'Failed to download recommended comments',
+            fatal=False, query={
+                'type': 'get_rec_comment',
+                'f': 'web',
+                'page': 1,
+                'rows': 5,
+                'digest': 15,
+                'sid': song_id,
+                'uid': 0,
+            }).get('rows')
+        if not hot_comments:
+            return
+        for hot_comment in hot_comments:
+            yield {
+                'author': hot_comment.get('u_name'),
+                'author_id': hot_comment.get('u_id'),
+                'author_thumbnail': hot_comment.get('u_pic'),
+                'text': hot_comment.get('msg'),
+                'is_pinned': True,
+            }
+
+        total_pages = 'Unknown'
+        for page_num in itertools.count(1):
+            comments = self._download_json(
+                'https://comment.kuwo.cn/com.s', song_id,
+                f'Downloading comments from page {page_num}/{total_pages}',
+                f'Failed to download comments from page {page_num}/{total_pages}',
+                fatal=False, query={
+                    'type': 'get_comment',
+                    'f': 'web',
+                    'page': page_num,
+                    'rows': 30,
+                    'digest': 15,
+                    'sid': song_id,
+                    'uid': 0,
+                })
+            if not comments:
+                return
+            if page_num == 1:
+                total_pages = comments.get('totalPage')
+            for comments_row in comments.get('rows'):
+                yield {
+                    'author': comments_row.get('u_name'),
+                    'author_id': comments_row.get('u_id'),
+                    'author_thumbnail': comments_row.get('u_pic'),
+                    'text': comments_row.get('msg'),
+                }
+            if page_num + 1 > total_pages:
+                return
+
+    def _get_metadata(self, song_id):
+        metadata = {}
+        initial_cookie_val = self._get_cookies('http://www.kuwo.cn/play_detail/').get('Hm_Iuvt_cdb524f42f23cer9b268564v7y735ewrq2324').value
+        header_secret = self._calc_secret(initial_cookie_val, 'Hm_Iuvt_cdb524f42f23cer9b268564v7y735ewrq2324')
+        metadata = self._download_json(
+            'http://www.kuwo.cn/api/www/music/musicInfo', song_id,
+            query={'mid': song_id}, headers={
+                **self.geo_verification_headers(),
+                'Secret': header_secret,
+            }, fatal=False)
+        self._set_cookie('.kuwo.cn', 'Hm_Iuvt_cdb524f42f23cer9b268564v7y735ewrq2324', initial_cookie_val)
+        return metadata
+
+    def _get_subtitles(self, song_id):
+        lyrics_list = traverse_obj(self._download_json(
+            'http://www.kuwo.cn/openapi/v1/www/lyric/getlyric', song_id,
+            'Downloading lyrics', 'Failed to download lyrics', fatal=False,
+            query={'musicId': song_id}), ('data', 'lrclist'))
+        zh_subtitles = []
+        if lyrics_list:
+            lrc = ''
+            for line_lyric in lyrics_list:
+                seconds = float(line_lyric['time'])
+                minutes = int(seconds // 60)
+                seconds_remainder = seconds % 60
+                time = f'{minutes:02}:{seconds_remainder:05.2f}'
+                lrc += f'[{time}]{line_lyric["lineLyric"]}\n'
+            zh_subtitles.append({'ext': 'lrc', 'data': lrc})
+        return {'zh-CN': zh_subtitles}
+
+    def _calc_secret(self, t_cookie_val, e_cookie_key):
+        """Calculates `Secret` header for several API.
+        From https://h5s.kuwo.cn/www/kw-www/ca3a6c0.js function f(t, e)"""
+        def charAt(lst, idx):
+            try:
+                return lst[idx]
+            except BaseException:
+                return ''
+
+        def parseInt(value: str):
+            if isinstance(value, int):
+                return value
+            num_str = ''
+            for char in value:
+                if char.isdigit():
+                    num_str += char
+                else:
+                    break
+            return int(num_str) if num_str else None
+
+        def toString(value):
+            if len(str(value)) > 20:
+                return f'{value:.16e}'
+            return str(value)
+
+        if e_cookie_key is None or len(e_cookie_key) <= 0:
+            return None
+        n = ''
+        for i in e_cookie_key:
+            n += str(ord(i))
+
+        o = len(n) // 5
+        r = int(''.join(charAt(n, i * o) for i in range(1, 6)))
+        c = len(e_cookie_key) // 2 + 1
+        l = pow(2, 31) - 1
+
+        if r < 2:
+            return None
+        d = random.randint(0, int(1e8 - 1))
+
+        n += str(d)
+        while len(n) > 10:
+            n = toString(parseInt(n[:10]) + parseInt(n[10:]))
+        n = (r * int(n) + c) % l
+
+        f = ''
+        h = ''
+        i = 0
+        while i < len(t_cookie_val):
+            f = parseInt(ord(t_cookie_val[i]) ^ math.floor((n / l) * 255))
+            if f < 16:
+                h += '0'
+            h += hex(f)[2:]
+            n = (r * n + c) % l
+            i += 1
+        d = hex(d)[2:]
+        while len(d) < 8:
+            d = '0' + d
+        h += d
+        return h
+

 class KuwoIE(KuwoBaseIE):
-    _WORKING = False
     IE_NAME = 'kuwo:song'
     IE_DESC = '酷我音乐'
-    _VALID_URL = r'https?://(?:www\.)?kuwo\.cn/yinyue/(?P<id>\d+)'
+    _VALID_URL = r'https?://(?:www\.)?kuwo\.cn/play_detail/(?P<id>\d+)'
     _TESTS = [{
         'url': 'http://www.kuwo.cn/yinyue/635632/',
         'info_dict': {
@@ -88,48 +256,69 @@ class KuwoIE(KuwoBaseIE):
     }]

     def _real_extract(self, url):
+        headers = self.geo_verification_headers()
         song_id = self._match_id(url)
-        webpage, urlh = self._download_webpage_handle(
-            url, song_id, note='Download song detail info',
-            errnote='Unable to get song detail info')
-        if song_id not in urlh.url or '对不起,该歌曲由于版权问题已被下线,将返回网站首页' in webpage:
-            raise ExtractorError('this song has been offline because of copyright issues', expected=True)
-
-        song_name = self._html_search_regex(
-            r'<p[^>]+id="lrcName">([^<]+)</p>', webpage, 'song name')
-        singer_name = remove_start(self._html_search_regex(
-            r'<a[^>]+href="http://www\.kuwo\.cn/artist/content\?name=([^"]+)">',
-            webpage, 'singer name', fatal=False), '歌手')
-        lrc_content = clean_html(get_element_by_id('lrcContent', webpage))
-        if lrc_content == '暂无': # indicates no lyrics
-            lrc_content = None
-
-        formats = self._get_formats(song_id)
-
-        album_id = self._html_search_regex(
-            r'<a[^>]+href="http://www\.kuwo\.cn/album/(\d+)/"',
-            webpage, 'album id', fatal=False)
-        publish_time = None
-        if album_id is not None:
-            album_info_page = self._download_webpage(
-                f'http://www.kuwo.cn/album/{album_id}/', song_id,
-                note='Download album detail info',
-                errnote='Unable to get album detail info')
+        _ = self._download_webpage(url, song_id, headers=headers, fatal=False) # get cookies
+        if not self._get_cookies('http://www.kuwo.cn/play_detail/').get('Hm_Iuvt_cdb524f42f23cer9b268564v7y735ewrq2324'):
+            raise ExtractorError('Failed to get cookies from the webpage!', video_id=song_id)
+        subtitles = self._get_subtitles(song_id)
+        metadata = self._get_metadata(song_id)
+        # comments = self._get_comments(song_id)
+        # if metadata.get('msg') != 'success' and webpage:
+        #     self.report_warning('metadata API failed, falling back to webpage', song_id)
+        #     # window.__NUXT__.data[0].songinfo
+        #     self._search_nextjs_data()
+        #     self._search_json(
+        #         r'<script[^>]+id=[\'"]__NUXT__[\'"][^>]*>', webpage, 'next.js data',
+        #         song_id, end_pattern='</script>')
+        #     song_name = self._html_search_regex(
+        #         r''',\s*?name\s*?:\s*?(['"])(?P<name>.*?)\1''',
+        #         webpage, 'song name', group='name')
+
+        #     singer_name = self._html_search_regex(
+        #         r''',\s*?artist\s*?:\s*?(['"])(?P<artist>.*?)\1''',
+        #         webpage, 'artist', fatal=False, group='artist')
+
+        #     album_name = self._html_search_regex(
+        #         r''',\s*?album\s*?:\s*?(['"])(?P<album>.*?)\1''',
+        #         webpage, 'album', fatal=False, group='album')
+
+        #     release_date = self._html_search_regex(
+        #         r''',\s*?releaseDate\s*?:\s*?(['"])(?P<releaseDate>.*?)\1''',
+        #         webpage, 'release date', fatal=False, group='releaseDate')
+        #     duration = None
+        #     track_num = None
+        #     thumbnail = None
+        # else:
+        song_info = metadata['data']
+
+        song_name = song_info.get('name')
+        singer_name = song_info.get('artist')
+        album_name = song_info.get('album')
+        release_date = song_info.get('releaseDate')
+        duration = song_info.get('duration')
+        track_num = song_info.get('track')
+        thumbnail = song_info.get('pic')
+
+        if release_date is not None:
+            release_date = release_date.replace('-', '')

-            publish_time = self._html_search_regex(
-                r'发行时间:(\d{4}-\d{2}-\d{2})', album_info_page,
-                'publish time', fatal=False)
-            if publish_time:
-                publish_time = publish_time.replace('-', '')
+        formats = self._get_formats(song_id)

         return {
+            'formats': formats,
             'id': song_id,
             'title': song_name,
-            'creator': singer_name,
-            'upload_date': publish_time,
-            'description': lrc_content,
-            'formats': formats,
+            'thumbnail': thumbnail,
+            'release_date': release_date,
+            'subtitles': subtitles,
+            'duration': duration,
+            'track': song_name,
+            'track_number': track_num,
+            'artists': [singer_name],
+            'album': album_name,
+            '__post_extractor': self.extract_comments(song_id),
         }
```
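
The lyric handling in the patch converts the API's per-line timestamps into LRC text. A standalone sketch of that conversion, assuming each entry carries a 'time' in seconds (as a string or number) and a 'lineLyric' text, as the patch reads them; `lyrics_to_lrc` is an illustrative name.

```python
def lyrics_to_lrc(lyrics_list):
    """Render [{'time': seconds, 'lineLyric': text}, ...] as LRC text."""
    lines = []
    for entry in lyrics_list:
        seconds = float(entry['time'])
        minutes = int(seconds // 60)
        # LRC timestamps use the form [mm:ss.xx].
        lines.append(f'[{minutes:02}:{seconds % 60:05.2f}]{entry["lineLyric"]}')
    return ''.join(line + '\n' for line in lines)


print(lyrics_to_lrc([
    {'time': '0.0', 'lineLyric': 'first line'},
    {'time': '75.5', 'lineLyric': 'second line'},
]))
```

One caveat of this formatting: a value such as 59.999 rounds to 60.00 in the seconds field, so rounding to centiseconds before splitting into minutes and seconds may be worth considering.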

I haven't tested whether several of the other IEs still work. I'll probably open a PR later.