ytdl-org / youtube-dl

Command-line program to download videos from YouTube.com and other video sites
http://ytdl-org.github.io/youtube-dl/
The Unlicense

Alternate Audiomack URL scheme #29800

Closed abdullah-if closed 2 years ago

abdullah-if commented 3 years ago

Here is the current Audiomack regex:

https?://(?:www\.)?audiomack\.com/song/(?P<id>[\w/-]+)

But Audiomack also has this kind of URL: https://audiomack.com/<uploader name>/song/<song name>

Using this type of URL loads the generic extractor and ultimately fails. The pattern needs to be updated.
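The mismatch can be checked outside youtube-dl with a minimal sketch like the one below. It assumes the pattern quoted above is applied with `re.match`, and the two URLs use made-up placeholder names rather than real tracks.

```python
import re

# Song pattern quoted in this issue, written here as a raw string.
CURRENT_SONG_RE = r'https?://(?:www\.)?audiomack\.com/song/(?P<id>[\w/-]+)'

# Placeholder URLs in both schemes; uploader and song names are hypothetical.
old_style = 'https://audiomack.com/song/some-uploader/some-song'
new_style = 'https://audiomack.com/some-uploader/song/some-song'

old_match = re.match(CURRENT_SONG_RE, old_style)
new_match = re.match(CURRENT_SONG_RE, new_style)

print(old_match.group('id'))  # some-uploader/some-song
print(new_match)              # None
```

Because the pattern requires `song/` directly after the host name, the second URL does not match the song pattern at all, which is consistent with the fallback to the generic extractor.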

dirkf commented 3 years ago

Please suggest actual URL examples, both successful and failing with yt-dl, that you can play in your browser.

abdullah-if commented 3 years ago

@dirkf

Please suggest actual URL examples, both successful and failing with yt-dl, that you can play in your browser.

The successful URL is redirected to the failing URL when opened in a browser.
Failing URL: https://audiomack.com/islamiclibrary/song/abdul-rahman-al-sudais-sura-1-al-fatiha
Debug log:

$ youtube-dl https://audiomack.com/islamiclibrary/song/abdul-rahman-al-sudais-sura-1-al-fatiha -v  
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['https://audiomack.com/islamiclibrary/song/abdul-rahman-al-sudais-sura-1-al-fatiha', '-v']
[debug] Encodings: locale UTF-8, fs utf-8, out utf-8, pref UTF-8
[debug] youtube-dl version 2021.06.06
[debug] Python version 3.9.6 (CPython) - Linux-5.13.10-zen1-1-zen-x86_64-with-glibc2.33
[debug] exe versions: ffmpeg 4.4, ffprobe 4.4
[debug] Proxy map: {}
[generic] abdul-rahman-al-sudais-sura-1-al-fatiha: Requesting header
WARNING: Falling back on generic information extractor.
[generic] abdul-rahman-al-sudais-sura-1-al-fatiha: Downloading webpage
[generic] abdul-rahman-al-sudais-sura-1-al-fatiha: Extracting information
ERROR: Unsupported URL: https://audiomack.com/islamiclibrary/song/abdul-rahman-al-sudais-sura-1-al-fatiha
Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/youtube_dl/YoutubeDL.py", line 815, in wrapper
    return func(self, *args, **kwargs)
  File "/usr/lib/python3.9/site-packages/youtube_dl/YoutubeDL.py", line 836, in __extract_info
    ie_result = ie.extract(url)
  File "/usr/lib/python3.9/site-packages/youtube_dl/extractor/common.py", line 534, in extract
    ie_result = self._real_extract(url)
  File "/usr/lib/python3.9/site-packages/youtube_dl/extractor/generic.py", line 3520, in _real_extract
    raise UnsupportedError(url)
youtube_dl.utils.UnsupportedError: Unsupported URL: https://audiomack.com/islamiclibrary/song/abdul-rahman-al-sudais-sura-1-al-fatiha

Supported URL: https://audiomack.com/song/islamiclibrary/abdul-rahman-al-sudais-sura-1-al-fatiha
Debug log:

$ youtube-dl https://audiomack.com/song/islamiclibrary/abdul-rahman-al-sudais-sura-1-al-fatiha -v 
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['https://audiomack.com/song/islamiclibrary/abdul-rahman-al-sudais-sura-1-al-fatiha', '-v']
[debug] Encodings: locale UTF-8, fs utf-8, out utf-8, pref UTF-8
[debug] youtube-dl version 2021.06.06
[debug] Python version 3.9.6 (CPython) - Linux-5.13.10-zen1-1-zen-x86_64-with-glibc2.33
[debug] exe versions: ffmpeg 4.4, ffprobe 4.4
[debug] Proxy map: {}
[audiomack] islamiclibrary/abdul-rahman-al-sudais-sura-1-al-fatiha: Downloading JSON metadata
[debug] Default format spec: bestvideo+bestaudio/best
[debug] Invoking downloader on 'https://music.audiomack.com/albums/islamiclibrary/complete-quran/abdul-rahman-al-sudais-sura-1-al-fatiha.mp3?Expires=1629124522&Signature=F1Rqc3AEFcuwkeleTy2mGGnNZDrKwrN2mPJD08rqgEkw2xycAaTIbsRSeqT4bbu-rUTID15oPnUaOLrgR8b4Y0nkAEMB6IIqGDlBYMzfAjy0JFGduxzqay49w8sQniqqL494D3gqLlVmZMd6dnSrZz6jN95klI5mrwnWxY3byX4_&Key-Pair-Id=APKAIKAIRXBA2H7FXITA'
[download] Destination: Abdul Rahman Al Sudais_Sura  1_ Al Fatiha-13599433.mp3
[download] 100% of 287.96KiB in 00:06
dirkf commented 3 years ago

So we can make the pattern in extractor/audiomack.py (line 18)

    _VALID_URL = r'https?://(?:www\.)?audiomack\.com/(?:song/)?(?P<id>[\w/-]+)'

and remove any '/song/' component from the resulting ID (line 51):

        album_url_tag = self._match_id(url).replace('/song/', '/')

Then:

# youtube-dl -v -F 'https://audiomack.com/islamiclibrary/song/abdul-rahman-al-sudais-sura-1-al-fatiha'
[debug] System config: [u'--restrict-filenames', u'--prefer-ffmpeg', u'-f', u'best[height<=?1080][fps<=?60]', u'-o', u'/media/drive1/Video/%(title)s.%(ext)s']
[debug] User config: [u'-f', u'(best/bestvideo+bestaudio)[height<=?1080][fps<=?60][tbr<=?1900]']
[debug] Custom config: []
[debug] Command-line args: [u'-v', u'-F', u'https://audiomack.com/islamiclibrary/song/abdul-rahman-al-sudais-sura-1-al-fatiha']
[debug] Encodings: locale ASCII, fs ASCII, out ASCII, pref ASCII
[debug] youtube-dl version 2021.06.06.1
[debug] Python version 2.7.1 (CPython) - Linux-2.6.18-7.1-7405b0-smp-with-libc0
[debug] exe versions: ffmpeg 4.1, ffprobe 4.1
[debug] Proxy map: {}
[audiomack] islamiclibrary/abdul-rahman-al-sudais-sura-1-al-fatiha: Downloading JSON metadata
[info] Available formats for 13599433:
format code  extension  resolution note
0            mp3        unknown    
#
abdullah-if commented 3 years ago

I found the problem extends to albums, too. The new album URL scheme is https://audiomack.com/<uploader name>/album/<album name>.
Failing URL: https://audiomack.com/islamiclibrary/album/complete-quran-part-1-suras-1-70-89
Debug log:

youtube-dl https://audiomack.com/islamiclibrary/album/complete-quran-part-1-suras-1-70-89 -F -v
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['https://audiomack.com/islamiclibrary/album/complete-quran-part-1-suras-1-70-89', '-F', '-v']
[debug] Encodings: locale UTF-8, fs utf-8, out utf-8, pref UTF-8
[debug] youtube-dl version 2021.06.06
[debug] Python version 3.9.6 (CPython) - Linux-5.13.10-zen1-1-zen-x86_64-with-glibc2.33
[debug] exe versions: ffmpeg 4.4, ffprobe 4.4
[debug] Proxy map: {}
[generic] complete-quran-part-1-suras-1-70-89: Requesting header
WARNING: Falling back on generic information extractor.
[generic] complete-quran-part-1-suras-1-70-89: Downloading webpage
[generic] complete-quran-part-1-suras-1-70-89: Extracting information
ERROR: Unsupported URL: https://audiomack.com/islamiclibrary/album/complete-quran-part-1-suras-1-70-89
Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/youtube_dl/YoutubeDL.py", line 815, in wrapper
    return func(self, *args, **kwargs)
  File "/usr/lib/python3.9/site-packages/youtube_dl/YoutubeDL.py", line 836, in __extract_info
    ie_result = ie.extract(url)
  File "/usr/lib/python3.9/site-packages/youtube_dl/extractor/common.py", line 534, in extract
    ie_result = self._real_extract(url)
  File "/usr/lib/python3.9/site-packages/youtube_dl/extractor/generic.py", line 3520, in _real_extract
    raise UnsupportedError(url)
youtube_dl.utils.UnsupportedError: Unsupported URL: https://audiomack.com/islamiclibrary/album/complete-quran-part-1-suras-1-70-89

Successful URL: https://audiomack.com/album/islamiclibrary/complete-quran-part-1-suras-1-70-89
Debug log:

youtube-dl https://audiomack.com/album/muslimummah/muslim-ummah-6 -F -v
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['https://audiomack.com/album/muslimummah/muslim-ummah-6', '-F', '-v']
[debug] Encodings: locale UTF-8, fs utf-8, out utf-8, pref UTF-8
[debug] youtube-dl version 2021.06.06
[debug] Python version 3.9.6 (CPython) - Linux-5.13.10-zen1-1-zen-x86_64-with-glibc2.33
[debug] exe versions: ffmpeg 4.4, ffprobe 4.4
[debug] Proxy map: {}
[audiomack:album] muslimummah/muslim-ummah-6: Querying song information (1)
...............................................................................................................................................................................
[audiomack:album] muslimummah/muslim-ummah-6: Querying song information (38)
[download] Downloading playlist: muslim ummah
[audiomack:album] playlist muslim ummah: Collected 37 video ids (downloading 37 of them)
[download] Downloading video 1 of 37
[info] Available formats for 6402958:
format code  extension  resolution note
0            mp3        unknown    
..................................................................................
[download] Downloading video 37 of 37
[info] Available formats for 6403033:
format code  extension  resolution note
0            mp3        unknown    
[download] Finished downloading playlist: muslim ummah
dirkf commented 3 years ago

Clearly a bit more sophistication is needed to avoid finding songs as albums or vice versa.
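A rough illustration of the problem, assuming the relaxed song pattern suggested earlier in this thread is applied with `re.match`: making `song/` optional lets the same pattern accept the album URL reported above.

```python
import re

# Relaxed song pattern from the earlier suggestion in this thread.
RELAXED_SONG_RE = r'https?://(?:www\.)?audiomack\.com/(?:song/)?(?P<id>[\w/-]+)'

song_url = 'https://audiomack.com/islamiclibrary/song/abdul-rahman-al-sudais-sura-1-al-fatiha'
album_url = 'https://audiomack.com/islamiclibrary/album/complete-quran-part-1-suras-1-70-89'

# The song URL is handled as intended once '/song/' is dropped from the ID.
song_id = re.match(RELAXED_SONG_RE, song_url).group('id').replace('/song/', '/')
print(song_id)  # islamiclibrary/abdul-rahman-al-sudais-sura-1-al-fatiha

# The album URL is accepted by the song pattern as well.
album_match = re.match(RELAXED_SONG_RE, album_url)
print(album_match.group('id'))  # islamiclibrary/album/complete-quran-part-1-suras-1-70-89
```

Whether this over-matching causes a wrong extraction in practice depends on the order in which extractors are tried, which this sketch does not model.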

In the pattern in extractor/audiomack.py (line 18) the id must either follow 'song/...' or contain '.../song/...':

    _VALID_URL = r'https?://(?:www\.)?audiomack\.com/(?:song/|(?=.+/song/))(?P<id>[\w/-]+)'

and remove any '/song/' component from the resulting ID (line 51) as before:

        album_url_tag = self._match_id(url).replace('/song/', '/')

Similarly for the album extractor, at line 77, the id must either follow 'album/...' or contain '.../album/...':

    _VALID_URL = r'https?://(?:www\.)?audiomack\.com/(?:album/|(?=.+/album/))(?P<id>[\w/-]+)'

and remove any '/album/' component from the resulting ID (line 116):

        album_url_tag = self._match_id(url).replace('/album/', '/')
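The two lookahead patterns can be exercised against the URLs from this thread with a small standalone sketch. The helper `extract_tag` is hypothetical and only imitates matching the URL and then stripping the marker; it is not the extractor's actual code.

```python
import re

SONG_RE = r'https?://(?:www\.)?audiomack\.com/(?:song/|(?=.+/song/))(?P<id>[\w/-]+)'
ALBUM_RE = r'https?://(?:www\.)?audiomack\.com/(?:album/|(?=.+/album/))(?P<id>[\w/-]+)'


def extract_tag(pattern, marker, url):
    """Return the 'uploader/name' tag, or None if the URL does not match."""
    m = re.match(pattern, url)
    if m is None:
        return None
    # Drop the '/song/' or '/album/' component used by the newer URL scheme.
    return m.group('id').replace('/%s/' % marker, '/')


urls = [
    'https://audiomack.com/song/islamiclibrary/abdul-rahman-al-sudais-sura-1-al-fatiha',
    'https://audiomack.com/islamiclibrary/song/abdul-rahman-al-sudais-sura-1-al-fatiha',
    'https://audiomack.com/album/islamiclibrary/complete-quran-part-1-suras-1-70-89',
    'https://audiomack.com/islamiclibrary/album/complete-quran-part-1-suras-1-70-89',
]

for url in urls:
    print(extract_tag(SONG_RE, 'song', url), extract_tag(ALBUM_RE, 'album', url))
```

With these inputs, both song URLs yield the same tag from the song pattern and None from the album pattern, and the reverse holds for the album URLs.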
abdullah-if commented 3 years ago

As I was about to open a PR, it turned out that python test/test_download.py TestDownload.test_Audiomack_1 is throwing errors (without any modification), because the song no longer exists. Is it just me, or is it another genuine issue? Here is the URL: http://www.audiomack.com/song/hip-hop-daily/black-mamba-freestyle

abdullah-if commented 3 years ago

The same goes for python test/test_download.py TestDownload.test_AudiomackAlbum and python test/test_download.py TestDownload.test_AudiomackAlbum_1, which fail when comparing a string with an int.

$ python test/test_download.py TestDownload.test_AudiomackAlbum
[audiomack:album] flytunezcom/tha-tour-part-2-mixtape: Querying song information (1)
[audiomack:album] flytunezcom/tha-tour-part-2-mixtape: Querying song information (2)
[audiomack:album] flytunezcom/tha-tour-part-2-mixtape: Querying song information (3)
[audiomack:album] flytunezcom/tha-tour-part-2-mixtape: Querying song information (4)
[audiomack:album] flytunezcom/tha-tour-part-2-mixtape: Querying song information (5)
[audiomack:album] flytunezcom/tha-tour-part-2-mixtape: Querying song information (6)
[audiomack:album] flytunezcom/tha-tour-part-2-mixtape: Querying song information (7)
[audiomack:album] flytunezcom/tha-tour-part-2-mixtape: Querying song information (8)
[audiomack:album] flytunezcom/tha-tour-part-2-mixtape: Querying song information (9)
[audiomack:album] flytunezcom/tha-tour-part-2-mixtape: Querying song information (10)
[audiomack:album] flytunezcom/tha-tour-part-2-mixtape: Querying song information (11)
[audiomack:album] flytunezcom/tha-tour-part-2-mixtape: Querying song information (12)
[download] Downloading playlist: Tha Tour: Part 2 (Official Mixtape)
[audiomack:album] playlist Tha Tour: Part 2 (Official Mixtape): Collected 11 video ids (downloading 11 of them)
[download] Downloading video 1 of 11
[info] Writing video description metadata as JSON to: test_AudiomackAlbum_812265.info.json
[download] Downloading video 2 of 11
[info] Writing video description metadata as JSON to: test_AudiomackAlbum_812262.info.json
[download] Downloading video 3 of 11
[info] Writing video description metadata as JSON to: test_AudiomackAlbum_812266.info.json
[download] Downloading video 4 of 11
[info] Writing video description metadata as JSON to: test_AudiomackAlbum_812261.info.json
[download] Downloading video 5 of 11
[info] Writing video description metadata as JSON to: test_AudiomackAlbum_812256.info.json
[download] Downloading video 6 of 11
[info] Writing video description metadata as JSON to: test_AudiomackAlbum_812263.info.json
[download] Downloading video 7 of 11
[info] Writing video description metadata as JSON to: test_AudiomackAlbum_812259.info.json
[download] Downloading video 8 of 11
[info] Writing video description metadata as JSON to: test_AudiomackAlbum_812257.info.json
[download] Downloading video 9 of 11
[info] Writing video description metadata as JSON to: test_AudiomackAlbum_812253.info.json
[download] Downloading video 10 of 11
[info] Writing video description metadata as JSON to: test_AudiomackAlbum_812254.info.json
[download] Downloading video 11 of 11
[info] Writing video description metadata as JSON to: test_AudiomackAlbum_812260.info.json
[download] Finished downloading playlist: Tha Tour: Part 2 (Official Mixtape)
F
======================================================================
FAIL: test_AudiomackAlbum (__main__.TestDownload):
----------------------------------------------------------------------
Traceback (most recent call last):
  File "#/youtube-dl/test/test_download.py", line 178, in test_template
    expect_info_dict(self, res_dict, test_case.get('info_dict', {}))
  File "#/youtube-dl/test/helper.py", line 190, in expect_info_dict
    expect_dict(self, got_dict, expected_dict)
  File "#/youtube-dl/test/helper.py", line 186, in expect_dict
    expect_value(self, got, expected, info_field)
  File "#/youtube-dl/test/helper.py", line 178, in expect_value
    self.assertEqual(
AssertionError: '812251' != 812251 : Invalid value for field id, expected '812251', got 812251

----------------------------------------------------------------------
Ran 1 test in 134.827s

FAILED (failures=1)
abdullah-if commented 3 years ago

Now what? Should I wait for these issues to be resolved, or make a PR with the new regex? @dirkf

dirkf commented 3 years ago

As I was about to open a PR, it turned out that python test/test_download.py TestDownload.test_Audiomack_1 is throwing errors (without any modification), because the song no longer exists. Is it just me, or is it another genuine issue? Here is the URL: http://www.audiomack.com/song/hip-hop-daily/black-mamba-freestyle

Same for me and the site says "This song cannot be found or has been removed." The failing test can be disabled with 'only_matching': True, instead of the info_dict block (which could be commented out with a note that the test song is no longer live). Ideally, a new URL would be found for this test case.

Apparently the playlist ID for an album can be retrieved as a number but should be a string, which can be fixed with this revised line 138:

                       result[resultkey] = compat_str(api_response[apikey])    

The project admins (when they're about) like a PR to address one specific set of changes, so the dead URL can be ignored for your PR. You could justify including the line 138 change, which matches what's done for 'video' IDs.
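The effect of the revised line can be sketched in isolation. This assumes Python 3, where `compat_str` is taken to behave like plain `str`; the `api_response` shape and the key mapping are made up for illustration, with the ID value taken from the failing assertion above.

```python
# Stand-in for youtube_dl.compat.compat_str, assumed to be plain str on Python 3.
compat_str = str

# Hypothetical API response in which the album ID arrives as a number.
api_response = {'id': 812251, 'title': 'Tha Tour: Part 2 (Official Mixtape)'}

result = {}
for resultkey, apikey in [('id', 'id'), ('title', 'title')]:
    if apikey in api_response:
        # Coerce every copied value to a string, as in the revised line.
        result[resultkey] = compat_str(api_response[apikey])

print(api_response['id'] == '812251')  # False: int compared with str
print(result['id'] == '812251')        # True after coercion
```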

abdullah-if commented 3 years ago

Now I have to correct a whole lot of other stuff. It turns out the album in the test case has 11 songs, but the test expects 15 URLs.

abdullah-if commented 3 years ago

Everything is done except that python test/test_download.py TestDownload.test_AudiomackAlbum_1 is throwing an error: it is trying to index into an empty list. The empty list followed by 0 in the output below is good ol' print output for the erring values.


$ python test/test_download.py TestDownload.test_AudiomackAlbum_1
[audiomack:album] fakeshoredrive/ppp-pistol-p-project: Querying song information (1)
[audiomack:album] fakeshoredrive/ppp-pistol-p-project: Querying song information (2)
[audiomack:album] fakeshoredrive/ppp-pistol-p-project: Querying song information (3)
[download] Downloading playlist: PPP (Pistol P Project)
[audiomack:album] playlist PPP (Pistol P Project): Collected 2 video ids (downloading 0 of them)
[download] Finished downloading playlist: PPP (Pistol P Project)
[]
0
E
======================================================================
ERROR: test_AudiomackAlbum_1 (__main__.TestDownload):
----------------------------------------------------------------------
Traceback (most recent call last):
  File "#/youtube-dl/test/test_download.py", line 210, in test_template
    tc_res_dict = res_dict['entries'][tc_num]
IndexError: list index out of range

----------------------------------------------------------------------
Ran 1 test in 34.541s

FAILED (errors=1)
dirkf commented 3 years ago

The test is trying to match the 9th playlist item but there are only 2. I formatted the JSON output below.

# youtube-dl -J 'http://www.audiomack.com/album/fakeshoredrive/ppp-pistol-p-project'
{
  'extractor': 'audiomack:album',
  '_type': 'playlist',
  'title': 'PPP (Pistol P Project)',
  'extractor_key': 'AudiomackAlbum',
  'webpage_url': 'http://www.audiomack.com/album/fakeshoredrive/ppp-pistol-p-project',
  'entries': [
    {
      'extractor': 'audiomack:album',
      'protocol': 'https',
      'playlist_index': 1,
      'playlist': 'PPP (Pistol P Project)',
      'title': 'PPP (Pistol P Project) - 8. Real (prod by SYK SENSE  )',
      'id': '837576',
      'playlist_id': '837572',
      'webpage_url_basename': 'ppp-pistol-p-project',
      'display_id': '837576',
      'format': '0 - unknown',
      'requested_subtitles': null,
      'playlist_uploader': null,
      'uploader': 'Lil Herb a.k.a. G Herbo',
      'format_id': '0',
      'playlist_title': 'PPP (Pistol P Project)',
      'url': 'https://music.audiomack.com/albums/fakeshoredrive/ppp-pistol-p-project/8.-real-prod-by-syk-sense-.mp3?Expires=1629185917&Signature=S~z7zK991mqIBqC~mmgkV447ZhZpLbyKFiUw9SjKfsu9Q1VTr5iZnrxepehQRH8sPBj2KmRbKnJYyeJnPcKp0wl3irbjsDvh-Zr~~1J0KqjHEtmGkwZdaBzvc1GSSrFwc1I1XE9ogRqLkz-ZeRrUfFNCQ9WsmIw4GrKh6bg4vY8_&Key-Pair-Id=APKAIKAIRXBA2H7FXITA',
      'extractor_key': 'AudiomackAlbum',
      'http_headers': {
        'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.7',
        'Accept-Language': 'en-us,en;q=0.5',
        'Accept-Encoding': 'gzip, deflate',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.115 Safari/537.36'
      },
      'ext': 'mp3',
      'webpage_url': 'http://www.audiomack.com/album/fakeshoredrive/ppp-pistol-p-project',
      'playlist_uploader_id': null,
      'n_entries': 2
    },
    {
      'extractor': 'audiomack:album',
      'protocol': 'https',
      'playlist_index': 2,
      'playlist': 'PPP (Pistol P Project)',
      'title': 'PPP (Pistol P Project) - 10. 4 Minutes Of Hell Part 4 (prod by DY OF 808 MAFIA)',
      'id': '837580',
      'playlist_id': '837572',
      'webpage_url_basename': 'ppp-pistol-p-project',
      'display_id': '837580',
      'format': '0 - unknown',
      'requested_subtitles': null,
      'playlist_uploader': null,
      'uploader': 'Lil Herb a.k.a. G Herbo',
      'format_id': '0',
      'playlist_title': 'PPP (Pistol P Project)',
      'url': 'https://music.audiomack.com/albums/fakeshoredrive/ppp-pistol-p-project/10.-4-minutes-of-hell-part-4-prod-by-dy-of-808-mafia.mp3?Expires=1629185918&Signature=OT717WdFq0v4O6ZAxl5jN8Lim8QG-VM~AOqEDGj89EON53cKAt5g6yRCbh71briDgK8-dmdTsATEerUhE2wQr2oCSGsb2skfmM9B5bAWUWTXxbPyIEIh31oeM0~LhQOEiOm4XRihOzVqgZdyOpLFPSjkWyrV-vAJ1z0zu9mhP2w_&Key-Pair-Id=APKAIKAIRXBA2H7FXITA',
      'extractor_key': 'AudiomackAlbum',
      'http_headers': {
        'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.7',
        'Accept-Language': 'en-us,en;q=0.5',
        'Accept-Encoding': 'gzip, deflate',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.115 Safari/537.36'
      },
      'ext': 'mp3',
      'webpage_url': 'http://www.audiomack.com/album/fakeshoredrive/ppp-pistol-p-project',
      'playlist_uploader_id': null,
      'n_entries': 2
    }
  ],
  'id': '837572',
  'webpage_url_basename': 'ppp-pistol-p-project'
}
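The failure can be reproduced in miniature. This is a sketch under the assumption that the test selects a single 1-based playlist item and then indexes the first selected entry; the list slicing below stands in for whatever selection the test harness actually performs.

```python
# The two entries that the album currently returns (IDs from the JSON above).
entries = [{'id': '837576'}, {'id': '837580'}]

# 1-based playlist item the test is said to look for.
wanted_index = 9

# Selecting item 9 from a 2-item playlist leaves nothing to download.
selected = entries[wanted_index - 1:wanted_index]
print(selected)  # []

try:
    first = selected[0]
except IndexError as err:
    first = None
    print('IndexError:', err)  # IndexError: list index out of range
```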