ytdl-org / youtube-dl

Command-line program to download videos from YouTube.com and other video sites
http://ytdl-org.github.io/youtube-dl/
The Unlicense

Error returned when trying on Patreon content #28786

Open phoenixdigital opened 3 years ago

phoenixdigital commented 3 years ago

Verbose log

youtube-dl --write-sub -o "S%(season_number)02dE%(episode_number)02d - %(title)s.%(ext)s" -a youtube.txt --verbose
[debug] System config: ['--prefer-free-formats']
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['--write-sub', '-o', 'S%(season_number)02dE%(episode_number)02d - %(title)s.%(ext)s', '-a', 'youtube.txt', '--verbose']
[debug] Batch file urls: ['https://www.patreon.com/posts/tansjs-sunday-49973145']
[debug] Encodings: locale UTF-8, fs utf-8, out utf-8, pref UTF-8
[debug] youtube-dl version 2021.04.17
[debug] Python version 3.9.2 (CPython) - Linux-5.11.11-200.fc33.x86_64-x86_64-with-glibc2.32
[debug] exe versions: ffmpeg 4.3.2, ffprobe 4.3.2
[debug] Proxy map: {}
[Patreon] 49973145: Downloading JSON metadata
ERROR: An extractor error has occurred. (caused by KeyError('post_file')); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/youtube_dl/extractor/common.py", line 534, in extract
    ie_result = self._real_extract(url)
  File "/usr/local/lib/python3.9/site-packages/youtube_dl/extractor/patreon.py", line 148, in _real_extract
    post_file = attributes['post_file']
KeyError: 'post_file'
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/youtube_dl/extractor/common.py", line 534, in extract
    ie_result = self._real_extract(url)
  File "/usr/local/lib/python3.9/site-packages/youtube_dl/extractor/patreon.py", line 148, in _real_extract
    post_file = attributes['post_file']
KeyError: 'post_file'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/youtube_dl/YoutubeDL.py", line 806, in wrapper
    return func(self, *args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/youtube_dl/YoutubeDL.py", line 827, in __extract_info
    ie_result = ie.extract(url)
  File "/usr/local/lib/python3.9/site-packages/youtube_dl/extractor/common.py", line 547, in extract
    raise ExtractorError('An extractor error has occurred.', cause=e)
youtube_dl.utils.ExtractorError: An extractor error has occurred. (caused by KeyError('post_file')); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
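
For reference, the `-o` template in the command above uses Python %-style string formatting. Here is a minimal sketch of how such a template expands, using made-up metadata values (the real fields come from the extractor, and youtube-dl applies its own handling for missing fields that is not shown here):

```python
# The output template from the command line above.
TEMPLATE = 'S%(season_number)02dE%(episode_number)02d - %(title)s.%(ext)s'

# Hypothetical metadata, for illustration only.
info = {
    'season_number': 1,
    'episode_number': 2,
    'title': 'Example',
    'ext': 'mp4',
}

# Plain %-formatting against a dict: %(name)02d zero-pads integers to width 2.
filename = TEMPLATE % info
print(filename)
```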

Description

I was trying to download a video to keep locally for storage on my media server for personal use and got this error that asked me to post here.

youtube-dl-error.txt

The relevant section of the HTML, which I believe is related to the error, is attached here. The whole file contains far too much PII to post publicly.

youtubedl-test.txt

phoenixdigital commented 3 years ago

I only found the issue tracker and the preferred template after posting this, so unfortunately I didn't search for the problem first. There was an issue posted about Patreon, but there wasn't enough info in the post to see if it was the same:

https://github.com/ytdl-org/youtube-dl/issues/28671

Whoops, and here is another which is the same issue: https://github.com/ytdl-org/youtube-dl/issues/26940

Hopefully my bug report shows a bit more detail about the issue at hand. Happy to do some testing or provide a more comprehensive HTML source if requested. I just won't post it publicly as it's full of PII.

phoenixdigital commented 3 years ago

After digging into it and the patreon.py code, it looks like the extractor requires authentication now. This is the relevant section of the JSON it errors on when not logged in; you can see it doesn't contain the "post_file" value, which is visible when logged in.

 "post": {
  "data": {
   "attributes": {
    "change_visibility_at": null,
    "comment_count": 168,
    "created_at": "2021-04-13T08:30:22.000+00:00",
    "current_user_can_comment": false,
    "current_user_can_delete": false,
    "current_user_can_view": false,
    "deleted_at": null,
    "edit_url": "/posts/tansjs-sunday-49973145/edit",
    "edited_at": "2021-04-13T09:03:55.000+00:00",
    "has_ti_violation": false,
    "image": {
     "height": 1080,
     "url": "https://image.mux.com/eczqtdsUBnoy02rXLd8ldxfXmrSFCQWmGcKJiFq0202ruY/thumbnail.jpg?token=eyJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiIsImtpZCI6Ik5CY3o3Sk5RcUNmdDdWcmo5MWhra2lEY3Vyc2xtRGNmSU1oSFUzallZMDI0In0.eyJzdWIiOiJlY3pxdGRzVUJub3kwMnJYTGQ4bGR4ZlhtclNGQ1FXbUdjS0ppRnEwMjAycnVZIiwiZXhwIjoxNjIxMzc3NDA3LCJhdWQiOiJ0IiwidGltZSI6MTEyNy40fQ.SBLBfFgRfW8l2oei9F53VzDuXoJDcgV4D_FZtoV7vtTbr_QABtuZyVUGxM0j2eONDA-tfaUqkM4jjUIKbYUaRFZ0rOQj-qm0LN2BtG-tHTVY56iziRWxRz1UewltYXzggUM9eXnuqPCMpMgBguQ5lcHkE7u4zgA5Aa_s3n47-FRQWJTr6o_LbeTJD-IytWh4pPuwnJeUxsFvEs5lLVVX9NlbUPp5xA9Lm94BbDdmFFyuVRDq4yPS5EDtcDlYNxPVNuceDymVC4l6Cfn_KTdvoZ6OqVJJwY2XK2aWb2k49impL8nULTz20ZWVWbHoTKCm2tbIH1qXgHuXOH6pm9Vnaw",
     "width": 1920
    },
    "is_automated_monthly_charge": false,
    "is_paid": false,
    "like_count": 157,
    "meta_image_url": "https://c7.patreon.com/https%3A%2F%2Fwww.patreon.com%2F%2Fpost-teaser-image%2F49973145/selector/%23post-teaser",
    "min_cents_pledged_to_view": 1,
    "patreon_url": "/posts/tansjs-sunday-49973145",
    "pledge_url": "/bePatron?patAmt=0.01\u0026c=1752462",
    "post_metadata": null,
    "post_type": "video_external_file",
    "published_at": "2021-04-13T09:03:55.000+00:00",
    "scheduled_for": null,
    "teaser_text": null,
    "title": "TANSJS: Sunday Magic (BETA TESTING)",
    "upgrade_url": "/join/tellemstevedave/checkout?rid=2706415",
    "url": "https://www.patreon.com/posts/tansjs-sunday-49973145",
    "was_posted_by_campaign_owner": true
   },
   "id": "49973145",
   "relationships": {
    "access_rules": {
     "data": [
      {
       "id": "365871",
       "type": "access-rule"
      }
     ]
    },
    "audio": {
     "data": null
    },
    "campaign": {
     "data": {
      "id": "1752462",
      "type": "campaign"
     },
     "links": {
      "related": "https://www.patreon.com/api/campaigns/1752462"
     }
    },
    "images": {
     "data": []
    },
    "poll": {
     "data": null
    },
    "ti_checks": {
     "data": []
    },
    "user": {
     "data": {
      "id": "11190956",
      "type": "user"
     },
     "links": {
      "related": "https://www.patreon.com/api/user/11190956"
     }
    },
    "user_defined_tags": {
     "data": [
      {
       "id": "user_defined;All New Sunday Jeff Show",
       "type": "post_tag"
      },
      {
       "id": "user_defined;BETA",
       "type": "post_tag"
      },
      {
       "id": "user_defined;video",
       "type": "post_tag"
      }
     ]
    }
   },
   "type": "post"
  },
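
The failure mode above can be illustrated with a minimal sketch. This is not the actual patreon.py code: `LoginRequiredError` and `get_post_file` are hypothetical names, and the idea that `current_user_can_view: false` signals a login requirement is an assumption based on the JSON shown here.

```python
class LoginRequiredError(Exception):
    """Hypothetical error type for this sketch; not part of youtube-dl."""


def get_post_file(attributes):
    """Return attributes['post_file'], or None if the post has no file.

    Raises LoginRequiredError instead of a bare KeyError when the key is
    missing and the API reports that the current user cannot view the post.
    """
    post_file = attributes.get('post_file')
    if post_file is None and attributes.get('current_user_can_view') is False:
        raise LoginRequiredError(
            'post_file is missing and current_user_can_view is false; '
            'the post probably requires a logged-in session')
    return post_file


# A few fields copied from the anonymous response above.
anonymous_attributes = {
    'current_user_can_view': False,
    'post_type': 'video_external_file',
    'title': 'TANSJS: Sunday Magic (BETA TESTING)',
}

try:
    get_post_file(anonymous_attributes)
except LoginRequiredError as err:
    print('Login required:', err)
```

Using `.get()` rather than indexing turns the opaque `KeyError('post_file')` into a message that points at the likely cause.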

phoenixdigital commented 3 years ago

I tried uncommenting the login section of code in patreon.py but it didn't work. It's likely just a placeholder, so I'll have to leave the rest to someone who knows a bit more about making a login module. Digging into the login process, the URL is

https://www.patreon.com/api/login

It accepts these POST fields

{"data":{"type":"user","attributes":{"email":"theusername","password":"thepassword"},"relationships":{}}}
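
As a rough sketch, the request described above could be built like this with the standard library. The URL and body shape are taken from the observation above; the `Content-Type` header and the helper name `build_login_request` are assumptions, and the request is only constructed here, not sent.

```python
import json
import urllib.request

LOGIN_URL = 'https://www.patreon.com/api/login'


def build_login_request(email, password):
    """Build (but do not send) a POST request with the observed JSON body."""
    payload = {
        'data': {
            'type': 'user',
            'attributes': {'email': email, 'password': password},
            'relationships': {},
        }
    }
    body = json.dumps(payload).encode('utf-8')
    return urllib.request.Request(
        LOGIN_URL,
        data=body,
        # Assumed header; the thread does not show the request's content type.
        headers={'Content-Type': 'application/json'},
        method='POST',
    )


req = build_login_request('theusername', 'thepassword')
print(req.get_method(), req.full_url)
```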

There is a form of 2FA where they ask you to approve that PC for login via email. So not sure what "fingerprint" it uses to detect a unique PC. Possibly this cookie in the request header to that login page.

cookie: __cfduid=_______redacted_____; patreon_device_id=_______redacted_____; patreon_location_country_code=AU; __cf_bm=_______redacted_____; _ALGOLIA=anonymous-_______redacted_____; _fbp=_______redacted_____; patreon_locale_code=en-US; G_ENABLED_IDPS=google; group_id=_______redacted_____; _swb=_______redacted_____

So you'd probably want to allow the user to pass in a patreon_device_id so they don't get forced through the 2FA every time, or cache it somewhere?

The response seems to set a cookie which might be used to keep the user logged in????

set-cookie: session_id=_____redacted______; Domain=patreon.com; Expires=Mon, 17-May-2021 22:--:00 GMT; Max-Age=2592000; Secure; HttpOnly; Path=/; SameSite=Lax
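
To read the attributes of a header like this, here is a small sketch using Python's `http.cookies`. The cookie value is a placeholder, and the `Expires` attribute is left out because its time is redacted above; the helper name `parse_session_cookie` is hypothetical.

```python
from datetime import timedelta
from http.cookies import SimpleCookie

# Same attributes as the header above, with a placeholder value and no Expires.
SAMPLE_HEADER = (
    'session_id=PLACEHOLDER; Domain=patreon.com; Max-Age=2592000; '
    'Secure; HttpOnly; Path=/; SameSite=Lax'
)


def parse_session_cookie(header):
    """Return the session_id morsel and its lifetime as a timedelta."""
    cookie = SimpleCookie()
    cookie.load(header)
    morsel = cookie['session_id']
    lifetime = timedelta(seconds=int(morsel['max-age']))
    return morsel, lifetime


morsel, lifetime = parse_session_cookie(SAMPLE_HEADER)
print(morsel.value, morsel['domain'], lifetime.days)
```

A Max-Age of 2592000 seconds is 30 days, which is consistent with the cookie being a persistent login session.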

That's about all I could find. Hope that helps.

fictionic commented 3 years ago

I bypassed the login code by supplying a cookies file, and I got a different error:

[Patreon] 54675251: Downloading JSON metadata
WARNING: "url" field is missing or empty - skipping format, there is an error in extractor
Traceback (most recent call last):
  File "/usr/bin/youtube-dl", line 33, in <module>
    sys.exit(load_entry_point('youtube-dl==2021.6.6', 'console_scripts', 'youtube-dl')())
  File "/usr/lib/python3.9/site-packages/youtube_dl/__init__.py", line 475, in main
    _real_main(argv)
  File "/usr/lib/python3.9/site-packages/youtube_dl/__init__.py", line 465, in _real_main
    retcode = ydl.download(all_urls)
  File "/usr/lib/python3.9/site-packages/youtube_dl/YoutubeDL.py", line 2068, in download
    res = self.extract_info(
  File "/usr/lib/python3.9/site-packages/youtube_dl/YoutubeDL.py", line 808, in extract_info
    return self.__extract_info(url, ie, download, extra_info, process)
  File "/usr/lib/python3.9/site-packages/youtube_dl/YoutubeDL.py", line 815, in wrapper
    return func(self, *args, **kwargs)
  File "/usr/lib/python3.9/site-packages/youtube_dl/YoutubeDL.py", line 847, in __extract_info
    return self.process_ie_result(ie_result, download, extra_info)
  File "/usr/lib/python3.9/site-packages/youtube_dl/YoutubeDL.py", line 881, in process_ie_result
    return self.process_video_result(ie_result, download=download)
  File "/usr/lib/python3.9/site-packages/youtube_dl/YoutubeDL.py", line 1637, in process_video_result
    if formats[0] is not info_dict:
IndexError: list index out of range
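
The warning followed by the `IndexError` suggests every format was skipped for lacking a URL, leaving an empty list that was then indexed. Here is a minimal sketch of that situation and a guard for it; `usable_formats` is a hypothetical helper, not youtube-dl's actual code.

```python
def usable_formats(formats):
    """Drop formats without a URL and fail clearly if none remain."""
    usable = [f for f in formats if f.get('url')]
    if not usable:
        # Without this guard, indexing usable[0] raises IndexError.
        raise ValueError('No video formats found')
    return usable


# Hypothetical extractor output where the only format has an empty URL.
broken = [{'format_id': '0', 'url': None}]

try:
    usable_formats(broken)
except ValueError as err:
    print('Extraction failed:', err)
```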

dirkf commented 2 years ago

Two problems there, at least:

And then there's the problem of actually extracting a valid link.

dirkf commented 2 years ago

Also, if a page fails with yt-dl, it would be useful to know whether it works with yt-dlp, as that has a newer version of the extractor (but using some core functions not yet back-ported).