mikf / gallery-dl

Command-line program to download image galleries and collections from several image hosting sites
GNU General Public License v2.0

[bug] [reddit] Some v.redd.it links on User Profiles (possibly others) fail to download, 'NoneType' is not iterable error. #3258

Closed: Silent-Soldier closed this issue 1 year ago

Silent-Soldier commented 1 year ago

I recently ran across this bug while parsing a subreddit, but so far I can only reliably reproduce it with an NSFW video link on a user's profile. Otherwise the issue is intermittent or fails to occur, and I have no idea why.

Verbose output:

>gallery-dl --verbose "https://www.reddit.com/user/69beautifulporn69/comments/x8p3yf/eufrat/"
2022-11-19 05:29:01 [gallery-dl][debug] Version 1.24.0-dev
2022-11-19 05:29:01 [gallery-dl][debug] Python 3.11.0 - Windows-10-10.0.19045-SP0
2022-11-19 05:29:01 [gallery-dl][debug] requests 2.28.1 - urllib3 1.26.12
2022-11-19 05:29:01 [gallery-dl][debug] Starting DownloadJob for 'https://www.reddit.com/user/69beautifulporn69/comments/x8p3yf/eufrat/'
2022-11-19 05:29:03 [cookies][debug] Extracting cookies from C:\Users\*****\*****\*****\Mozilla\Firefox\Profiles\*****\cookies.sqlite
2022-11-19 05:29:03 [reddit][debug] Using RedditSubmissionExtractor for 'https://www.reddit.com/user/69beautifulporn69/comments/x8p3yf/eufrat/'
2022-11-19 05:29:03 [urllib3.connectionpool][debug] Starting new HTTPS connection (1): oauth.reddit.com:443
2022-11-19 05:29:03 [urllib3.connectionpool][debug] https://oauth.reddit.com:443 "GET /comments/x8p3yf/.json?limit=0&raw_json=1 HTTP/1.1" 200 2307
2022-11-19 05:29:03 [reddit][debug] Using download archive '*****/gallery-dl/.archives/reddit.sqlite3'
2022-11-19 05:29:03 [postprocessor.metadata][debug] Using download archive '*****/gallery-dl/.archives/reddit-metadata.sqlite3'
2022-11-19 05:29:03 [postprocessor.ugoira][debug] using mkvmerge demuxer
2022-11-19 05:29:03 [reddit][debug] Active postprocessor modules: [ClassifyPP, MetadataPP, MtimePP, UgoiraPP]
2022-11-19 05:29:04 [downloader.ytdl][debug] [generic] ypr3fhcnzjm91: Downloading webpage
2022-11-19 05:29:05 [downloader.ytdl][debug] [redirect] Following redirect to https://www.reddit.com/user/69beautifulporn69/comments/x8p3yf/eufrat/
2022-11-19 05:29:05 [downloader.ytdl][debug] [generic] eufrat: Downloading webpage
2022-11-19 05:29:05 [downloader.ytdl][warning] [generic] Falling back on generic information extractor
2022-11-19 05:29:06 [downloader.ytdl][debug] [generic] eufrat: Extracting information
2022-11-19 05:29:06 [downloader.ytdl][error] ERROR: Unsupported URL: https://www.reddit.com/user/69beautifulporn69/comments/x8p3yf/eufrat/
2022-11-19 05:29:06 [reddit][error] An unexpected error occurred: TypeError - argument of type 'NoneType' is not iterable. Please run gallery-dl again with the --verbose flag, copy its output and report this issue on https://github.com/mikf/gallery-dl/issues .
2022-11-19 05:29:06 [reddit][debug]
Traceback (most recent call last):
  File "C:\Users\*****\*****\Roaming\Python\Python311\site-packages\gallery_dl\job.py", line 84, in run
    self.dispatch(msg)
  File "C:\Users\*****\*****\Roaming\Python\Python311\site-packages\gallery_dl\job.py", line 128, in dispatch
    self.handle_url(url, kwdict)
  File "C:\Users\*****\*****\Roaming\Python\Python311\site-packages\gallery_dl\job.py", line 248, in handle_url
    if not self.download(url):
           ^^^^^^^^^^^^^^^^^^
  File "C:\Users\*****\*****\Roaming\Python\Python311\site-packages\gallery_dl\job.py", line 380, in download
    return downloader.download(url, self.pathfmt)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\*****\*****\Roaming\Python\Python311\site-packages\gallery_dl\downloader\ytdl.py", line 69, in download
    if "entries" in info_dict:
       ^^^^^^^^^^^^^^^^^^^^^^
TypeError: argument of type 'NoneType' is not iterable
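A note on the traceback above: the last frame evaluates "entries" in info_dict while info_dict is None, presumably because the yt-dlp extraction logged just before it failed and produced no result. The sketch below only illustrates that failure mode and one possible guard. has_entries is a hypothetical helper name, not part of gallery-dl or yt-dlp, and this is not necessarily how the project handles it.

```python
def has_entries(info_dict):
    """Return True only if info_dict is a mapping with an 'entries' key."""
    # Hypothetical helper, not part of gallery-dl: treat None (or any
    # empty result) as "no entries" instead of letting `in` raise.
    if not info_dict:
        return False
    return "entries" in info_dict


# The unguarded membership test fails the same way as in the traceback.
try:
    "entries" in None
    raised = False
except TypeError:
    raised = True
```

With a guard like this, a failed extraction could be reported as an ordinary download error rather than an unexpected exception.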

mikf commented 1 year ago

Without cookies I only get a non-fatal error:

[urllib3.connectionpool][debug] https://oauth.reddit.com:443 "GET /comments/x8p3yf/.json?limit=0&raw_json=1 HTTP/1.1" 200 2360
[downloader.ytdl][debug] [generic] ypr3fhcnzjm91: Downloading webpage
[downloader.ytdl][debug] [redirect] Following redirect to https://www.reddit.com/user/69beautifulporn69/comments/x8p3yf/eufrat/
[downloader.ytdl][debug] [generic] eufrat: Downloading webpage
[downloader.ytdl][warning] [generic] Falling back on generic information extractor
[downloader.ytdl][debug] [generic] eufrat: Extracting information
[downloader.ytdl][error] ERROR: Unsupported URL: https://www.reddit.com/user/69beautifulporn69/comments/x8p3yf/eufrat/
[download][error] Failed to download ytdl:https://v.redd.it/ypr3fhcnzjm91

InterruptSpeed commented 1 year ago

It looks like the reddit extractor hands off to yt-dlp because the JSON has is_video=true, but it passes the top-level value "url" : "https://v.redd.it/ypr3fhcnzjm91" rather than the correct value "fallback_url" : "https://v.redd.it/ypr3fhcnzjm91/DASH_720.mp4?source=fallback", which is found under media -> reddit_video.

A proposed fix would be to check for the existence of fallback_url when the domain is v.redd.it and use that value to hand off to yt-dlp. I can work on that if it makes sense?

InterruptSpeed commented 1 year ago

Which is the more pythonic fix?

a)

try:
  url = submission["media"]["reddit_video"]["fallback_url"]
except KeyError:
  pass

b)

if "media" in submission \
  and "reddit_video" in submission["media"] \
  and "fallback_url" in submission["media"]["reddit_video"]:
  url = submission["media"]["reddit_video"]["fallback_url"]

Either one would be inserted in the RedditExtractor items() method, right before the yield in the elif submission["is_video"]: block.

How can I test that the change doesn't break other scenarios? I can submit a pull request for the fix if we are on the right track.
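A third option, sketched here for comparison, chains dict.get() calls so that a missing key and a null "media" value are handled the same way. Option a) as written would raise TypeError rather than KeyError if "media" were present but null. pick_video_url is a hypothetical helper name, the sample submissions are made up for illustration, and falling back to the top-level "url" is an assumption about the desired behavior.

```python
def pick_video_url(submission):
    """Prefer media.reddit_video.fallback_url, else the top-level url."""
    # `or {}` covers both a missing key and an explicit null (None).
    media = submission.get("media") or {}
    reddit_video = media.get("reddit_video") or {}
    return reddit_video.get("fallback_url") or submission.get("url")


# Illustrative inputs only, shaped like the fields quoted in this thread.
with_fallback = {
    "is_video": True,
    "url": "https://v.redd.it/ypr3fhcnzjm91",
    "media": {"reddit_video": {
        "fallback_url":
            "https://v.redd.it/ypr3fhcnzjm91/DASH_720.mp4?source=fallback",
    }},
}
null_media = {"is_video": True, "url": "https://v.redd.it/ypr3fhcnzjm91",
              "media": None}
```

Calling the helper on a handful of submissions like these (with fallback_url, with null media, with no media key at all) is one cheap way to check that the other scenarios still resolve to the original url.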

Silent-Soldier commented 1 year ago

I believe @InterruptSpeed may be partially correct on this. I've been experimenting with various solutions over the last few days, focusing mainly on cookies as the cause (based on the verbose output from gallery-dl and yt-dlp independently). Even with cookies removed altogether, the same behavior occurs when passing the URI to yt-dlp by itself.

The "fallback_url" appears to download correctly when passed to yt-dlp, though the audio is cut off or missing entirely. I believe the URIs need to point to https://v.redd.it/ypr3fhcnzjm91/DASHPlaylist.mpd (higher quality) or https://v.redd.it/ypr3fhcnzjm91/HLSPlaylist.m3u8 (lower quality) instead?
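As a rough illustration of that idea, the playlist URLs quoted above can be derived from the fallback_url by keeping the first path segment and swapping the file name. This assumes the playlists always sit next to the fallback file, as they do in the example URLs here. playlist_url is a hypothetical helper name, not an existing gallery-dl function.

```python
from urllib.parse import urlsplit, urlunsplit


def playlist_url(fallback_url, playlist="DASHPlaylist.mpd"):
    """Build a playlist URL that sits beside the given fallback_url."""
    parts = urlsplit(fallback_url)
    # First path segment is the video id, e.g. "ypr3fhcnzjm91".
    video_id = parts.path.strip("/").split("/")[0]
    # Drop the query string ("?source=fallback") and the fragment.
    return urlunsplit(
        (parts.scheme, parts.netloc, f"/{video_id}/{playlist}", "", ""))


fallback = "https://v.redd.it/ypr3fhcnzjm91/DASH_720.mp4?source=fallback"
dash = playlist_url(fallback)
hls = playlist_url(fallback, "HLSPlaylist.m3u8")
```

Whether yt-dlp then merges audio and video correctly from either playlist would still need to be checked against real posts.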