mikf / gallery-dl

Command-line program to download image galleries and collections from several image hosting sites
GNU General Public License v2.0

[reddit] downloading images not working if url belongs to preview.redd.it #4470

Closed: ghbook closed this 1 year ago

ghbook commented 1 year ago

I was downloading some images from reddit and found that if a reddit media URL belongs to preview.redd.it, the downloader simply skips it.

My config is just the reddit client-id and token.

Here is an example: https://www.reddit.com/r/europe/comments/pm4531/the_name_of_your_country_in_estonian/

D:\>gallery-dl -v "https://www.reddit.com/r/europe/comments/pm4531/the_name_of_your_country_in_estonian/"
[gallery-dl][debug] Version 1.26.0-dev
[gallery-dl][debug] Python 3.11.4 - Windows-10-10.0.19045-SP0
[gallery-dl][debug] requests 2.29.0 - urllib3 1.26.16
[gallery-dl][debug] Configuration Files ['%APPDATA%\\gallery-dl\\config.json']
[gallery-dl][debug] Starting DownloadJob for 'https://www.reddit.com/r/europe/comments/pm4531/the_name_of_your_country_in_estonian/'
[reddit][debug] Using RedditSubmissionExtractor for 'https://www.reddit.com/r/europe/comments/pm4531/the_name_of_your_country_in_estonian/'
[reddit][debug] Using custom API credentials (client-id 6gRuf*****************)
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): oauth.reddit.com:443
[urllib3.connectionpool][debug] https://oauth.reddit.com:443 "GET /comments/pm4531/.json?limit=0&raw_json=1 HTTP/1.1" 200 49650

I am not sure whether this is due to recent changes or whether the problem already existed before.

external-preview.redd.it works fine:

D:\>gallery-dl -v "https://www.reddit.com/r/ABoringDystopia/comments/r5nifg/microapartment_in_koszalin_poland_25sqm_269_sq/"
[gallery-dl][debug] Version 1.26.0-dev
[gallery-dl][debug] Python 3.11.4 - Windows-10-10.0.19045-SP0
[gallery-dl][debug] requests 2.29.0 - urllib3 1.26.16
[gallery-dl][debug] Configuration Files ['%APPDATA%\\gallery-dl\\config.json']
[gallery-dl][debug] Starting DownloadJob for 'https://www.reddit.com/r/ABoringDystopia/comments/r5nifg/microapartment_in_koszalin_poland_25sqm_269_sq/'
[reddit][debug] Using RedditSubmissionExtractor for 'https://www.reddit.com/r/ABoringDystopia/comments/r5nifg/microapartment_in_koszalin_poland_25sqm_269_sq/'
[reddit][debug] Using custom API credentials (client-id 6gRuf*****************)
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): oauth.reddit.com:443
[urllib3.connectionpool][debug] https://oauth.reddit.com:443 "GET /comments/r5nifg/.json?limit=0&raw_json=1 HTTP/1.1" 200 51930
[directlink][debug] Using DirectlinkExtractor for 'https://external-preview.redd.it/hGFb97Tb6Ay7bIlBzzbqzuEzHw5C3-RS_oiTuBXNBFU.jpg?width=547&auto=webp&s=b66c7eb2a57b66f21f3eea59ff5464e82b417a58'
# F:\\dled-gallery-dl\directlink\external-preview.redd.it__hGFb97Tb6Ay7bIlBzzbqzuEzHw5C3-RS_oiTuBXNBFU.jpg

You can see that i.redd.it also works:

D:\>gallery-dl -v "https://www.reddit.com/r/meirl/comments/13aobm9/meirl/"
[gallery-dl][debug] Version 1.26.0-dev
[gallery-dl][debug] Python 3.11.4 - Windows-10-10.0.19045-SP0
[gallery-dl][debug] requests 2.29.0 - urllib3 1.26.16
[gallery-dl][debug] Configuration Files ['%APPDATA%\\gallery-dl\\config.json']
[gallery-dl][debug] Starting DownloadJob for 'https://www.reddit.com/r/meirl/comments/13aobm9/meirl/'
[reddit][debug] Using RedditSubmissionExtractor for 'https://www.reddit.com/r/meirl/comments/13aobm9/meirl/'
[reddit][debug] Using custom API credentials (client-id 6gRuf*****************)
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): oauth.reddit.com:443
[urllib3.connectionpool][debug] https://oauth.reddit.com:443 "GET /comments/13aobm9/.json?limit=0&raw_json=1 HTTP/1.1" 200 44821
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): i.redd.it:443
[urllib3.connectionpool][debug] https://i.redd.it:443 "GET /lr1tpcsihgya1.jpg HTTP/1.1" 200 48732
* F:\\dled-gallery-dl\reddit\meirl\13aobm9 Meirl.jpg

mikf commented 1 year ago

This problem has existed at least since the blacklist/whitelist options were added and child extractors with the same category as the parent started being ignored by default. (c78aa175)
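To make the mechanism described here concrete, below is a minimal sketch of how a default blacklist containing the parent's own category would skip preview.redd.it while letting external-preview.redd.it through. It is a simplified model, not gallery-dl's actual code: `HOST_CATEGORY`, `build_blacklist`, and `should_spawn_child` are hypothetical names, and the host-to-category mapping is an assumption inferred from the logs above (the working case is handed to `directlink`).

```python
from urllib.parse import urlparse

# Assumed mapping from URL host to the category of the extractor that
# would handle it as a child. Illustrative only.
HOST_CATEGORY = {
    "preview.redd.it": "reddit",
    "external-preview.redd.it": "directlink",
}


def build_blacklist(parent_category, blacklist=None, whitelist=None,
                    all_categories=()):
    """Return the set of categories that must not be spawned as children."""
    if whitelist is not None:
        # Everything that is not explicitly allowed is blocked.
        return set(all_categories) - set(whitelist)
    if blacklist is not None:
        return set(blacklist)
    # Assumed default: a parent ignores children of its own category.
    return {parent_category}


def should_spawn_child(url, blocked):
    """True if a child extractor would be started for this URL."""
    category = HOST_CATEGORY.get(urlparse(url).hostname)
    return category is not None and category not in blocked


blocked = build_blacklist("reddit")
print(should_spawn_child("https://preview.redd.it/abc.jpg?width=640", blocked))
print(should_spawn_child("https://external-preview.redd.it/x.jpg", blocked))
```

Under these assumptions the first call prints `False` and the second prints `True`, which matches the behavior in the logs. Passing an explicit empty blacklist to the hypothetical `build_blacklist` would let the preview.redd.it URL through again.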

Hrxn commented 1 year ago

So, since Sep 11, 2020...

Curious.. Not something I've ever noticed. Well, I've never been a really heavy reddit user (downloader), but still, can't say that I remember anything in that regard.

preview.redd.it is the wrong domain anyway for any "content" (i.e. image) post on reddit..

cheese529 commented 11 months ago

Will this only download the preview if the attached reddit media has been removed and no other hosting links (imgur, redgifs) have been found? If the media is still available, we might run into an issue with duplicate filenames, unless it's possible to add something like prev to the filename of the downloaded preview.

cheese529 commented 11 months ago

Just tested this and, unfortunately, I was right: it does try to download the media twice. That is a problem when the media has not been removed, since the preview is downloaded as a very low-resolution thumbnail most of the time.

Maybe we can somehow get the files belonging to preview.redd.it named with a prev at the end, or something similar, to differentiate them from the original full-resolution files and prevent gallery-dl from skipping files due to duplicate filenames.

For example, when gallery-dl comes across a redgifs link on reddit, it downloads the lower-resolution preview and then also downloads the full-resolution file hosted on redgifs. If I were not using "'_reddit' in locals()": I would have no idea this had happened, and there would be no way for me to tell the two files apart.

Let me know if you don't understand what I'm saying. I posted my verbose log here as well: https://gist.github.com/cheese529/43c49205e7eede8926f8f8e19e7f3b73
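The naming idea above could be sketched roughly as follows. This is only an illustration of the suggestion, not an existing gallery-dl feature: `tagged_filename`, `PREVIEW_HOSTS`, and the `_prev` suffix are hypothetical, and the URLs in the usage lines are made up apart from the i.redd.it filename taken from the log earlier in this thread.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hosts whose files are treated as low-resolution previews (assumption).
PREVIEW_HOSTS = {"preview.redd.it"}


def tagged_filename(url, suffix="_prev"):
    """Build a filename from a URL, marking preview files with a suffix."""
    parsed = urlparse(url)
    path = PurePosixPath(parsed.path)  # the query string is not part of .path
    stem, ext = path.stem, path.suffix
    if parsed.hostname in PREVIEW_HOSTS:
        # Keep previews distinguishable from the full-resolution original.
        stem += suffix
    return stem + ext


print(tagged_filename("https://preview.redd.it/abc123.jpg?width=640&auto=webp"))
print(tagged_filename("https://i.redd.it/lr1tpcsihgya1.jpg"))
```

With this sketch the preview URL yields `abc123_prev.jpg` while the i.redd.it URL keeps its name, `lr1tpcsihgya1.jpg`, so the two files would no longer collide.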