mikf / gallery-dl

Command-line program to download image galleries and collections from several image hosting sites
GNU General Public License v2.0
11.7k stars, 953 forks

[Lensdump] Reddit posts linking to Lensdump files are being processed with the directlink extractor instead #5293

taskhawk closed this issue 7 months ago

taskhawk commented 7 months ago

Reddit posts that link directly to Lensdump media files on the https://*.l3n.co/ subdomains ignore the lensdump configuration and are processed by the directlink extractor instead.

For example, some of the newest posts in this subreddit use these links (NSFW): https://new.reddit.com/r/NewYorkNine/new/

https://a.l3n.co/i/K4cEg9.gif https://b.l3n.co/i/Kh72OA.jpeg https://c.l3n.co/i/KhKxwK.jpeg

Hrxn commented 7 months ago

Well, yes. It's technically correct.

The links you've posted are all direct links, and gallery-dl detects them as such.

As an additional feature, it would be possible to hand a "direct link" URL recognized by gallery-dl over to a site-specific extractor based on its URL pattern, but the first question here should be: Is it actually worth it?

For example, what kind of metadata do the lensdump direct links provide here? Versus the metadata provided by the poster of said links on reddit?
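A toy model of the trade-off described above, assuming (as mikf's later remark about Flickr, Imgur, and Reddit suggests) that site-specific patterns are tried before the generic direct-link fallback. The registry layout, the `find_extractor` helper, and the fallback regex are hypothetical and much simpler than gallery-dl's real lookup; only the Lensdump pattern is taken from mikf's diff further down.

```python
import re

# Hypothetical (name, pattern) entries; NOT gallery-dl's real dispatch code.
LENSDUMP = ("lensdump", re.compile(
    r"(?:https?://)?(?:lensdump\.com|\w\.l3n\.co)/i/(\w+)"))
# Hypothetical generic fallback: any http(s) URL whose path ends in a media extension.
DIRECTLINK = ("directlink", re.compile(
    r"https?://[^/?#]+/[^?#]*\.(?:jpe?g|png|gif|webp)(?:[?#].*)?$",
    re.IGNORECASE))


def find_extractor(url, registry):
    """Return the name of the first extractor whose pattern matches url, or None."""
    for name, pattern in registry:
        if pattern.match(url):
            return name
    return None


url = "https://a.l3n.co/i/K4cEg9.gif"
print(find_extractor(url, [DIRECTLINK]))            # directlink
print(find_extractor(url, [LENSDUMP, DIRECTLINK]))  # lensdump
print(find_extractor(url, [DIRECTLINK, LENSDUMP]))  # directlink
```

With first-match dispatch, a site-specific pattern only wins if it both covers the URL and is consulted before the catch-all; putting the fallback first shadows it.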

mikf commented 7 months ago

Supporting direct links with a site-specific extractor has been done for other sites like Flickr, Imgur, Reddit, and probably more, so it might as well be implemented for Lensdump as well.

It's only one real line of code that needs to be updated:

diff --git a/gallery_dl/extractor/lensdump.py b/gallery_dl/extractor/lensdump.py
index d4ccf33b..8ca9d88e 100644
--- a/gallery_dl/extractor/lensdump.py
+++ b/gallery_dl/extractor/lensdump.py
@@ -104,7 +104,7 @@ class LensdumpImageExtractor(LensdumpBase, Extractor):
     filename_fmt = "{category}_{id}{title:?_//}.{extension}"
     directory_fmt = ("{category}",)
     archive_fmt = "{id}"
-    pattern = BASE_PATTERN + r"/i/(\w+)"
+    pattern = r"(?:https?://)?(?:lensdump\.com|\w\.l3n\.co)/i/(\w+)"
     example = "https://lensdump.com/i/ID"

     def __init__(self, match):
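A quick way to check what the new pattern accepts is to run it against the URLs from the issue. This sketch only exercises the regex from the `+` line with Python's `re.match`; the `lensdump_id` helper is hypothetical and not part of gallery-dl.

```python
import re

# Pattern copied from the "+" line of the diff above.
LENSDUMP_IMAGE = re.compile(r"(?:https?://)?(?:lensdump\.com|\w\.l3n\.co)/i/(\w+)")


def lensdump_id(url):
    """Return the image ID captured by the pattern, or None if it does not match."""
    match = LENSDUMP_IMAGE.match(url)
    return match.group(1) if match else None


for url in (
    "https://a.l3n.co/i/K4cEg9.gif",        # -> K4cEg9
    "https://b.l3n.co/i/Kh72OA.jpeg",       # -> Kh72OA
    "https://c.l3n.co/i/KhKxwK.jpeg",       # -> KhKxwK
    "https://lensdump.com/i/ID",            # -> ID
    "https://i.lensdump.com/i/kXr9kv.gif",  # -> None (older host)
):
    print(url, "->", lensdump_id(url))
```

The last URL uses the older `i.lensdump.com` host that comes up later in the thread, and this pattern does not match it.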
taskhawk commented 7 months ago

> Is it actually worth it?

> For example, what kind of metadata do the lensdump direct links provide here? Versus the metadata provided by the poster of said links on reddit?

In my case, it is to insert an entry into the Lensdump archive I already have set up, to avoid downloading the file again.
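A minimal sketch of why routing these URLs through the Lensdump extractor could help with such an archive, assuming the archive key is derived from the image ID (the diff above sets `archive_fmt = "{id}"`). `ToyArchive`, its `"lensdump"` prefix, and `check_and_add` are hypothetical stand-ins, not gallery-dl's archive implementation; the two URLs are the page/file pair from the redirect example later in the thread.

```python
import re

# Pattern from the "+" line of mikf's diff.
PATTERN = re.compile(r"(?:https?://)?(?:lensdump\.com|\w\.l3n\.co)/i/(\w+)")


class ToyArchive:
    """Hypothetical in-memory stand-in for a download archive."""

    def __init__(self, prefix):
        self.prefix = prefix
        self.keys = set()

    def key(self, image_id):
        # Assumed key layout: a fixed prefix plus the "{id}" archive_fmt.
        return self.prefix + image_id

    def check_and_add(self, url):
        """Return True if url is new (and record it), False if already archived."""
        match = PATTERN.match(url)
        if match is None:
            raise ValueError("not a Lensdump image URL: " + url)
        key = self.key(match.group(1))
        if key in self.keys:
            return False
        self.keys.add(key)
        return True


archive = ToyArchive("lensdump")
print(archive.check_and_add("https://lensdump.com/i/kXr9kv"))  # True
print(archive.check_and_add("https://a.l3n.co/i/kXr9kv.gif"))  # False
```

Because both URL forms yield the same ID, they collapse to one archive entry under this assumed key layout.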

> It's only one real line of code that needs to be updated:

Oh cool, glad it's simple. Thanks.

Hrxn commented 7 months ago

> Is it actually worth it? For example, what kind of metadata do the lensdump direct links provide here? Versus the metadata provided by the poster of said links on reddit?

> In my case, it is to insert an entry into the Lensdump archive I already have set up, to avoid downloading the file again.

The archive for the "directlink" extractor works just as well here. Just saying.

> Supporting direct links with a site-specific extractor has been done for other sites like Flickr, Imgur, Reddit, and probably more, so it might as well be implemented for Lensdump as well.

True, but to be fair, a Reddit direct link is also hosted on Reddit, so it's a kind of "first-party" direct link, as opposed to the usual direct link hosted on an external service. I will admit, though, that this distinction is not really that important.

taskhawk commented 7 months ago

Found a few Reddit posts linking to Lensdump files with an older URL format that ended up going through the directlink extractor, for example:

NSFW https://i.lensdump.com/i/kXr9kv.gif https://i1.lensdump.com/i/63MU57.gif https://i2.lensdump.com/i/6Hf3tD.gif https://i3.lensdump.com/i/kFdVr2.gif

When loaded in a browser, such a URL redirects to the media page, where the files now use the new URL format, for example:

https://lensdump.com/i/kXr9kv ==> https://a.l3n.co/i/kXr9kv.gif
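If the older hosts should also map to the Lensdump extractor, one option is to widen the pattern so that `i.lensdump.com` and the numbered `i1`, `i2`, and `i3` hosts are accepted too. This is a hypothetical sketch, not necessarily what the linked commit does; it only checks that every URL form listed above yields the same kind of image ID.

```python
import re

# Hypothetical widened pattern: also accepts i.lensdump.com and iN.lensdump.com
# (single digit N). Not necessarily the pattern used by the actual fix.
PATTERN = re.compile(
    r"(?:https?://)?(?:(?:i\d?\.)?lensdump\.com|\w\.l3n\.co)/i/(\w+)")

urls = [
    "https://i.lensdump.com/i/kXr9kv.gif",
    "https://i1.lensdump.com/i/63MU57.gif",
    "https://i2.lensdump.com/i/6Hf3tD.gif",
    "https://i3.lensdump.com/i/kFdVr2.gif",
    "https://lensdump.com/i/kXr9kv",
    "https://a.l3n.co/i/kXr9kv.gif",
]
ids = [PATTERN.match(url).group(1) for url in urls]
print(ids)  # ['kXr9kv', '63MU57', '6Hf3tD', 'kFdVr2', 'kXr9kv', 'kXr9kv']
```

The old file URL, the page URL, and the new l3n.co file URL for the same image all yield `kXr9kv`, so an ID-based archive entry would cover all three.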

mikf commented 7 months ago

> Found a few Reddit posts linking to Lensdump files with an older URL format

Fixed in https://github.com/mikf/gallery-dl/commit/ac4e29f70a9a6ef023639576d7f93c45acec9ec2