mikf / gallery-dl

Command-line program to download image galleries and collections from several image hosting sites
GNU General Public License v2.0
11.84k stars, 974 forks

[Reddit] RedGifs Metadata Is Not Passed #4496

Closed: cheese529 closed this issue 1 year ago

cheese529 commented 1 year ago

Despite having the following in my config for redgifs, the reddit metadata is not passed forward:

        "filename": {
            "'_reddit' in locals()": "{_reddit[title]} {_reddit[date]:%Y-%m-%d} {_reddit[id]}.{extension}",
            "not locals().get('title')": "{filename}.{extension}"
        }

All the downloaded files just have the redgifs filename.

Hrxn commented 1 year ago

Do you also have parent-metadata set accordingly in "reddit": { }? You definitely need it:

        "reddit":
        {
            "#": "only spawn child extractors for links to specific sites",
            "whitelist": ["imgur", "redgifs", "gfycat"],
            "#": "put files from child extractors into the reddit directory",
            "parent-directory": true,
            "#": "transfer metadata to any child extractor as '_reddit'",
            "parent-metadata": "_reddit"
        },
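For context, here is a minimal sketch of how the "reddit" block above and the redgifs filename setting from the original report could sit together in one gallery-dl config file. The enclosing "extractor" object follows gallery-dl's usual config layout. gallery-dl evaluates the keys of a filename object as Python expressions, in order, and uses the format string of the first one that is true. The empty-string key "" used here as a catch-all fallback follows the example in gallery-dl's configuration docs and stands in for the original "not locals().get('title')" condition, so treat that substitution as an assumption rather than the reporter's actual config.

```json
{
    "extractor":
    {
        "reddit":
        {
            "#": "only spawn child extractors for links to specific sites",
            "whitelist": ["imgur", "redgifs", "gfycat"],
            "#": "put files from child extractors into the reddit directory",
            "parent-directory": true,
            "#": "transfer metadata to any child extractor as '_reddit'",
            "parent-metadata": "_reddit"
        },
        "redgifs":
        {
            "filename":
            {
                "#": "use reddit's title/date/id when redgifs was reached through reddit",
                "'_reddit' in locals()": "{_reddit[title]} {_reddit[date]:%Y-%m-%d} {_reddit[id]}.{extension}",
                "#": "fallback (assumed documented default key): keep the native redgifs name",
                "": "{filename}.{extension}"
            }
        }
    }
}
```

The "_reddit" key only exists in a redgifs file's metadata when the "reddit" section sets "parent-metadata"; without it, every file falls through to the fallback name, which matches the symptom described in the report.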
Hrxn commented 1 year ago

Pinging @cheese529

Did you test this again? It should be working as intended.

cheese529 commented 1 year ago

I do indeed have the parent metadata enabled. I will test this again after I come home from uni and see if it is working.

cheese529 commented 1 year ago

@Hrxn I can confirm this is still not working. In fact, it now refuses to download content from redgifs at all: I see the link being passed, but nothing is downloaded. Here's an example link [NSFW]: https://www.reddit.com/r/pawg/comments/16lqs4b/im_really_dragging_a_wagon_back_here/

mikf commented 1 year ago
$ gallery-dl https://www.reddit.com/r/pawg/comments/16lqs4b/im_really_dragging_a_wagon_back_here/
/tmp/_/redgifs/redgifs_klutzygrowingpuppy.mp4

$ gallery-dl -o parent-metadata=_reddit --filter "print(_reddit['title'])" https://www.reddit.com/r/pawg/comments/16lqs4b/im_really_dragging_a_wagon_back_here/
I’m really dragging a wagon back here
Hrxn commented 1 year ago

Same, also works on my machine ™️

PS D:\> python.exe $current_gallery_dl_master -o base-directory="." --verbose 'https://www.reddit.com/r/pawg/comments/16lqs4b/im_really_dragging_a_wagon_back_here/'
Debug  :  gallery-dl -> Version 1.26.0-dev
Debug  :  gallery-dl -> Python 3.11.5 - Windows-10-10.0.19045-SP0
Debug  :  gallery-dl -> requests 2.31.0 - urllib3 2.0.4
Debug  :  gallery-dl -> Configuration Files ['%USERPROFILE%\\gallery-dl.conf']
Debug  :  gallery-dl -> Starting DownloadJob for 'https://www.reddit.com/r/pawg/comments/16lqs4b/im_really_dragging_a_wagon_back_here/'
Debug  :  reddit -> Using RedditSubmissionExtractor for 'https://www.reddit.com/r/pawg/comments/16lqs4b/im_really_dragging_a_wagon_back_here/'
Debug  :  reddit -> Using custom API credentials (client-id pPax3*****************)
Info   :  reddit -> Refreshing private access token
Debug  :  urllib3.connectionpool -> Starting new HTTPS connection (1): www.reddit.com:443
Debug  :  urllib3.connectionpool -> https://www.reddit.com:443 "POST /api/v1/access_token HTTP/1.1" 200 775
Debug  :  reddit -> Sleeping 0.10 seconds (request)
Debug  :  urllib3.connectionpool -> Starting new HTTPS connection (1): oauth.reddit.com:443
Debug  :  urllib3.connectionpool -> https://oauth.reddit.com:443 "GET /comments/16lqs4b/.json?limit=0&raw_json=1 HTTP/1.1" 200 4426
Debug  :  reddit -> Using download archive 'E:\Home\Meta\gallery-dl\archive\gallery-dl.archive.reddit.db'
Debug  :  reddit -> Active postprocessor modules: [ClassifyPP]
Debug  :  redgifs -> Using RedgifsImageExtractor for 'https://v3.redgifs.com/watch/klutzygrowingpuppy'
Debug  :  cookies -> Extracting cookies from C:\Users\Hrxn\AppData\Local\Google\Chrome\User Data\Profile 4\Network\Cookies
Debug  :  cookies -> Found Local State file at 'C:\Users\Hrxn\AppData\Local\Google\Chrome\User Data\Local State'
Info   :  cookies -> Extracted 2847 cookies from Chrome
Debug  :  cookies -> Cookie version breakdown: {'v10': 2847, 'other': 0, 'unencrypted': 0}
Debug  :  urllib3.connectionpool -> Starting new HTTPS connection (1): api.redgifs.com:443
Debug  :  urllib3.connectionpool -> https://api.redgifs.com:443 "GET /v2/auth/temporary HTTP/1.1" 200 None
Debug  :  urllib3.connectionpool -> https://api.redgifs.com:443 "GET /v2/gifs/klutzygrowingpuppy HTTP/1.1" 200 None
Debug  :  redgifs -> Using download archive 'E:\Home\Meta\gallery-dl\archive\gallery-dl.archive.redgifs.db'
Debug  :  urllib3.connectionpool -> Starting new HTTPS connection (1): thumbs46.redgifs.com:443
Debug  :  urllib3.connectionpool -> https://thumbs46.redgifs.com:443 "GET /KlutzyGrowingPuppy.mp4?expires=1695170400&signature=v2:e50912752de870cc343c5bb33ac5405aed3ae744e95d65fec6a36d817f2218e9&for=2a00:6020:b314:8e00&hash=6163438793 HTTP/1.1" 200 6468716
.\Reddit\S\Pawg\Unsorted\+Clips\2023-09-18.I_m_really_dragging_a_wagon_back_here.lil-braids.Score=2815.Comments=15.16lqs4b.mp4
PS D:\>

Could you post a full --verbose log?

cheese529 commented 1 year ago

@Hrxn I did some messing around with my config and I think I figured it out. I had to add the "whitelist": ["redgifs"] option for it to download (weird, because redgifs was not blacklisted). Without this option in the config, it refused to download.

Regarding the metadata not being passed, it is still a bug. I will post a verbose log in a few minutes along with a text file filled with links you can use to test. They are NSFW so please be cautious.

cheese529 commented 1 year ago

My current config: https://mega.nz/file/Ep8CgRzB#iQEzMeAd4RvMEBZaMZpjyk-nAQR04HC8sSPak7QFvP8
Verbose log: https://pastebin.pl/view/08e08569
URLs to test: https://pastebin.pl/view/55bb2bb6

Please also let me know if something is wrong with my config, although I don't think so.

mikf commented 1 year ago

That also works ... (I removed the whitelist setting, by the way)

$ gallery-dl --config-ignore -c myconfig.json https://www.reddit.com/r/pawg/comments/16lqs4b/im_really_dragging_a_wagon_back_here/
/tmp/_/parent-test/reddit/pawg/redgifs/I’m really dragging a wagon back here - 2023-09-18 16lqs4b.mp4

Maybe the settings from your second config file are somehow interfering?

[gallery-dl][debug] Configuration Files ['%APPDATA%\\gallery-dl\\config.json', 'C:\\Users\\mnoor\\Videos\\reddit cofig\\config1.json']

https://mega.nz/... https://pastebin.pl/...

Why not put it on https://gist.github.com/ ?

cheese529 commented 1 year ago

Will there be some sort of substitute for the whitelist setting, or would I have to use blacklist if I want to avoid downloading from certain sites? Good point about the settings from my second config; I'll test again with just one config. Also, did you try any of the URLs I sent in the pastebin? None of them pass down metadata. And BTW, thank you for telling me about https://gist.github.com/. This is wonderful; I will be using it for everything now :)

mikf commented 1 year ago

Will there be some sort of substitute for the whitelist setting or would I have to use blacklist if I want to avoid downloading from certain sites?

Well, you said that you needed to add a whitelist to make it work in the first place, but for me it worked even without.
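On the question of a substitute: gallery-dl documents an extractor.*.blacklist option alongside whitelist, listing extractor categories to skip when spawning child extractors for URLs found on sites such as reddit. A minimal sketch, assuming that option behaves as documented in your version; "imgur" is only a placeholder category, not a recommendation from this thread.

```json
{
    "extractor":
    {
        "reddit":
        {
            "#": "skip child extractors for these categories (placeholder list)",
            "blacklist": ["imgur"],
            "parent-metadata": "_reddit"
        }
    }
}
```

With a blacklist, every supported site except the listed ones is still followed, whereas a whitelist follows only the listed ones.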

Also did you try any of the URLs I sent in the pastebin?

They all link to gfycat, so they don't work anymore now that the site no longer exists. At least for the moment (#4558).

This is not an issue with it not passing metadata.

cheese529 commented 1 year ago

They all link to gfycat, so they don't work anymore now that the site no longer exists. At least for the moment (#4558).

Alright, I think that explains everything now. I'm sorry, I should have clarified that I am still using this version https://github.com/mikf/gallery-dl/commit/28798594e8dd165909ffd6d44578d7a109aae2a0

Interestingly enough, some of those gfycat URLs still download, just with the native file naming that gfycat and redgifs use (e.g., UnrulyScholarlyAmoeba). That is why I assumed this was an issue with metadata not being passed. I believe it might be possible to figure something out here, as you mentioned in #4558. I will look into it further.

Hrxn commented 1 year ago

Since #4558 is its own special case, I think this one here can be closed?

mikf commented 1 year ago

I am still using this version https://github.com/mikf/gallery-dl/commit/28798594e8dd165909ffd6d44578d7a109aae2a0

In this case, you might get this to work by adding "parent-metadata": true to your gfycat settings.

parent-metadata only works for direct descendants/children. For gfycat links that are actually hosted on redgifs, the chain is reddit -> gfycat -> redgifs, and gfycat never passes its reddit metadata down to redgifs.
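A minimal sketch of the setting suggested above for commit 28798594: setting "parent-metadata" to true under "gfycat" should make gfycat forward its own metadata, which already contains "_reddit" from the reddit step, to the redgifs child, so the reddit -> gfycat -> redgifs chain keeps the reddit fields. The "reddit" block is repeated only for context; whether this is still needed after the #4558 changes is not established here.

```json
{
    "extractor":
    {
        "reddit":
        {
            "#": "first hop: reddit -> gfycat receives '_reddit'",
            "parent-metadata": "_reddit"
        },
        "gfycat":
        {
            "#": "second hop: gfycat -> redgifs passes gfycat's metadata, including '_reddit'",
            "parent-metadata": true
        }
    }
}
```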

cheese529 commented 1 year ago

Alright @mikf, it seems the metadata is now being passed down correctly without using "parent-metadata": true in my gfycat settings, so whatever you did in #4558 seems to have solved it.

Unfortunately, this has also caused another issue: all media in the comments is now ignored due to duplicate filenames. How could we solve this so that all of that media gets downloaded too? Would it be possible to somehow use the default settings in my config for media linked in comments, or in posts with multiple media links?

Verbose Log: https://gist.github.com/cheese529/3a28388d1ac0156bcf0a4f67c8b44276