mikf / gallery-dl

Command-line program to download image galleries and collections from several image hosting sites
GNU General Public License v2.0

[patreon] KeyError: 'name' #5048

Open thatfuckingbird opened 6 months ago

thatfuckingbird commented 6 months ago

Getting the following error on Patreon post URLs now (post URL partially redacted but it seems to occur on all posts I tried anyway):

[urllib3.connectionpool][debug][2024-01-10 21:10:59] https://c10.patreonusercontent.com:443 "GET /4/patreon-media/p/post/REDACTED/4e6f6c096b23440ea6684166f6acb1f4/eyJhIjoxLCJwIjoxfQ%3D%3D/1.jpg?token-time=REDACTED&token-hash=NV-REDACTED%3D HTTP/1.1" 200 None
[urllib3.connectionpool][debug][2024-01-10 21:11:00] https://c10.patreonusercontent.com:443 "HEAD /4/patreon-media/p/post/REDACTED/a2e398653dea4fa8b2b09e1fbbf6db44/eyJ3IjoxNjAwfQ%3D%3D/1.jpg?token-time=REDACTED&token-hash=REDACTED%3D HTTP/1.1" 200 0
[patreon][debug][2024-01-10 21:11:00] skipping https://c10.patreonusercontent.com/4/patreon-media/p/post/REDACTED/a2e398653dea4fa8b2b09e1fbbf6db44/eyJ3IjoxNjAwfQ%3D%3D/1.jpg?token-time=REDACTED&token-hash=REDACTED%3D (a2e398653dea4fa8b2b09e1fbbf6db44 image_large)
[patreon][error][2024-01-10 21:11:00] An unexpected error occurred: KeyError - 'name'. Please run gallery-dl again with the --verbose flag, copy its output and report this issue on https://github.com/mikf/gallery-dl/issues .
[patreon][debug][2024-01-10 21:11:00]
Traceback (most recent call last):
  File "/home/thatfuckingbird/.cache/pypoetry/virtualenvs/hydownloader-R3zNBAmB-py3.11/lib/python3.11/site-packages/gallery_dl/job.py", line 127, in run
    for msg in extractor:
  File "/home/thatfuckingbird/.cache/pypoetry/virtualenvs/hydownloader-R3zNBAmB-py3.11/lib/python3.11/site-packages/gallery_dl/extractor/patreon.py", line 47, in items
    for kind, url, name in itertools.chain.from_iterable(
  File "/home/thatfuckingbird/.cache/pypoetry/virtualenvs/hydownloader-R3zNBAmB-py3.11/lib/python3.11/site-packages/gallery_dl/extractor/patreon.py", line 48, in <genexpr>
    g(post) for g in generators):
    ^^^^^^^
  File "/home/thatfuckingbird/.cache/pypoetry/virtualenvs/hydownloader-R3zNBAmB-py3.11/lib/python3.11/site-packages/gallery_dl/extractor/patreon.py", line 63, in _postfile
    return (("postfile", postfile["url"], postfile["name"]),)
                                          ~~~~~~~~^^^^^^^^
KeyError: 'name'

Interestingly the files are still downloaded despite the error.

mikf commented 6 months ago

Tests still pass, my go-to free post downloads without error, and this one does not raise an error either. Judging by previous issues, I seem to see Patreon changes later than others, so could you post the --write-pages output for either of my 2 examples if they fail for you?

thatfuckingbird commented 6 months ago

Your examples work fine for me too. I've found a public post that produces the error: https://www.patreon.com/posts/2024-reward-96147929

mikf commented 6 months ago

No error for me ...

edit: https://github.com/yt-dlp/yt-dlp/issues/8973 might be related

bashonly commented 6 months ago

the yt-dlp error only presents if the user is passing cookies/auth

thatfuckingbird commented 6 months ago

Tried the public post above without cookies. It works fine. If I pass cookies the error returns.

thatfuckingbird commented 6 months ago

Using --write-pages and diffing the 01_https_www.patreon.com_posts_2024-reward-96147929.txt files, the post_file data looks completely different when logged in vs. not (in the attached screenshot, the left side is the logged-in version).
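For anyone who wants to repeat this comparison without eyeballing a diff, here is a minimal sketch that reports which keys differ between two post_file objects. The helper name `diff_keys` and both sample dictionaries are hypothetical; the sample values are made up for illustration and are not copied from a real Patreon response.

```python
def diff_keys(anonymous: dict, logged_in: dict) -> dict:
    """Report which keys appear in only one of two post_file objects."""
    anon_keys, auth_keys = set(anonymous), set(logged_in)
    return {
        "only_anonymous": sorted(anon_keys - auth_keys),
        "only_logged_in": sorted(auth_keys - anon_keys),
        "shared": sorted(anon_keys & auth_keys),
    }


# Hypothetical sample data, NOT taken from a real response.
# "media_id" is an invented key used only to make the difference visible.
anonymous = {"name": "1.jpg", "url": "https://example.org/1.jpg"}
logged_in = {"url": "https://example.org/1.jpg", "media_id": 123}

report = diff_keys(anonymous, logged_in)
print(report)
```

A key listed under "only_anonymous" (here "name") is the kind of field an extractor cannot rely on once cookies are passed.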

shinji257 commented 6 months ago

I can reproduce it, but on a private post, and there it breaks the download progress.

Here is a gist with verbose output: https://gist.github.com/shinji257/e17f41f53a8f68e07871260ac5cb656e

I went back and tried the 3 most recent posts from this author, and all of them do the same thing, so this looks like an issue that only occurs when authenticated.

mikf commented 6 months ago

Should be fixed with https://github.com/mikf/gallery-dl/commit/1c68b7df010913cb661f06224bbbf7b610c79590. Not sure how this is going to affect filename metadata for postfile files.

I also still can't reproduce this error, even with logged in cookies. Maybe because I'm not subscribed to that creator or really anyone.
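As a rough illustration of the kind of change that avoids the KeyError, here is a standalone sketch that tolerates a missing "name" key. It is not the content of the linked commit: the function name `postfile_entry`, the optional `lookup_filename` callback, and the fallback to the URL's last path segment are all assumptions made for this example.

```python
from urllib.parse import unquote, urlsplit


def postfile_entry(post, lookup_filename=None):
    """Return a (kind, url, name) tuple for post_file, tolerating a missing name.

    `lookup_filename` is a hypothetical callback (for example a HEAD-request
    helper) that returns a filename for a URL, or None if it cannot find one.
    """
    postfile = post.get("post_file")
    if not postfile:
        return ()

    url = postfile["url"]
    # .get() instead of ["name"], so a missing key no longer raises KeyError
    name = postfile.get("name")
    if not name and lookup_filename is not None:
        name = lookup_filename(url)
    if not name:
        # Last resort: use the final path segment of the URL
        name = unquote(urlsplit(url).path.rsplit("/", 1)[-1])
    return (("postfile", url, name),)


# Hypothetical post data shaped like the logged-in case (no "name" key)
post = {"post_file": {"url": "https://example.org/a/b/1.jpg?token=x"}}
print(postfile_entry(post))
```

With this shape, a post without post_file yields an empty tuple, and a post_file without "name" still yields a usable filename.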

shinji257 commented 6 months ago

I did notice that if you are on a free sub it doesn't reproduce. It only affects paid subs. Probably why you can't reproduce.

shinji257 commented 6 months ago

I tested the new build and it seems to be fine with filenames as far as I can tell. Example output:

shinj@Tinym P:\....\gallery-dl  .\gallery-dl.exe --cookies-from-browser brave/patreon.com https://www.patreon.com/ssh_in_ys
[cookies][info] Extracted 9 cookies from Brave
# .\gallery-dl\patreon\sh_in_ys\93511207_Patch's attraction_P1, P2_01.png
# .\gallery-dl\patreon\sh_in_ys\93511207_Patch's attraction_P1, P2_02.png
# .\gallery-dl\patreon\sh_in_ys\95165587_Happy New Year 2024!_01.png
# .\gallery-dl\patreon\sh_in_ys\94523345_TTR 2023 December Rewards_01.png
# .\gallery-dl\patreon\sh_in_ys\94523345_TTR 2023 December Rewards_02.png
# .\gallery-dl\patreon\sh_in_ys\94523345_TTR 2023 December Rewards_03.png
* .\gallery-dl\patreon\sh_in_ys\89867129_Zipping up_01.png
* .\gallery-dl\patreon\sh_in_ys\89867129_Zipping up_02.png
* .\gallery-dl\patreon\sh_in_ys\90649576_Vapereon Suit_02_01.png
* .\gallery-dl\patreon\sh_in_ys\90649576_Vapereon Suit_02_02.png
[patreon][warning] Not allowed to view post 94006700
* .\gallery-dl\patreon\sh_in_ys\90649516_Vapereon suit_01_01.png
* .\gallery-dl\patreon\sh_in_ys\90649516_Vapereon suit_01_02.png
* .\gallery-dl\patreon\sh_in_ys\92996750_TTR 2023 November Rewards_01.png
[patreon][warning] Not allowed to view post 89455610
* .\gallery-dl\patreon\sh_in_ys\89404181_PomPom form_01.png
* .\gallery-dl\patreon\sh_in_ys\89404181_PomPom form_02.png
* .\gallery-dl\patreon\sh_in_ys\89404181_PomPom form_03.png
* .\gallery-dl\patreon\sh_in_ys\89404181_PomPom form_04.png
* .\gallery-dl\patreon\sh_in_ys\90191884_Null face drone, Noivern spawning_01.png
* .\gallery-dl\patreon\sh_in_ys\90191884_Null face drone, Noivern spawning_02.png

thatfuckingbird commented 6 months ago

> Should be fixed with 1c68b7d. Not sure how this is going to affect filename metadata for postfile files.
>
> I also still can't reproduce this error, even with logged in cookies. Maybe because I'm not subscribed to that creator or really anyone.

I noticed there is another structure inside the bootstrap JSON that contains the file metadata, and it seems to be the same between free and paid tiers. For example, if you use --write-pages to fetch the free post I linked without any cookies and search for the filename starting with S__, it occurs in one more place. However, that structure looks more complicated to me, so it might not be easy to switch to it.

Going to try the current fix in the meantime.

Edit: seems to be working fine. Even the filename metadata field is OK so I guess it gets it from somewhere else (?).

mikf commented 6 months ago

There seem to be cases where the metadata in this other structure does not include the post_file when it is a video, but only its preview image.

In the yt-dlp example, the https://stream.mux.com URL and its useless video filename are only available as post_file. The only "type": "media" data included is its thumbnail, with vlcsnap-2023-12-15-14h25m32s661.png as name.

What I'm saying is that it's sadly not as simple as using the metadata from the "type": "media" file instead of from post_file.

> Edit: seems to be working fine. Even the filename metadata field is OK so I guess it gets it from somewhere else (?).

It does an extra HEAD request in self._filename(url) and uses the Content-Disposition value.
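To illustrate the idea, here is a minimal standard-library sketch of reading a filename from a Content-Disposition header after a HEAD request. It is not gallery-dl's actual `_filename` implementation; the function names `filename_from_content_disposition` and `head_filename` and the sample header values are assumptions made for this example.

```python
import urllib.request
from email.message import Message


def filename_from_content_disposition(value):
    """Extract the filename parameter from a Content-Disposition value.

    Returns None when the header is missing or carries no filename.
    """
    if not value:
        return None
    msg = Message()
    msg["Content-Disposition"] = value
    # get_filename() reads the "filename" parameter of Content-Disposition
    return msg.get_filename()


def head_filename(url, timeout=10.0):
    """Send a HEAD request and return the server-provided filename, if any."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return filename_from_content_disposition(
            response.headers.get("Content-Disposition"))


# Offline demonstration with a hypothetical header value (no network access)
print(filename_from_content_disposition('attachment; filename="example.png"'))
```

The HEAD request costs one extra round trip per file, which matches the extra HEAD line visible in the debug log at the top of this issue.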