Closed Hrxn closed 1 year ago
Please note that, contrary to the URL itself, which indicates a direct link to a JPG file, I end up redirected in the browser. (This is due to a redesign of their site not too long ago; direct URLs like this used to work.)
It depends on the Accept header sent with the request. A browser sends text/html along with a lot of other values, making Tumblr redirect it to an HTML page.
With */* (curl, gdl) you either get the image if the hash/token is correct (c76d6df4266c74173985c757304a2a9bf214859b) or a 404 error if not (cee6f8e33799615ac0acc958e244a0d1fbc5ef0c).
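To make the Accept-header dependence concrete, here is a minimal sketch using only Python's standard library. The helper name `build_request` and the abbreviated browser Accept string are assumptions for illustration, not what gallery-dl or any particular browser sends verbatim. The requests are only constructed here, not sent.

```python
import urllib.request

# Abbreviated stand-in for a browser's Accept header (assumption for illustration).
BROWSER_ACCEPT = "text/html,application/xhtml+xml,*/*;q=0.8"
# What curl and similar generic clients send by default.
GENERIC_ACCEPT = "*/*"


def build_request(url, accept):
    """Return an unsent GET request for `url` carrying the given Accept header."""
    return urllib.request.Request(url, headers={"Accept": accept})


url = ("https://64.media.tumblr.com/1d78c58335cb946162352f2b63bb2740/"
       "7ca740d84fe9eb21-63/s640x960/"
       "cee6f8e33799615ac0acc958e244a0d1fbc5ef0c.jpg")

# Per the explanation above: this variant should lead to the HTML page ...
as_browser = build_request(url, BROWSER_ACCEPT)
# ... while this one should yield the image itself, or a 404 for a wrong token.
as_client = build_request(url, GENERIC_ACCEPT)

# Actually sending one of them would be: urllib.request.urlopen(as_client)
```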
The HTML page is supposed to always have a URL with the correct token and is what gets used by gallery-dl to upgrade a low-res image URL to its original size.
1) gallery-dl takes the URL returned by the API
https://64.media.tumblr.com/1d78c58335cb946162352f2b63bb2740/7ca740d84fe9eb21-63/s640x960/cee6f8e33799615ac0acc958e244a0d1fbc5ef0c.jpg
2) replaces s640x960 with some unreasonably large resolution
https://64.media.tumblr.com/1d78c58335cb946162352f2b63bb2740/7ca740d84fe9eb21-63/s99999x99999/cee6f8e33799615ac0acc958e244a0d1fbc5ef0c.jpg
3) and goes to the redirect HTML page to get the updated token
https://64.media.tumblr.com/1d78c58335cb946162352f2b63bb2740/7ca740d84fe9eb21-63/s99999x99999/c76d6df4266c74173985c757304a2a9bf214859b.jpg
For some reason, step 3) returned the same URL as step 2) in this case, which is obviously not supposed to happen. I guess we could retry step 3) until it returns a different result than step 2).
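The three steps, plus the "same URL came back" check, can be sketched roughly as follows. This is not gallery-dl's actual code: the function names, the regular expressions, and the assumed shape of the redirect HTML page (an image URL with the same path but a different token) are all hypothetical.

```python
import re

# Matches the size segment of a Tumblr media URL, e.g. "/s640x960/".
SIZE_SEGMENT = re.compile(r"/s\d+x\d+/")


def request_largest(url):
    """Step 2: swap the size segment for an unreasonably large one."""
    return SIZE_SEGMENT.sub("/s99999x99999/", url, count=1)


def find_upgraded_url(html, requested_url):
    """Step 3: look for a URL with the same path but a different token.

    Returns None when the page only repeats the requested (stale) token,
    which is the failure case described above.
    """
    prefix = requested_url.rpartition("/")[0] + "/"
    pattern = re.compile(re.escape(prefix) + r"[0-9a-f]+\.\w+")
    for candidate in pattern.findall(html):
        if candidate != requested_url:
            return candidate
    return None


BASE = ("https://64.media.tumblr.com/1d78c58335cb946162352f2b63bb2740/"
        "7ca740d84fe9eb21-63")
api_url = BASE + "/s640x960/cee6f8e33799615ac0acc958e244a0d1fbc5ef0c.jpg"
large_url = request_largest(api_url)

# Hypothetical page bodies: one with the updated token, one without.
good_page = ('<img src="' + BASE +
             '/s99999x99999/c76d6df4266c74173985c757304a2a9bf214859b.jpg">')
stale_page = '<img src="' + large_url + '">'

upgraded = find_upgraded_url(good_page, large_url)
unchanged = find_upgraded_url(stale_page, large_url)  # None: retry would be needed
```

A caller could treat a `None` result as the signal to retry step 3) or to fall back to the lower-resolution URL.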
So, I'm afraid I'm a little bit at a loss as to how to reliably reproduce this
Yeah, that's kind of a problem. I let it go through the first 1k-2k posts of https://bespokeprovocateur2.tumblr.com as well as adjusted the offset to have it immediately jump to post 686406675559841792, but it downloaded everything without a problem.
Was this the only time such an error happened, or was it several times?
So it's probably not related to authentication?
It's not. The HTTP request in step 3) doesn't use any form of authentication, regardless of your OAuth settings.
I'm using {extractor.url} in the log settings, which always gives the default blog URL, i.e. the "input" URL as fed to gallery-dl, is it possible to directly log the offending post URL somehow? If it's not possible, maybe consider this a feature request..
Would be useful, but doing this with the current infrastructure would be painful since each extractor would have to manually update its current post URL value.
Was this the only time such an error happened, or was it several times?
Two attempts with https://bespokeprovocateur2.tumblr.com, and it errored out on both at two different posts.
I'm using {extractor.url} in the log settings, which always gives the default blog URL, i.e. the "input" URL as fed to gallery-dl, is it possible to directly log the offending post URL somehow? If it's not possible, maybe consider this a feature request..
Would be useful, but doing this with the current infrastructure would be painful since each extractor would have to manually update its current post URL value.
Yeah, I realize that extractor.url is working exactly as it should here, so not a good idea to change it in any way.
Just a rather uninformed guess, but maybe the easiest way to address this would be adding an additional check into tumblr.py to test for that incorrect token (if that is immediately obvious?) and then call log.warning() from there or something, as well as trying to repeat that last token step or some other kind of fallback mechanism, if necessary?
@mikf
Well, https://www.tumblr.com/docs/en/api/v2 is still the only official API doc I could find (and it's also here on GitHub, even has been updated just 19 days ago!) and it still mentions the same list of official Tumblr API client libs.
I just did a quick test with the Python one, and I ended up with the same issues I've been seeing in their web API console, e.g. always getting a 404 when trying to fetch a specific post.
A quick look at the issue tracker at tumblr/docs seems to confirm as much: apparently their "official" clients are a bit out of date.
The package on PyPI has not been updated either in quite a while..
Funnily enough, there's a fork on GitHub (and PyPI) (https://github.com/nostalgebraist/pytumblr2). I've just tried that one as well, and while it still has some issues ("key error", apparently it can't handle the response format from the API properly?), it actually seems to work: I could get specific post information, like and unlike a post, etc.
I think this confirms that the results from their API console don't really reflect the actual workings of their live API, and should probably not be relied on..
Still, if it's not related to authentication, as you said as well, I'm wondering what the culprit here might be. Just trying to rule out the possibilities. The config for Tumblr?
I'm also setting "ratelimit": "wait", which has been a somewhat recent change, I think..
The problem is not API related, in that gallery-dl uses a non-API way to get higher-resolution images than the ones returned by the API. The API will most likely always return images only up to a certain size.
The culprit is the method used not being 100% reliable, or at least sometimes not updating the token at the end of an image URL.
What I would like to see is the --write-pages output from an image redirect page that does not return the correct URL, if possible. Maybe it has both old/wrong and new/correct URL in it and gallery-dl just fails in extracting the right one. If that's not the case, I'm just going to implement a retry/fallback strategy.
It now retries fetching the higher-resolution version and prints a better warning when even that fails. It currently also downloads the lower-resolution version instead, which might not be the best idea ...
$ gallery-dl https://bespokeprovocateur2.tumblr.com/post/686406675559841792
[tumblr][warning] Unable to fetch higher-resolution version of https://64.media.tumblr.com/1d78c58335cb946162352f2b63bb2740/7ca740d84fe9eb21-63/s640x960/cee6f8e33799615ac0acc958e244a0d1fbc5ef0c.jpg (686406675559841792)
/tmp/tumblr/bespokeprovocateur2/tumblr_…keprovocateur2_686406675559841792_01.jpg
Wait... did this lower-res fallback also happen to you in your initial test? Or is this unrelated to the latest changes now?
The lower-res "fallback" (it's not a real fallback that can be disabled) did not happen before the latest changes.
I've now implemented this as a proper fallback that can be disabled, and as long as original is enabled it won't download the lower-res version. (https://github.com/mikf/gallery-dl/commit/f728b5ca062decb695e461ae95a469c08392c754)
The lower-res "fallback" (it's not a real fallback that can be disabled) did not happen before the latest changes.
If it was working (working better) for you before these changes I'd consider this a regression?
Okay, I'm testing this specific post again, first with the old revision and then with the latest commits.
PS D:\Temp> gallery-dl --verbose 'https://bespokeprovocateur2.tumblr.com/post/686406675559841792'
Debug : gallery-dl -> Version 1.23.1
Debug : gallery-dl -> Python 3.10.7 - Windows-10-10.0.19043-SP0
Debug : gallery-dl -> requests 2.28.1 - urllib3 1.26.12
Debug : gallery-dl -> Starting DownloadJob for 'https://bespokeprovocateur2.tumblr.com/post/686406675559841792'
Debug : tumblr -> Using custom OAuth1.0 authentication
Debug : tumblr -> Using TumblrPostExtractor for 'https://bespokeprovocateur2.tumblr.com/post/686406675559841792'
Debug : urllib3.connectionpool -> Starting new HTTPS connection (1): api.tumblr.com:443
Debug : urllib3.connectionpool -> https://api.tumblr.com:443 "GET /v2/blog/bespokeprovocateur2.tumblr.com/posts?id=686406675559841792&offset=0&limit=50&reblog_info=true HTTP/1.1" 200 None
Debug : urllib3.connectionpool -> Starting new HTTPS connection (1): 64.media.tumblr.com:443
Debug : urllib3.connectionpool -> https://64.media.tumblr.com:443 "GET /1d78c58335cb946162352f2b63bb2740/7ca740d84fe9eb21-63/s99999x99999/cee6f8e33799615ac0acc958e244a0d1fbc5ef0c.jpg HTTP/1.1" 200 None
Debug : tumblr -> Using download archive 'E:\Transfer\INPUT\GLDL\archives\gldl-archive-tumblr.db'
Debug : tumblr -> Active postprocessor modules: [ClassifyPP]
Debug : urllib3.connectionpool -> https://64.media.tumblr.com:443 "GET /1d78c58335cb946162352f2b63bb2740/7ca740d84fe9eb21-63/s99999x99999/c76d6df4266c74173985c757304a2a9bf214859b.jpg HTTP/1.1" 200 562786
* .\Tumblrverse\+Posts\Pictures\bespokeprovocateur2_686406675559841792_01.jpg
PS D:\Temp>
So, the second GET request here is for [..]/s99999x99999/cee6f8e33799615ac0acc958e244a0d1fbc5ef0c.jpg, the response is 200 None, the next GET request is for [..]/s99999x99999/c76d6df4266c74173985c757304a2a9bf214859b.jpg and the response is 200 562786, i.e. the actual download.
This means it's getting the correct token for the low-res upgrade by gallery-dl in this case, right?
In any case, the created JPG file is:
Size: 562786 bytes
SHA256 Hash: f0ef99acb325626b42df9105260bc7777af9286ac79608d7547d43233b16f781
gallery-dl --write-metadata --write-info-json 'https://bespokeprovocateur2.tumblr.com/post/686406675559841792'
TX/30M/🔓
", "followed": true, "is_blocked_from_primary": false, "is_nsfw": false, "name": "bespokeprovocateur2", "posts": 22118, "share_likes": false, "submission_page_title": "Submit", "submission_terms": { "accepted_types": [ "text", "photo", "quote", "link", "video" ], "guidelines": "", "tags": [ "submission" ], "title": "Submit" }, "subscribed": false, "theme": { "avatar_shape": "square", "background_color": "#000000", "body_font": "Helvetica Neue", "header_bounds": "", "header_full_height": 1055, "header_full_width": 3000, "header_image": "https://64.media.tumblr.com/9791e3ac6b55616ef7caa5d5fffa1886/003da3d2eca6f293-f0/s3000x1055/e24516c8fcf8e9dc81b6065d326e46b274a5047c.png", "header_image_focused": "https://64.media.tumblr.com/9791e3ac6b55616ef7caa5d5fffa1886/003da3d2eca6f293-f0/s2048x3072/ded99f0016c56b65e0b9cad2f7af5fc89d167f1f.png", "header_image_poster": "", "header_image_scaled": "https://64.media.tumblr.com/9791e3ac6b55616ef7caa5d5fffa1886/003da3d2eca6f293-f0/s2048x3072/ded99f0016c56b65e0b9cad2f7af5fc89d167f1f.png", "header_stretch": true, "link_color": "#b6b4b4", "show_avatar": true, "show_description": true, "show_header_image": false, "show_title": true, "title_color": "#ffffff", "title_font": "Bodoni Recut FS", "title_font_weight": "regular" }, "title": "Bespoke Provocateur", "total_posts": 22118, "updated": 1664387332, "url": "https://bespokeprovocateur2.tumblr.com/", "uuid": "bespokeprovocateur2.tumblr.com" }, "blog_name": "bespokeprovocateur2", "body": "", "can_like": true, "can_reblog": true, "can_reply": true, "can_send_in_message": true, "category": "tumblr", "ckey": "", "count": 1, "date": "2022-06-07 13:26:57", "display_avatar": true, "extension": "jpg", "filename": "c76d6df4266c74173985c757304a2a9bf214859b", "followed": true, "format": "html", "hash": "c76d6df4266c74173985c757304a2a9bf214859b", "id": 686406675559841792, "id_string": "686406675559841792", "interactability_reblog": "everyone", "liked": false, "mkey": "", "note_count": 241, "num": 1, 
"post_url": "https://bespokeprovocateur2.tumblr.com/post/686406675559841792", "reblog": { "comment": "", "tree_html": "" }, "reblog_key": "VzmnKPLR", "reblogged": true, "reblogged_from_can_message": true, "reblogged_from_following": false, "reblogged_from_id": "686153284087627776", "reblogged_from_name": "honeyandrosewater", "reblogged_from_title": "Honey & Rose Water", "reblogged_from_url": "https://honeyandrosewater.tumblr.com/post/686153284087627776", "reblogged_from_uuid": "t:XXI_6JjZWGCBZ39HYtKpfA", "reblogged_root_can_message": true, "reblogged_root_following": false, "reblogged_root_id": "685723365852545024", "reblogged_root_name": "risiblesvmours", "reblogged_root_title": "Martinelli", "reblogged_root_url": "https://risiblesvmours.tumblr.com/post/685723365852545024", "reblogged_root_uuid": "t:sp6ON9v2PhXlW9pxdJkAaQ", "recommended_color": null, "recommended_source": null, "short_url": "https://tmblr.co/ZbBQMSc6cpmKCu00", "should_open_in_legacy": false, "skey": "", "slug": "", "state": "published", "subcategory": "post", "summary": "", "tags": [], "timestamp": 1654608417, "title": "", "tkey": "", "type": "text" } ```
TX/30M/🔓
", "followed": true, "is_blocked_from_primary": false, "is_nsfw": false, "name": "bespokeprovocateur2", "posts": 22118, "share_likes": false, "submission_page_title": "Submit", "submission_terms": { "accepted_types": [ "text", "photo", "quote", "link", "video" ], "guidelines": "", "tags": [ "submission" ], "title": "Submit" }, "subscribed": false, "theme": { "avatar_shape": "square", "background_color": "#000000", "body_font": "Helvetica Neue", "header_bounds": "", "header_full_height": 1055, "header_full_width": 3000, "header_image": "https://64.media.tumblr.com/9791e3ac6b55616ef7caa5d5fffa1886/003da3d2eca6f293-f0/s3000x1055/e24516c8fcf8e9dc81b6065d326e46b274a5047c.png", "header_image_focused": "https://64.media.tumblr.com/9791e3ac6b55616ef7caa5d5fffa1886/003da3d2eca6f293-f0/s2048x3072/ded99f0016c56b65e0b9cad2f7af5fc89d167f1f.png", "header_image_poster": "", "header_image_scaled": "https://64.media.tumblr.com/9791e3ac6b55616ef7caa5d5fffa1886/003da3d2eca6f293-f0/s2048x3072/ded99f0016c56b65e0b9cad2f7af5fc89d167f1f.png", "header_stretch": true, "link_color": "#b6b4b4", "show_avatar": true, "show_description": true, "show_header_image": false, "show_title": true, "title_color": "#ffffff", "title_font": "Bodoni Recut FS", "title_font_weight": "regular" }, "title": "Bespoke Provocateur", "total_posts": 22118, "updated": 1664387332, "url": "https://bespokeprovocateur2.tumblr.com/", "uuid": "bespokeprovocateur2.tumblr.com" }, "blog_name": "bespokeprovocateur2", "body": "", "can_like": true, "can_reblog": true, "can_reply": true, "can_send_in_message": true, "category": "tumblr", "ckey": "", "count": 1, "date": "2022-06-07 13:26:57", "display_avatar": true, "followed": true, "format": "html", "id": 686406675559841792, "id_string": "686406675559841792", "interactability_reblog": "everyone", "liked": false, "mkey": "", "note_count": 241, "post_url": "https://bespokeprovocateur2.tumblr.com/post/686406675559841792", "reblog": { "comment": "", "tree_html": "" }, "reblog_key": 
"VzmnKPLR", "reblogged": true, "reblogged_from_can_message": true, "reblogged_from_following": false, "reblogged_from_id": "686153284087627776", "reblogged_from_name": "honeyandrosewater", "reblogged_from_title": "Honey & Rose Water", "reblogged_from_url": "https://honeyandrosewater.tumblr.com/post/686153284087627776", "reblogged_from_uuid": "t:XXI_6JjZWGCBZ39HYtKpfA", "reblogged_root_can_message": true, "reblogged_root_following": false, "reblogged_root_id": "685723365852545024", "reblogged_root_name": "risiblesvmours", "reblogged_root_title": "Martinelli", "reblogged_root_url": "https://risiblesvmours.tumblr.com/post/685723365852545024", "reblogged_root_uuid": "t:sp6ON9v2PhXlW9pxdJkAaQ", "recommended_color": null, "recommended_source": null, "short_url": "https://tmblr.co/ZbBQMSc6cpmKCu00", "should_open_in_legacy": false, "skey": "", "slug": "", "state": "published", "subcategory": "post", "summary": "", "tags": [], "timestamp": 1654608417, "title": "", "tkey": "", "type": "text" } ```
Btw, the only difference seems to be that the first file has these four lines that do not exist in the second file:
"extension": "jpg",
"filename": "c76d6df4266c74173985c757304a2a9bf214859b",
[..]
"hash": "c76d6df4266c74173985c757304a2a9bf214859b",
[..]
"num": 1,
Is this normal?
gallery-dl --write-pages 'https://bespokeprovocateur2.tumblr.com/post/686406675559841792'
gallery-dl --dump-json 'https://bespokeprovocateur2.tumblr.com/post/686406675559841792' > jsondump.txt
PS D:\Temp> gallery-dl --verbose 'https://bespokeprovocateur2.tumblr.com/post/686406675559841792'
Debug : gallery-dl -> Version 1.23.2-dev
Debug : gallery-dl -> Python 3.10.7 - Windows-10-10.0.19043-SP0
Debug : gallery-dl -> requests 2.28.1 - urllib3 1.26.12
Debug : gallery-dl -> Starting DownloadJob for 'https://bespokeprovocateur2.tumblr.com/post/686406675559841792'
Debug : tumblr -> Using custom OAuth1.0 authentication
Debug : tumblr -> Using TumblrPostExtractor for 'https://bespokeprovocateur2.tumblr.com/post/686406675559841792'
Debug : urllib3.connectionpool -> Starting new HTTPS connection (1): api.tumblr.com:443
Debug : urllib3.connectionpool -> https://api.tumblr.com:443 "GET /v2/blog/bespokeprovocateur2.tumblr.com/posts?id=686406675559841792&offset=0&limit=50&reblog_info=true HTTP/1.1" 200 None
Debug : urllib3.connectionpool -> Starting new HTTPS connection (1): 64.media.tumblr.com:443
Debug : urllib3.connectionpool -> https://64.media.tumblr.com:443 "GET /1d78c58335cb946162352f2b63bb2740/7ca740d84fe9eb21-63/s99999x99999/cee6f8e33799615ac0acc958e244a0d1fbc5ef0c.jpg HTTP/1.1" 200 None
Debug : tumblr -> Using download archive 'E:\Transfer\INPUT\GLDL\archives\gldl-archive-tumblr.db'
Debug : tumblr -> Active postprocessor modules: [ClassifyPP]
Debug : urllib3.connectionpool -> https://64.media.tumblr.com:443 "GET /1d78c58335cb946162352f2b63bb2740/7ca740d84fe9eb21-63/s99999x99999/c76d6df4266c74173985c757304a2a9bf214859b.jpg HTTP/1.1" 200 562786
* .\Tumblrverse\+Posts\Pictures\bespokeprovocateur2_686406675559841792_01.jpg
PS D:\Temp>
In any case, the created JPG file is:
Size: 562786 bytes
SHA256 Hash: f0ef99acb325626b42df9105260bc7777af9286ac79608d7547d43233b16f781
(phew, at least something is working 😄 )
gallery-dl --write-metadata --write-info-json 'https://bespokeprovocateur2.tumblr.com/post/686406675559841792'
TX/30M/🔓
", "followed": true, "is_blocked_from_primary": false, "is_nsfw": false, "name": "bespokeprovocateur2", "posts": 22118, "share_likes": false, "submission_page_title": "Submit", "submission_terms": { "accepted_types": [ "text", "photo", "quote", "link", "video" ], "guidelines": "", "tags": [ "submission" ], "title": "Submit" }, "subscribed": false, "theme": { "avatar_shape": "square", "background_color": "#000000", "body_font": "Helvetica Neue", "header_bounds": "", "header_full_height": 1055, "header_full_width": 3000, "header_image": "https://64.media.tumblr.com/9791e3ac6b55616ef7caa5d5fffa1886/003da3d2eca6f293-f0/s3000x1055/e24516c8fcf8e9dc81b6065d326e46b274a5047c.png", "header_image_focused": "https://64.media.tumblr.com/9791e3ac6b55616ef7caa5d5fffa1886/003da3d2eca6f293-f0/s2048x3072/ded99f0016c56b65e0b9cad2f7af5fc89d167f1f.png", "header_image_poster": "", "header_image_scaled": "https://64.media.tumblr.com/9791e3ac6b55616ef7caa5d5fffa1886/003da3d2eca6f293-f0/s2048x3072/ded99f0016c56b65e0b9cad2f7af5fc89d167f1f.png", "header_stretch": true, "link_color": "#b6b4b4", "show_avatar": true, "show_description": true, "show_header_image": false, "show_title": true, "title_color": "#ffffff", "title_font": "Bodoni Recut FS", "title_font_weight": "regular" }, "title": "Bespoke Provocateur", "total_posts": 22118, "updated": 1664387332, "url": "https://bespokeprovocateur2.tumblr.com/", "uuid": "bespokeprovocateur2.tumblr.com" }, "blog_name": "bespokeprovocateur2", "body": "", "can_like": true, "can_reblog": true, "can_reply": true, "can_send_in_message": true, "category": "tumblr", "ckey": "", "count": 1, "date": "2022-06-07 13:26:57", "display_avatar": true, "extension": "jpg", "filename": "c76d6df4266c74173985c757304a2a9bf214859b", "followed": true, "format": "html", "hash": "c76d6df4266c74173985c757304a2a9bf214859b", "id": 686406675559841792, "id_string": "686406675559841792", "interactability_reblog": "everyone", "liked": false, "mkey": "", "note_count": 241, "num": 1, 
"post_url": "https://bespokeprovocateur2.tumblr.com/post/686406675559841792", "reblog": { "comment": "", "tree_html": "" }, "reblog_key": "VzmnKPLR", "reblogged": true, "reblogged_from_can_message": true, "reblogged_from_following": false, "reblogged_from_id": "686153284087627776", "reblogged_from_name": "honeyandrosewater", "reblogged_from_title": "Honey & Rose Water", "reblogged_from_url": "https://honeyandrosewater.tumblr.com/post/686153284087627776", "reblogged_from_uuid": "t:XXI_6JjZWGCBZ39HYtKpfA", "reblogged_root_can_message": true, "reblogged_root_following": false, "reblogged_root_id": "685723365852545024", "reblogged_root_name": "risiblesvmours", "reblogged_root_title": "Martinelli", "reblogged_root_url": "https://risiblesvmours.tumblr.com/post/685723365852545024", "reblogged_root_uuid": "t:sp6ON9v2PhXlW9pxdJkAaQ", "recommended_color": null, "recommended_source": null, "short_url": "https://tmblr.co/ZbBQMSc6cpmKCu00", "should_open_in_legacy": false, "skey": "", "slug": "", "state": "published", "subcategory": "post", "summary": "", "tags": [], "timestamp": 1654608417, "title": "", "tkey": "", "type": "text" } ```
TX/30M/🔓
", "followed": true, "is_blocked_from_primary": false, "is_nsfw": false, "name": "bespokeprovocateur2", "posts": 22118, "share_likes": false, "submission_page_title": "Submit", "submission_terms": { "accepted_types": [ "text", "photo", "quote", "link", "video" ], "guidelines": "", "tags": [ "submission" ], "title": "Submit" }, "subscribed": false, "theme": { "avatar_shape": "square", "background_color": "#000000", "body_font": "Helvetica Neue", "header_bounds": "", "header_full_height": 1055, "header_full_width": 3000, "header_image": "https://64.media.tumblr.com/9791e3ac6b55616ef7caa5d5fffa1886/003da3d2eca6f293-f0/s3000x1055/e24516c8fcf8e9dc81b6065d326e46b274a5047c.png", "header_image_focused": "https://64.media.tumblr.com/9791e3ac6b55616ef7caa5d5fffa1886/003da3d2eca6f293-f0/s2048x3072/ded99f0016c56b65e0b9cad2f7af5fc89d167f1f.png", "header_image_poster": "", "header_image_scaled": "https://64.media.tumblr.com/9791e3ac6b55616ef7caa5d5fffa1886/003da3d2eca6f293-f0/s2048x3072/ded99f0016c56b65e0b9cad2f7af5fc89d167f1f.png", "header_stretch": true, "link_color": "#b6b4b4", "show_avatar": true, "show_description": true, "show_header_image": false, "show_title": true, "title_color": "#ffffff", "title_font": "Bodoni Recut FS", "title_font_weight": "regular" }, "title": "Bespoke Provocateur", "total_posts": 22118, "updated": 1664387332, "url": "https://bespokeprovocateur2.tumblr.com/", "uuid": "bespokeprovocateur2.tumblr.com" }, "blog_name": "bespokeprovocateur2", "body": "", "can_like": true, "can_reblog": true, "can_reply": true, "can_send_in_message": true, "category": "tumblr", "ckey": "", "count": 1, "date": "2022-06-07 13:26:57", "display_avatar": true, "followed": true, "format": "html", "id": 686406675559841792, "id_string": "686406675559841792", "interactability_reblog": "everyone", "liked": false, "mkey": "", "note_count": 241, "post_url": "https://bespokeprovocateur2.tumblr.com/post/686406675559841792", "reblog": { "comment": "", "tree_html": "" }, "reblog_key": 
"VzmnKPLR", "reblogged": true, "reblogged_from_can_message": true, "reblogged_from_following": false, "reblogged_from_id": "686153284087627776", "reblogged_from_name": "honeyandrosewater", "reblogged_from_title": "Honey & Rose Water", "reblogged_from_url": "https://honeyandrosewater.tumblr.com/post/686153284087627776", "reblogged_from_uuid": "t:XXI_6JjZWGCBZ39HYtKpfA", "reblogged_root_can_message": true, "reblogged_root_following": false, "reblogged_root_id": "685723365852545024", "reblogged_root_name": "risiblesvmours", "reblogged_root_title": "Martinelli", "reblogged_root_url": "https://risiblesvmours.tumblr.com/post/685723365852545024", "reblogged_root_uuid": "t:sp6ON9v2PhXlW9pxdJkAaQ", "recommended_color": null, "recommended_source": null, "short_url": "https://tmblr.co/ZbBQMSc6cpmKCu00", "should_open_in_legacy": false, "skey": "", "slug": "", "state": "published", "subcategory": "post", "summary": "", "tags": [], "timestamp": 1654608417, "title": "", "tkey": "", "type": "text" } ```
gallery-dl --write-pages 'https://bespokeprovocateur2.tumblr.com/post/686406675559841792'
gallery-dl --dump-json 'https://bespokeprovocateur2.tumblr.com/post/686406675559841792' > jsondump.txt
PS D:\Temp> Get-FileHash .\BEFORE\01*,.\LATEST\01*
Algorithm Hash Path
--------- ---- ----
SHA256 F7836FE1F62BE15E586B88FB66034BAE7EE76ECA94464D9DCAE43C4F8A01743A D:\Temp\BEFORE\01_https_api.tumblr.com_v2_blog_bespokeprovocateur2.tumb…
SHA256 F7836FE1F62BE15E586B88FB66034BAE7EE76ECA94464D9DCAE43C4F8A01743A D:\Temp\LATEST\01_https_api.tumblr.com_v2_blog_bespokeprovocateur2.tumb…
PS D:\Temp>
The 02_https_64.media.tumblr.com_1d78c[...] files are almost identical, the only differences are nonce attributes in script tags like this:
<script type="text/javascript" nonce="YjRmYTMxNmI4YjMxNWU3YjE4YzIwZWQ2NDJmYzE1ODk=">
<script type="text/javascript" nonce="ZjMwZDBlNmI3NDdhZDg5OTgzYmE2NjM1MThhNDM5ODc=">
The JPG files are identical as well
PS D:\Temp> Get-FileHash .\BEFORE\*.jpg,.\LATEST\*.jpg
Algorithm Hash Path
--------- ---- ----
SHA256 F0EF99ACB325626B42DF9105260BC7777AF9286AC79608D7547D43233B16F781 D:\Temp\BEFORE\bespokeprovocateur2_686406675559841792_01.jpg
SHA256 F0EF99ACB325626B42DF9105260BC7777AF9286AC79608D7547D43233B16F781 D:\Temp\LATEST\bespokeprovocateur2_686406675559841792_01.jpg
PS D:\Temp>
The rest is also the same..
PS D:\Temp> Get-FileHash .\BEFORE\*.json,.\BEFORE\*.txt,.\LATEST\*.json,.\LATEST\*.txt | sort Hash
Algorithm Hash Path
--------- ---- ----
SHA256 0FED27B6BC3C55D296BC2AF554753C072DE9806C152C45A64623C04500CC42A4 D:\Temp\BEFORE\bespokeprovocateur2_686406675559841792_01.jpg.json
SHA256 0FED27B6BC3C55D296BC2AF554753C072DE9806C152C45A64623C04500CC42A4 D:\Temp\LATEST\bespokeprovocateur2_686406675559841792_01.jpg.json
SHA256 1F5096310C989A823C7B57B526B9D826A334A9DD0053788E2F89CA455425D3C9 D:\Temp\BEFORE\jsondump.txt
SHA256 1F5096310C989A823C7B57B526B9D826A334A9DD0053788E2F89CA455425D3C9 D:\Temp\LATEST\jsondump.txt
SHA256 9909C930A33EC86A28F494ABB5A16E4B8748A04FCC10FC25257956E87656277C D:\Temp\BEFORE\info.json
SHA256 9909C930A33EC86A28F494ABB5A16E4B8748A04FCC10FC25257956E87656277C D:\Temp\LATEST\info.json
PS D:\Temp>
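For anyone wanting to repeat this comparison without PowerShell, a rough Python equivalent of the Get-FileHash checks might look like the sketch below. The function names are made up for this example, and it assumes two flat directories (such as BEFORE and LATEST) with matching file names. Page dumps that embed a per-request nonce would be reported as differing, as noted above.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the uppercase SHA256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()


def hash_directory(directory):
    """Map each regular file name in `directory` to its SHA256 digest."""
    return {p.name: sha256_of(p)
            for p in Path(directory).iterdir() if p.is_file()}


def differing_files(before_dir, latest_dir):
    """Names of files whose hashes differ or that exist on one side only."""
    before = hash_directory(before_dir)
    latest = hash_directory(latest_dir)
    return sorted(name for name in before.keys() | latest.keys()
                  if before.get(name) != latest.get(name))
```

Calling `differing_files("BEFORE", "LATEST")` returns an empty list when both runs produced byte-identical files.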
I reckon the TumblrPostExtractor is working fine..
I'll start a full blog extraction of this thing right now..
Nope... that was pretty quick this time.
After just 22 downloaded files:
Error : tumblr -> HttpError: '404 Not Found' for 'https://64.media.tumblr.com/3f0527fd633a6b093a39356cf831ea7d/843ad87bc3e83b24-5b/s99999x99999/96237fba40be11d681e89cf5712e62951a0367d0.jpg' [https://bespokeprovocateur2.tumblr.com/]
Same thing in the logfile: [2022-09-29T19:35:55][error] HttpError: '404 Not Found' for 'https://64.media.tumblr.com/3f0527fd633a6b093a39356cf831ea7d/843ad87bc3e83b24-5b/s99999x99999/96237fba40be11d681e89cf5712e62951a0367d0.jpg' [Source URL: https://bespokeprovocateur2.tumblr.com/]
I'll bet that using the post extractor would work here, again. Why does that work, while the blog extractor stops extraction at this file?
Thanks for the thorough bug report.
I gave it another shot with https://github.com/mikf/gallery-dl/commit/e1d714943b54adab96968e13192700f2118aeee6 by catching any 404 Not Found errors and relying on the fallback to sort it out.
I've also finally managed to trigger this error myself after only 3 file downloads, but from then on never again ... and of course I didn't use --write-pages at the time. I'd really like to see what Tumblr sends in this situation.
Oh, this is getting interesting now.. 😄
I'll update to https://github.com/mikf/gallery-dl/commit/e1d714943b54adab96968e13192700f2118aeee6 now and will start a new run later today...
By the way, I noticed this while scrolling through tumblr.py, just below the changes from this latest commit:
What's the point of these three yield statements?
One more thing, just to rule out possible causes step-by-step, I've also tested a full blog extraction run with the standalone executable from here:
https://github.com/mikf/gallery-dl/actions/runs/3149801258
This one, to be exact:
gallery-dl-windows-latest-x64-3.10
Result: Shows exactly the same issue as my system's python interpreter with the python package. (Config was still the same, my normal config file though.. )
So the changes from https://github.com/mikf/gallery-dl/commit/e1d714943b54adab96968e13192700f2118aeee6 still result in HttpErrors coming from the extractor side? It really shouldn't ...
What's the point of these three yield statements?
Three extra/fallback attempts to grab the correct URL.
yield statements do not execute all at once, but only one by one when needed.
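A tiny standalone illustration of that laziness, unrelated to the actual tumblr.py code (the names `fallback_urls` and `fake_fetch` are invented for this example): the body of a generator function only runs as far as the next yield each time the consumer asks for another value.

```python
def fallback_urls(fetch_upgraded, attempts=3):
    """Lazily yield up to `attempts` freshly resolved URLs."""
    for _ in range(attempts):
        # This line runs only when the consumer requests the next value.
        yield fetch_upgraded()


calls = []


def fake_fetch():
    """Stand-in for a network request; records that it was invoked."""
    calls.append(len(calls) + 1)
    return "url-%d" % len(calls)


gen = fallback_urls(fake_fetch)
calls_before = list(calls)        # creating the generator fetched nothing

first = next(gen)                 # triggers exactly one fetch
calls_after_first = list(calls)

rest = list(gen)                  # exhausts the remaining two attempts
```

So if the first fallback URL downloads fine, the other two fetches never happen.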
I've also tested a full blog extraction run with the standalone executable from here:
https://github.com/mikf/gallery-dl/actions/runs/3149801258
But that one is from before https://github.com/mikf/gallery-dl/commit/e1d714943b54adab96968e13192700f2118aeee6 ...
I've also tested a full blog extraction run with the standalone executable from here:
https://github.com/mikf/gallery-dl/actions/runs/3149801258
But that one is from before e1d7149 ...
True. I was aware, this was simply a test on my end, to confirm that there would be no differences between my system python interpreter and the bundled standalone executable. And, as shown, both indeed had the same behaviour for me..
So the changes from e1d7149 still result in HttpErrors coming from the extractor side? It really shouldn't ...
No, it's making progress now. https://github.com/mikf/gallery-dl/blob/e1d714943b54adab96968e13192700f2118aeee6/gallery_dl/extractor/tumblr.py#L244-L250
I think the exception is now handled here, and I could download the whole blog thing (23.6 GiB - thanks, hugely inefficient GIF format!) but it seems like it's still getting that unexpected token thing...
But I noticed something: many times these errors did not happen randomly and in isolation. They seemed to be bunched together, i.e. a dozen or so of them happening in direct succession.
If this is indeed not related to any authentication, and not some wrong result from the API, I'm inclined to believe that this could be caused by the responses from (in this case) 64.media.tumblr.com. Maybe they are sometimes faulty?
Maybe there is some sort of invisible rate limit? Would explain why these errors come bundled.
In any case, retrying these requests 3 times doesn't solve anything it seems.
Yup.. but maybe adding a forced delay between these 3 request repetitions would help?
I've now added a 2 minute wait time between each fallback. Maybe that helps. (https://github.com/mikf/gallery-dl/commit/e5d229c5247c02e42737641229dc516029f5b790)
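Roughly, the idea is a fallback generator that waits before each extra attempt. The sketch below is a simplified stand-in and not the code from the linked commit: `delayed_fallbacks`, its parameters, and the injectable `sleep` argument are assumptions chosen so the behaviour can be tried without actually waiting.

```python
import time


def delayed_fallbacks(resolve, retries=3, delay=120.0, sleep=time.sleep):
    """Yield up to `retries` re-resolved URLs, pausing `delay` seconds before each."""
    for _ in range(retries):
        sleep(delay)        # give the server time before asking again
        yield resolve()     # re-request the URL, hopefully with a fresh token


# Demonstration with a fake resolver and a fake sleep that only records delays.
slept = []
answers = iter(["stale", "stale", "fresh"])
urls = list(delayed_fallbacks(lambda: next(answers), sleep=slept.append))
```

Because the wait sits inside the generator, it is only paid when a fallback is actually requested, which matches the two-minute gaps before each "Trying fallback URL" line in the log further down.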
Okay, I've downloaded this whole thing again with the latest gallery-dl release (v1.23.3).
While the last run with the old version logged 594 failed request attempts, it now logged just 35 failed attempts. I'd definitely call that an improvement!
Here, the first 30 lines from the logfile:
[2022-10-17T08:57:52][warning] '404 Not Found' for 'https://64.media.tumblr.com/ea03b7ca93caebe7c6e37df7eaefd75b/7723e6fce69ee3cf-11/s99999x99999/f644f82b0a7e5cd92e82dd74a7c5007ed68dbadc.jpg' [Source URL: https://bespokeprovocateur2.tumblr.com/]
[2022-10-17T08:59:52][info] Trying fallback URL #1
[2022-10-17T08:59:52][warning] '404 Not Found' for 'https://64.media.tumblr.com/ea03b7ca93caebe7c6e37df7eaefd75b/7723e6fce69ee3cf-11/s99999x99999/f644f82b0a7e5cd92e82dd74a7c5007ed68dbadc.jpg' [Source URL: https://bespokeprovocateur2.tumblr.com/]
[2022-10-17T09:01:52][info] Trying fallback URL #2
[2022-10-17T09:01:52][warning] '404 Not Found' for 'https://64.media.tumblr.com/ea03b7ca93caebe7c6e37df7eaefd75b/7723e6fce69ee3cf-11/s99999x99999/f644f82b0a7e5cd92e82dd74a7c5007ed68dbadc.jpg' [Source URL: https://bespokeprovocateur2.tumblr.com/]
[2022-10-17T09:03:53][info] Trying fallback URL #3
[2022-10-17T09:03:53][warning] '404 Not Found' for 'https://64.media.tumblr.com/0e19807adb1a5298d6769894db5aa921/1241c35afd3c2637-da/s99999x99999/e23201c33961bd77fbc25434bf64b9098732eb7f.jpg' [Source URL: https://bespokeprovocateur2.tumblr.com/]
[2022-10-17T09:05:54][info] Trying fallback URL #1
[2022-10-17T09:05:54][warning] '404 Not Found' for 'https://64.media.tumblr.com/0e19807adb1a5298d6769894db5aa921/1241c35afd3c2637-da/s99999x99999/e23201c33961bd77fbc25434bf64b9098732eb7f.jpg' [Source URL: https://bespokeprovocateur2.tumblr.com/]
[2022-10-17T09:07:54][info] Trying fallback URL #2
[2022-10-17T09:07:57][warning] '404 Not Found' for 'https://64.media.tumblr.com/233ddfb8d2b598d859c7433756d44233/d7a3b5d1502e206b-59/s99999x99999/93ac5653f9471b1dcdabdaf4e1682be66cb4c0ab.jpg' [Source URL: https://bespokeprovocateur2.tumblr.com/]
[2022-10-17T09:09:57][info] Trying fallback URL #1
[2022-10-17T09:10:59][warning] '404 Not Found' for 'https://64.media.tumblr.com/151edf2d063d226e0205dd635608903e/df7243413998c01e-32/s99999x99999/d824ef6cd7a2c7a7de601201fc7dea46f7b49d33.jpg' [Source URL: https://bespokeprovocateur2.tumblr.com/]
[2022-10-17T09:12:59][info] Trying fallback URL #1
[2022-10-17T09:54:40][warning] '404 Not Found' for 'https://64.media.tumblr.com/938fdb9a62701727aa469c38438eed88/898dbdeff0499f5d-db/s99999x99999/b6211dae8f677b2a334a7497c17c60656a0d9f3a.gif' [Source URL: https://bespokeprovocateur2.tumblr.com/]
[2022-10-17T09:56:40][info] Trying fallback URL #1
[2022-10-17T09:56:45][warning] '404 Not Found' for 'https://64.media.tumblr.com/fbd624637fea20b903933d49db5f93a8/90abd8c5e75a8b75-23/s99999x99999/c5cb2c3eca15d163cbebad9c0536a493bdc54738.png' [Source URL: https://bespokeprovocateur2.tumblr.com/]
[2022-10-17T09:58:45][info] Trying fallback URL #1
[2022-10-17T09:58:56][warning] '404 Not Found' for 'https://64.media.tumblr.com/fe0bcf9a518f549f8ff79a9b9c4ed0e2/ce8f987ad5b31ee0-e2/s99999x99999/00e4b06ef8ba90c586b0dd4001e471f43f07d9cb.jpg' [Source URL: https://bespokeprovocateur2.tumblr.com/]
[2022-10-17T10:00:56][info] Trying fallback URL #1
[2022-10-17T10:00:56][warning] '404 Not Found' for 'https://64.media.tumblr.com/fe0bcf9a518f549f8ff79a9b9c4ed0e2/ce8f987ad5b31ee0-e2/s99999x99999/00e4b06ef8ba90c586b0dd4001e471f43f07d9cb.jpg' [Source URL: https://bespokeprovocateur2.tumblr.com/]
[2022-10-17T10:02:57][info] Trying fallback URL #2
[2022-10-17T10:02:57][warning] '404 Not Found' for 'https://64.media.tumblr.com/fe0bcf9a518f549f8ff79a9b9c4ed0e2/ce8f987ad5b31ee0-e2/s99999x99999/00e4b06ef8ba90c586b0dd4001e471f43f07d9cb.jpg' [Source URL: https://bespokeprovocateur2.tumblr.com/]
[2022-10-17T10:04:57][info] Trying fallback URL #3
[2022-10-17T10:04:57][warning] '404 Not Found' for 'https://64.media.tumblr.com/fe0bcf9a518f549f8ff79a9b9c4ed0e2/ce8f987ad5b31ee0-e2/s99999x99999/00e4b06ef8ba90c586b0dd4001e471f43f07d9cb.jpg' [Source URL: https://bespokeprovocateur2.tumblr.com/]
[2022-10-17T10:04:57][warning] Unable to fetch higher-resolution version of https://64.media.tumblr.com/fe0bcf9a518f549f8ff79a9b9c4ed0e2/ce8f987ad5b31ee0-e2/s99999x99999/00e4b06ef8ba90c586b0dd4001e471f43f07d9cb.jpg (686362190566080512) [Source URL: https://bespokeprovocateur2.tumblr.com/]
[2022-10-17T10:04:57][error] Failed to download 686362190566080512_01.jpg [Source URL: https://bespokeprovocateur2.tumblr.com/]
[2022-10-17T10:04:58][warning] '404 Not Found' for 'https://64.media.tumblr.com/3c317583ed7a6a9c73210c597e4dfd7c/268ee4cc47ebc885-e9/s99999x99999/b11b9f656d25a0143585df316d4d3b1747293ef2.jpg' [Source URL: https://bespokeprovocateur2.tumblr.com/]
[2022-10-17T10:06:59][info] Trying fallback URL #1
[2022-10-17T10:06:59][warning] '404 Not Found' for 'https://64.media.tumblr.com/3c317583ed7a6a9c73210c597e4dfd7c/268ee4cc47ebc885-e9/s99999x99999/b11b9f656d25a0143585df316d4d3b1747293ef2.jpg' [Source URL: https://bespokeprovocateur2.tumblr.com/]
You can see the two-minute gaps in the timestamps. Also, you can see the "using fallback 1, 2, 3" messages, and then just "fallback 1, 2", and also a couple of "using fallback 1" before it proceeds. Definitely a difference to the log before, where it was basically always "try fallback 1, try fallback 2, try fallback 3, and then Error"..
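To check those gaps without reading the timestamps by eye, a small sketch like the following might help. It assumes every relevant log line starts with a `[YYYY-MM-DDTHH:MM:SS]` timestamp as in the excerpt above; `gaps_in_seconds` is a made-up helper name.

```python
import re
from datetime import datetime

# Matches the leading "[2022-10-17T08:57:52]" part of a log line.
TIMESTAMP = re.compile(r"^\[(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\]")


def gaps_in_seconds(log_lines):
    """Return the seconds elapsed between consecutive timestamped lines."""
    stamps = []
    for line in log_lines:
        match = TIMESTAMP.match(line)
        if match:
            stamps.append(datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S"))
    return [int((b - a).total_seconds()) for a, b in zip(stamps, stamps[1:])]
```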
I'll check the remaining error emitting URLs manually, to see if the image res upgrade is actually making a difference here..
Edit:
Maybe I should also add, for the sake of completeness, that this was done with the latest stable Python release, i.e.
[PS] > python
Python 3.10.8 (tags/v3.10.8:aaaf517, Oct 11 2022, 16:50:30) [MSC v.1933 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>>
The old run was still on 3.10.7, although this should probably not make any difference.
Okay, I've checked the remaining URLs from my log...
Ignoring duplicate entries, this is basically all that remained:
https://64.media.tumblr.com/ea03b7ca93caebe7c6e37df7eaefd75b/7723e6fce69ee3cf-11/s99999x99999/f644f82b0a7e5cd92e82dd74a7c5007ed68dbadc.jpg
https://64.media.tumblr.com/0e19807adb1a5298d6769894db5aa921/1241c35afd3c2637-da/s99999x99999/e23201c33961bd77fbc25434bf64b9098732eb7f.jpg
https://64.media.tumblr.com/233ddfb8d2b598d859c7433756d44233/d7a3b5d1502e206b-59/s99999x99999/93ac5653f9471b1dcdabdaf4e1682be66cb4c0ab.jpg
https://64.media.tumblr.com/151edf2d063d226e0205dd635608903e/df7243413998c01e-32/s99999x99999/d824ef6cd7a2c7a7de601201fc7dea46f7b49d33.jpg
https://64.media.tumblr.com/938fdb9a62701727aa469c38438eed88/898dbdeff0499f5d-db/s99999x99999/b6211dae8f677b2a334a7497c17c60656a0d9f3a.gif
https://64.media.tumblr.com/fbd624637fea20b903933d49db5f93a8/90abd8c5e75a8b75-23/s99999x99999/c5cb2c3eca15d163cbebad9c0536a493bdc54738.png
https://64.media.tumblr.com/fe0bcf9a518f549f8ff79a9b9c4ed0e2/ce8f987ad5b31ee0-e2/s99999x99999/00e4b06ef8ba90c586b0dd4001e471f43f07d9cb.jpg
https://64.media.tumblr.com/3c317583ed7a6a9c73210c597e4dfd7c/268ee4cc47ebc885-e9/s99999x99999/b11b9f656d25a0143585df316d4d3b1747293ef2.jpg
https://64.media.tumblr.com/6ba8711aad09b7cc62dddf3461906caf/de32b6115f021be2-4c/s99999x99999/4ca280458b2cef9f27db3509ccf158079925081d.jpg
https://64.media.tumblr.com/2f543ec0fba9225492f93bf43973e8d6/64d5711bb0f1e37d-26/s99999x99999/f6f68c36d113183d92c20092b9fa77ad23194bde.jpg
https://64.media.tumblr.com/e83e00bf88b67fe50e42c618d2a3eb84/4d5859ff43302fe5-3b/s99999x99999/472fb2f380a520454ab84cddd128441f02ce4738.jpg
https://64.media.tumblr.com/d74ad19549e03983e86b8c7fbff45480/5b004e3ccaa28205-42/s99999x99999/79b4daf0adebae04e7a18f7a6866e6226d98f219.jpg
https://64.media.tumblr.com/a842463bd0f1b35c354fea19f6b1c2cd/ebb2c2cfac192196-0c/s99999x99999/0ffbed6284c703a40e1cd77d1d138ce5a910738e.jpg
https://64.media.tumblr.com/3fbf03b09b1bcddb1825d00f609be057/2f7dd023d85c1ba7-ff/s99999x99999/d6a07477e8c5016a2bdec2fb6f22005a7738bf5c.jpg
https://64.media.tumblr.com/a941d512a456f4b0634f936bb4995a49/20a640ade20957d0-b8/s99999x99999/ef30dc6f53a6c3fbfa95d3a49ab7e1cb9f66a925.jpg
https://64.media.tumblr.com/310824365264261114f845d1c2106ddb/0e0aae5a919a6e72-73/s99999x99999/b0445beadf61e3d596cd02f8a773b1c9801f4247.jpg
https://64.media.tumblr.com/4438eeafe852de0d197f9e6c99df717b/8356e2430793ebf0-19/s99999x99999/c6fd48f59ef4d62cf70a57ddfb1222e4caa45b03.jpg
https://64.media.tumblr.com/9f0c9e529ef3322445f262aff8aaa746/caa399ab027fc306-47/s99999x99999/2028216b74daf58a9042ed8d612705821f165ce8.jpg
https://64.media.tumblr.com/e17fed609d69e2a8c41f62a626af03ba/cfcf3012569042bb-08/s99999x99999/27f393e64cafd7cfe2c3a2521f3e30a3ba50cc99.jpg
https://64.media.tumblr.com/4b0387d6c13a80c87cc8065a2efc28b0/f6117b815c977b53-ba/s99999x99999/02a391ab1951edd28276bae4610ad77d92d730a5.jpg
https://64.media.tumblr.com/8ff9d0767285060667ca05f51f2df290/c7ae62ec77bf10de-f2/s99999x99999/ae3de9437c841db6f8ccb177fc6b2cd3ec1ebc65.jpg
https://64.media.tumblr.com/89a971802225138da8f2f083e455750e/9df0c1a2b29a1d0e-87/s99999x99999/49c93f757f442e743a1c4d578ee2031ce66a6fe0.jpg
https://64.media.tumblr.com/eae71beae352e8af75364dd6f28b3429/989a3a36f8336d12-1e/s99999x99999/1681d381b6d77f6eda7fcdca8dcedbb341c077e0.jpg
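A list like this can be pulled out of the logfile with a few lines of Python. The sketch below assumes the `'404 Not Found' for '<url>'` wording shown in the log excerpt; `unique_failed_urls` is a hypothetical helper, not part of gallery-dl.

```python
import re

# Captures the URL from lines like: '404 Not Found' for 'https://...'
NOT_FOUND = re.compile(r"'404 Not Found' for '([^']+)'")


def unique_failed_urls(log_lines):
    """Return failing URLs in first-seen order, ignoring duplicates."""
    seen = {}  # dicts preserve insertion order
    for line in log_lines:
        match = NOT_FOUND.search(line)
        if match:
            seen.setdefault(match.group(1), None)
    return list(seen)
```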
The only remarkable thing here is that, contrary to my earlier tests in this thread, all of these (except 4) did not work in a clean browser profile this time. They only seem to work for me when I use my normal browser profile, where I'm signed in to Tumblr.
Still no dice with curl either, even if I'm setting an `Accept` header like this:
curl -H "Accept: */*" -A "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36" --verbose https://64.media.tumblr.com/89a971802225138da8f2f083e455750e/9df0c1a2b29a1d0e-87/s99999x99999/49c93f757f442e743a1c4d578ee2031ce66a6fe0.jpg
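The same kind of request can be reproduced from Python to compare how the server reacts to different `Accept` values. This is only a sketch using the standard library; `build_request` and `status_for` are illustrative names, and `status_for` performs a real network request when called.

```python
import urllib.error
import urllib.request


def build_request(url, accept="*/*", user_agent="Mozilla/5.0"):
    """Create a GET request with explicit Accept and User-Agent headers."""
    return urllib.request.Request(
        url, headers={"Accept": accept, "User-Agent": user_agent}
    )


def status_for(url, **kwargs):
    """Return the HTTP status code for `url` (does network I/O)."""
    try:
        with urllib.request.urlopen(build_request(url, **kwargs), timeout=30) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # urlopen raises for 4xx/5xx responses; the code is still useful here.
        return exc.code
```

Calling `status_for(url, accept="text/html")` and `status_for(url, accept="*/*")` for the same URL would show whether the header alone changes the outcome.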
Regarding the hi-res URL replacement, I did not see any difference in this case. The replaced URL works in the browser, but the image would've been within normal Tumblr size anyway. This might depend on the blog: I assume it's only stuff circulating on Tumblr here, without any other material, which might explain why.. I'd like to see an example where this "upgrade" would work, but they are probably pretty rare, I think?
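For reference, the size replacement described earlier in the thread (swapping a segment like `s640x960` for `s99999x99999`) boils down to something like the sketch below. The regular expression and the `upgrade_size` name are assumptions for illustration and not gallery-dl's implementation; the token at the end of the URL still has to be refreshed separately.

```python
import re

# A path segment such as "/s640x960/" that encodes the requested size.
SIZE_SEGMENT = re.compile(r"/s\d+x\d+/")


def upgrade_size(url, size="s99999x99999"):
    """Replace the first size segment in a Tumblr media URL."""
    return SIZE_SEGMENT.sub("/" + size + "/", url, count=1)
```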
And Tumblr being Tumblr, all of the content from the logged URLs here has been duplicates as well anyway, already downloaded by the initial blog grab.. Well, except for one picture maybe, but don't pin me down on this, maybe this one slipped through and I forgot it or something..
The hardcoded wait time between the fallbacks made an improvement here, so I'm definitely in favor of keeping this. Not sure if the difference would be statistically relevant, but I'd also favor one or two additional fallback attempts, just to be sure, and to keep the actual error rate in the end as low as possible, but that's up to you to decide @mikf
The only other "solution" that would help in such a case as described here is to log the actual post URLs causing any HTTP error, as I've mentioned in some earlier comment. Because the logfile could then be basically reused as an input file, in effect feeding all "leftover" URLs to the TumblrPostExtractor in one step, and pronto, already done.
Yeah, that would be the only real improvement I can think of right now..
Otherwise, I'd be inclined to close this issue as solved because there isn't anything that can be done on the client side anymore, I believe..
> I'd like to see an example where this "upgrade" would work, but they are probably pretty rare, I think?

> The hardcoded wait time between

It's not hardcoded anymore: https://github.com/mikf/gallery-dl/commit/7c6af27eb8ba193683436f24eb572581c1d1fcfc

> The only other "solution" that would help in such a case as described here is to log the actual post URLs causing any HTTP error, as I've mentioned in some earlier comment. Because the logfile could then be basically reused as an input file, in effect feeding all "leftover" URLs to the TumblrPostExtractor in one step, and pronto, already done.

It's only the post ID and not the entire URL, but the number at the end of a final warning message is the ID of the post the failing image is from. For example, it'd be `686362190566080512` for
[2022-10-17T10:04:57][warning] Unable to fetch higher-resolution version of https://64.media.tumblr.com/fe0bcf9a518f549f8ff79a9b9c4ed0e2/ce8f987ad5b31ee0-e2/s99999x99999/00e4b06ef8ba90c586b0dd4001e471f43f07d9cb.jpg (686362190566080512) [Source URL: https://bespokeprovocateur2.tumblr.com/]
so the referenced post is at https://bespokeprovocateur2.tumblr.com/post/686362190566080512
> I'd like to see an example where this "upgrade" would work, but they are probably pretty rare, I think?

Thanks for these examples, appreciate it!

> > The hardcoded wait time between
>
> It's not hardcoded anymore: 7c6af27

I'll take that as well, if you absolutely insist.. 😉

> > The only other "solution" that would help in such a case as described here is to log the actual post URLs causing any HTTP error, as I've mentioned in some earlier comment. Because the logfile could then be basically reused as an input file, in effect feeding all "leftover" URLs to the TumblrPostExtractor in one step, and pronto, already done.
>
> It's only the post ID and not the entire URL, but the number at the end of a final warning message is the ID of the post the failing image is from. For example it'd be `686362190566080512` for
> [2022-10-17T10:04:57][warning] Unable to fetch higher-resolution version of https://64.media.tumblr.com/fe0bcf9a518f549f8ff79a9b9c4ed0e2/ce8f987ad5b31ee0-e2/s99999x99999/00e4b06ef8ba90c586b0dd4001e471f43f07d9cb.jpg (686362190566080512) [Source URL: https://bespokeprovocateur2.tumblr.com/]
> so the referenced post is at https://bespokeprovocateur2.tumblr.com/post/686362190566080512

Well.. you're right, obviously. The Post ID is enough to be able to always reconstruct the URL. Damn, I must've missed this somehow. Glaring failure in the mark I eyeball device. Too busy staring at the timestamps and at all those numbers in the URLs, I guess. The error line contains the actual ID too, in the filename..
[2022-10-17T10:04:57][error] Failed to download 686362190566080512_01.jpg [Source URL: https://bespokeprovocateur2.tumblr.com/]
Yeah, reading does help. Should've looked at the actual source itself, because even I understand enough of Python for this..
Okay, knowing that I can extract those IDs from the log is all that's really needed here.. All fine with me, closing this issue as solved now, thanks again!
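Extracting those IDs and turning them back into post URLs could look roughly like this. The sketch relies on the `Unable to fetch higher-resolution version of <url> (<post id>) [Source URL: <blog>]` wording shown above; `failed_post_urls` is a hypothetical helper name.

```python
import re

# Captures the post ID and the blog URL from the final warning message.
FAILED = re.compile(
    r"Unable to fetch higher-resolution version of \S+ "
    r"\((\d+)\) \[Source URL: (\S+?)\]"
)


def failed_post_urls(log_lines):
    """Rebuild post URLs from the logged post IDs, without duplicates."""
    urls = {}  # dicts preserve insertion order
    for line in log_lines:
        match = FAILED.search(line)
        if match:
            post_id, blog = match.groups()
            urls.setdefault(blog.rstrip("/") + "/post/" + post_id, None)
    return list(urls)
```

Writing the result to a text file, one URL per line, should give something that can be fed back to gallery-dl with its `-i`/`--input-file` option.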
Been trying Tumblr extraction for the first time since, well, last year or so, and encountered some strange 404 error causing blog extraction to halt.
From my logfile:
What's a bit strange to me here: This URL works in the browser, even in incognito/empty profile.
Please note, contrary to the URL itself which indicates a direct link to a JPG file, I end up redirected in the browser. (This has been a redesign of their site, not too long ago. Direct URLs like this used to work)
"View Source" of that redirection page from Firefox private tab
Fun fact, if I do the usual `Copy Image Link` or `View Image` from the context menu, I'll get the actual JPG image. It's not the same URL, though: the path segment in the URL now ends in `c76d6df4266c74173985c757304a2a9bf214859b` (right before `.jpg`). Would be interesting to know if this URL could be directly derived from the original URL that ends up in the log..
While it works in a browser (tested here with Firefox and Chrome), it actually does not work with curl, even when changing the User-Agent. It'll simply show this 404 error...
Full curl output
```
PS D:\Temp> curl -A 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:105.0) Gecko/20100101 Firefox/105.0' 'https://64.media.tumblr.com/1d78c58335cb946162352f2b63bb2740/7ca740d84fe9eb21/s99999x99999/cee6f8e33799615ac0acc958e244a0d1fbc5ef0c.jpg' --verbose
* Trying 192.0.77.3:443...
* Connected to 64.media.tumblr.com (192.0.77.3) port 443 (#0)
* ALPN: offers h2
* ALPN: offers http/1.1
* CAfile: E:\Apps\curl\curl-ca-bundle.crt
* CApath: none
* TLSv1.0 (OUT), TLS header, Certificate Status (22):
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.2 (IN), TLS header, Certificate Status (22):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS header, Finished (20):
* TLSv1.2 (IN), TLS header, Supplemental data (23):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.2 (IN), TLS header, Supplemental data (23):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS header, Supplemental data (23):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.2 (IN), TLS header, Supplemental data (23):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.2 (OUT), TLS header, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (OUT), TLS header, Supplemental data (23):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN: server accepted h2
* Server certificate:
* subject: CN=*.media.tumblr.com
* start date: Jan 17 00:00:00 2022 GMT
* expire date: Jan 17 23:59:59 2023 GMT
* subjectAltName: host "64.media.tumblr.com" matched cert's "*.media.tumblr.com"
* issuer: C=GB; ST=Greater Manchester; L=Salford; O=Sectigo Limited; CN=Sectigo RSA Domain Validation Secure Server CA
* SSL certificate verify ok.
* Using HTTP2, server supports multiplexing
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* TLSv1.2 (OUT), TLS header, Supplemental data (23):
* TLSv1.2 (OUT), TLS header, Supplemental data (23):
* TLSv1.2 (OUT), TLS header, Supplemental data (23):
* h2h3 [:method: GET]
* h2h3 [:path: /1d78c58335cb946162352f2b63bb2740/7ca740d84fe9eb21/s99999x99999/cee6f8e33799615ac0acc958e244a0d1fbc5ef0c.jpg]
* h2h3 [:scheme: https]
* h2h3 [:authority: 64.media.tumblr.com]
* h2h3 [user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:105.0) Gecko/20100101 Firefox/105.0]
* h2h3 [accept: */*]
* Using Stream ID: 1 (easy handle 0x21c3c41f500)
* TLSv1.2 (OUT), TLS header, Supplemental data (23):
> GET /1d78c58335cb946162352f2b63bb2740/7ca740d84fe9eb21/s99999x99999/cee6f8e33799615ac0acc958e244a0d1fbc5ef0c.jpg HTTP/2
> Host: 64.media.tumblr.com
> user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:105.0) Gecko/20100101 Firefox/105.0
> accept: */*
>
* TLSv1.2 (IN), TLS header, Supplemental data (23):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.2 (IN), TLS header, Supplemental data (23):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* old SSL session ID is stale, removing
* TLSv1.2 (IN), TLS header, Supplemental data (23):
* Connection state changed (MAX_CONCURRENT_STREAMS == 128)!
* TLSv1.2 (OUT), TLS header, Supplemental data (23):
* TLSv1.2 (IN), TLS header, Supplemental data (23):
* TLSv1.2 (IN), TLS header, Supplemental data (23):
< HTTP/2 404
< server: nginx
< date: Thu, 22 Sep 2022 13:42:22 GMT
< content-type: text/plain
< content-length: 14
< etag: "62f10f82-e"
< x-nc: EXPIRED hhn 1
< access-control-allow-methods: GET
< access-control-allow-origin: *
< access-control-max-age: 86400
< strict-transport-security: max-age=31536000; preload
< server-timing: dc;desc=hhn, cache;desc=EXPIRED;dur=120.0
<
404 Not Found
* TLSv1.2 (IN), TLS header, Supplemental data (23):
* Connection #0 to host 64.media.tumblr.com left intact
PS D:\Temp>
```
So, if anyone knows the missing headers or something, please share here..
One more thing, the resulting redirected page from the browser seems to hint at this post URL here:
Which is not from the blog I've used with gallery-dl, because that would be
so it seems the logged URL from gallery-dl is already a step into the extraction process and following a reblog here?
I've checked the `/archive` of the target blog and compared it with the IDs of already downloaded files to confirm the order and check whether there would be anything unusual about this specific post, but it seems perfectly normal, straight from the middle.
The post URL seems to be:
Short intermission here: I'm using `{extractor.url}` in the log settings, which always gives the default blog URL, i.e. the "input" URL as fed to gallery-dl. Is it possible to directly log the offending post URL somehow? If it's not possible, maybe consider this a feature request..
As you may have guessed, using this post URL with my normal config actually works. Even better:
So, I'm afraid I'm a little bit at a loss as to how to reliably reproduce this. I've made this test twice now, including deleting gallery-dl's cache file and archive file for tumblr etc., but it tripped over two different postings at each attempt, unfortunately..
I also checked with https://api.tumblr.com/console/, the output is as follows:
`https://api.tumblr.com/console/calls/blog/info`:
JSON
```JSON
{
  "meta": {
    "status": 200,
    "msg": "OK"
  },
  "response": {
    "blog": {
      "ask": true,
      "ask_anon": true,
      "ask_page_title": "Ask",
      "asks_allow_media": true,
      "avatar": [
        {
          "width": 512,
          "height": 512,
          "url": "https://64.media.tumblr.com/fb9a6870e7df61d39357513e04a86947/003da3d2eca6f293-88/s512x512u_c1/cc2663fea6ccc0e5eb0164977dedb3e05f102133.png"
        },
        {
          "width": 128,
          "height": 128,
          "url": "https://64.media.tumblr.com/fb9a6870e7df61d39357513e04a86947/003da3d2eca6f293-88/s128x128u_c1/476a1836ec3d368494e897c7daf3b6c38fd3687e.png"
        },
        {
          "width": 96,
          "height": 96,
          "url": "https://64.media.tumblr.com/fb9a6870e7df61d39357513e04a86947/003da3d2eca6f293-88/s96x96u_c1/3cc11b810cb9696e8786cf5a967c332762897828.png"
        },
        {
          "width": 64,
          "height": 64,
          "url": "https://64.media.tumblr.com/fb9a6870e7df61d39357513e04a86947/003da3d2eca6f293-88/s64x64u_c1/58ef4f69c1766c5b51a7adb09167048b844868f4.png"
        }
      ],
      "can_chat": false,
      "can_send_fan_mail": false,
      "can_submit": true,
      "can_subscribe": true,
      "description": "TX/30M/🔓\n",
      "followed": true,
      "is_blocked_from_primary": false,
      "is_nsfw": false,
      "name": "bespokeprovocateur2",
      "posts": 21994,
      "share_likes": false,
      "submission_page_title": "Submit",
      "submission_terms": {
        "accepted_types": ["text", "photo", "quote", "link", "video"],
        "tags": ["submission"],
        "title": "Submit",
        "guidelines": ""
      },
      "subscribed": false,
      "theme": {
        "header_full_width": 3000,
        "header_full_height": 1055,
        "avatar_shape": "square",
        "background_color": "#000000",
        "body_font": "Helvetica Neue",
        "header_bounds": "",
        "header_image": "https://64.media.tumblr.com/9791e3ac6b55616ef7caa5d5fffa1886/003da3d2eca6f293-f0/s3000x1055/e24516c8fcf8e9dc81b6065d326e46b274a5047c.png",
        "header_image_focused": "https://64.media.tumblr.com/9791e3ac6b55616ef7caa5d5fffa1886/003da3d2eca6f293-f0/s2048x3072/ded99f0016c56b65e0b9cad2f7af5fc89d167f1f.png",
        "header_image_poster": "",
        "header_image_scaled": "https://64.media.tumblr.com/9791e3ac6b55616ef7caa5d5fffa1886/003da3d2eca6f293-f0/s2048x3072/ded99f0016c56b65e0b9cad2f7af5fc89d167f1f.png",
        "header_stretch": true,
        "link_color": "#b6b4b4",
        "show_avatar": true,
        "show_description": true,
        "show_header_image": false,
        "show_title": true,
        "title_color": "#ffffff",
        "title_font": "Bodoni Recut FS",
        "title_font_weight": "regular"
      },
      "title": "Bespoke Provocateur",
      "total_posts": 21994,
      "updated": 1663775872,
      "url": "https://bespokeprovocateur2.tumblr.com/",
      "uuid": "t:hhVcrhXzU-KAarAWLUa1Cg"
    }
  }
}
```
So, total post count is 21994 at the moment, so it's not that small of a blog, unfortunately.
`https://api.tumblr.com/console/calls/blog/posts` with `ID = 686406675559841792`:
The API console does not seem to work here either... But, for what it's worth, I get the same 404 result for all posts here. I even tried it with my own test blog, but it's the same error there as well, so either I'm using Tumblr's API console wrong, or it's actually not working as implied..
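As an alternative to the console, the posts endpoint can be queried directly. The sketch below only builds the request URL and assumes the public Tumblr API v2 layout (`/v2/blog/{blog}/posts` with `api_key` and `id` query parameters); `posts_endpoint` is a made-up helper, and the API key placeholder has to be replaced with a real one.

```python
from urllib.parse import urlencode


def posts_endpoint(blog, api_key, post_id=None):
    """Build a Tumblr API v2 URL for a blog's posts, optionally one post."""
    params = {"api_key": api_key}
    if post_id is not None:
        # Restrict the result to a single post.
        params["id"] = str(post_id)
    return "https://api.tumblr.com/v2/blog/{}/posts?{}".format(blog, urlencode(params))
```

For example, `posts_endpoint("bespokeprovocateur2.tumblr.com", "YOUR_API_KEY", 686406675559841792)` yields a URL that can be opened with curl or a browser.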
gallery-dl and Python versions:
Using my own OAuth v1 API key of course, i.e. I see this message in my verbose log: https://github.com/mikf/gallery-dl/blob/583bee77257a054aba483428525c797a055d2b54/gallery_dl/oauth.py#L126
So it's probably not related to authentication?