mikf / gallery-dl

Command-line program to download image galleries and collections from several image hosting sites
GNU General Public License v2.0

[Patreon] Downloading a user profile repeatedly interrupted #3624

Open MarqFJA87 opened 1 year ago

MarqFJA87 commented 1 year ago

I'm trying to download https://www.patreon.com/haganef using cookies, and the process runs normally until it reaches a certain post, whereupon it gives an error and stops.

The verbose log starts with the following:

[gallery-dl][debug] Version 1.24.5
[gallery-dl][debug] Python 3.10.0 - Windows-10-10.0.19045-SP0
[gallery-dl][debug] requests 2.26.0 - urllib3 1.26.7
[gallery-dl][debug] Configuration Files ['%USERPROFILE%\\gallery-dl\\config.json']
[gallery-dl][debug] Starting DownloadJob for 'https://www.patreon.com/haganef'
[patreon][debug] TLS 1.2 disabled.
[patreon][debug] Using PatreonCreatorExtractor for 'https://www.patreon.com/haganef'
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): www.patreon.com:443
[urllib3.connectionpool][debug] https://www.patreon.com:443 "GET /haganef/posts HTTP/1.1" 200 None
[urllib3.connectionpool][debug] https://www.patreon.com:443 "GET /api/posts?include=campaign,access_rules,attachments,audio,images,media,native_video_insights,poll.choices,poll.current_user_responses.user,poll.current_user_responses.choice,poll.current_user_responses.poll,user,user_defined_tags,ti_checks&fields%5Bcampaign%5D=currency,show_audio_post_download_links,avatar_photo_url,avatar_photo_image_urls,earnings_visibility,is_nsfw,is_monthly,name,url&fields%5Bpost%5D=change_visibility_at,comment_count,commenter_count,content,current_user_can_comment,current_user_can_delete,current_user_can_view,current_user_has_liked,embed,image,insights_last_updated_at,is_paid,like_count,meta_image_url,min_cents_pledged_to_view,post_file,post_metadata,published_at,patreon_url,post_type,pledge_url,preview_asset_type,thumbnail,thumbnail_url,teaser_text,title,upgrade_url,url,was_posted_by_campaign_owner,has_ti_violation,moderation_status,post_level_suspension_removal_date,pls_one_liners_by_category,video_preview,view_count&fields%5Bpost_tag%5D=tag_type,value&fields%5Buser%5D=image_url,full_name,url&fields%5Baccess_rule%5D=access_rule_type,amount_cents&fields%5Bmedia%5D=id,image_urls,download_url,metadata,file_name&fields%5Bnative_video_insights%5D=average_view_duration,average_view_pct,has_preview,id,last_updated_at,num_views,preview_views,video_duration&filter%5Bcampaign_id%5D=305142&filter%5Bcontains_exclusive_posts%5D=true&filter%5Bis_draft%5D=false&sort=-published_at&json-api-version=1.0 HTTP/1.1" 200 None

When it hits the error, I get the following:

[urllib3.connectionpool][debug] https://www.patreon.com:443 "GET /api/posts?include=campaign%2Caccess_rules%2Cattachments%2Caudio%2Cimages%2Cmedia%2Cnative_video_insights%2Cpoll.choices%2Cpoll.current_user_responses.user%2Cpoll.current_user_responses.choice%2Cpoll.current_user_responses.poll%2Cuser%2Cuser_defined_tags%2Cti_checks&fields%5Bcampaign%5D=currency%2Cshow_audio_post_download_links%2Cavatar_photo_url%2Cavatar_photo_image_urls%2Cearnings_visibility%2Cis_nsfw%2Cis_monthly%2Cname%2Curl&fields%5Bpost%5D=change_visibility_at%2Ccomment_count%2Ccommenter_count%2Ccontent%2Ccurrent_user_can_comment%2Ccurrent_user_can_delete%2Ccurrent_user_can_view%2Ccurrent_user_has_liked%2Cembed%2Cimage%2Cinsights_last_updated_at%2Cis_paid%2Clike_count%2Cmeta_image_url%2Cmin_cents_pledged_to_view%2Cpost_file%2Cpost_metadata%2Cpublished_at%2Cpatreon_url%2Cpost_type%2Cpledge_url%2Cpreview_asset_type%2Cthumbnail%2Cthumbnail_url%2Cteaser_text%2Ctitle%2Cupgrade_url%2Curl%2Cwas_posted_by_campaign_owner%2Chas_ti_violation%2Cmoderation_status%2Cpost_level_suspension_removal_date%2Cpls_one_liners_by_category%2Cvideo_preview%2Cview_count&fields%5Bpost_tag%5D=tag_type%2Cvalue&fields%5Buser%5D=image_url%2Cfull_name%2Curl&fields%5Baccess_rule%5D=access_rule_type%2Camount_cents&fields%5Bmedia%5D=id%2Cimage_urls%2Cdownload_url%2Cmetadata%2Cfile_name&fields%5Bnative_video_insights%5D=average_view_duration%2Caverage_view_pct%2Chas_preview%2Cid%2Clast_updated_at%2Cnum_views%2Cpreview_views%2Cvideo_duration&filter%5Bcampaign_id%5D=305142&filter%5Bcontains_exclusive_posts%5D=true&filter%5Bis_draft%5D=false&sort=-published_at&json-api-version=1.0&page%5Bcursor%5D=01mSVz3er1PYNmdx-dKJGAtknJ HTTP/1.1" 403 None
[patreon][error] HttpError: '403 Forbidden' for 'https://www.patreon.com/api/posts?include=campaign%2Caccess_rules%2Cattachments%2Caudio%2Cimages%2Cmedia%2Cnative_video_insights%2Cpoll.choices%2Cpoll.current_user_responses.user%2Cpoll.current_user_responses.choice%2Cpoll.current_user_responses.poll%2Cuser%2Cuser_defined_tags%2Cti_checks&fields%5Bcampaign%5D=currency%2Cshow_audio_post_download_links%2Cavatar_photo_url%2Cavatar_photo_image_urls%2Cearnings_visibility%2Cis_nsfw%2Cis_monthly%2Cname%2Curl&fields%5Bpost%5D=change_visibility_at%2Ccomment_count%2Ccommenter_count%2Ccontent%2Ccurrent_user_can_comment%2Ccurrent_user_can_delete%2Ccurrent_user_can_view%2Ccurrent_user_has_liked%2Cembed%2Cimage%2Cinsights_last_updated_at%2Cis_paid%2Clike_count%2Cmeta_image_url%2Cmin_cents_pledged_to_view%2Cpost_file%2Cpost_metadata%2Cpublished_at%2Cpatreon_url%2Cpost_type%2Cpledge_url%2Cpreview_asset_type%2Cthumbnail%2Cthumbnail_url%2Cteaser_text%2Ctitle%2Cupgrade_url%2Curl%2Cwas_posted_by_campaign_owner%2Chas_ti_violation%2Cmoderation_status%2Cpost_level_suspension_removal_date%2Cpls_one_liners_by_category%2Cvideo_preview%2Cview_count&fields%5Bpost_tag%5D=tag_type%2Cvalue&fields%5Buser%5D=image_url%2Cfull_name%2Curl&fields%5Baccess_rule%5D=access_rule_type%2Camount_cents&fields%5Bmedia%5D=id%2Cimage_urls%2Cdownload_url%2Cmetadata%2Cfile_name&fields%5Bnative_video_insights%5D=average_view_duration%2Caverage_view_pct%2Chas_preview%2Cid%2Clast_updated_at%2Cnum_views%2Cpreview_views%2Cvideo_duration&filter%5Bcampaign_id%5D=305142&filter%5Bcontains_exclusive_posts%5D=true&filter%5Bis_draft%5D=false&sort=-published_at&json-api-version=1.0&page%5Bcursor%5D=01mSVz3er1PYNmdx-dKJGAtknJ'
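
For anyone comparing the two log excerpts: the failing request appears to be the same `/api/posts` query as the first one, with the commas percent-encoded as `%2C` and a `page[cursor]` parameter appended. A minimal Python sketch of decoding such a URL with the standard library is below. The URL is shortened to a few of the parameters from the log, and `describe_query` is a hypothetical helper name, not part of gallery-dl.

```python
from urllib.parse import urlsplit, parse_qs

# Shortened version of the failing URL from the log (most "fields" omitted).
url = (
    "https://www.patreon.com/api/posts"
    "?filter%5Bcampaign_id%5D=305142"
    "&sort=-published_at"
    "&json-api-version=1.0"
    "&page%5Bcursor%5D=01mSVz3er1PYNmdx-dKJGAtknJ"
)


def describe_query(url):
    """Return the percent-decoded query parameters as a flat dict."""
    query = parse_qs(urlsplit(url).query)
    # parse_qs returns a list per key; each key occurs once here.
    return {key: values[0] for key, values in query.items()}


params = describe_query(url)
print(params["page[cursor]"])         # pagination cursor of the failing page
print(params["filter[campaign_id]"])  # campaign being listed
```

Seen this way, the 403 is tied to one specific page of the listing rather than to a single post URL.
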
Hrxn commented 1 year ago

Can you isolate the specific post URL (i.e. like https://www.patreon.com/posts/precious-metal-23563293), and see if this can still be reproduced this way?

MarqFJA87 commented 1 year ago

No, unfortunately. It's buried deep within the list of files on the creator's page (the 860th post, counting from the most recent and not counting any of the Patron-exclusive posts since I'm not a Patron yet), and trying to manually browse through said list eventually results in a time-out about "halfway" through, forcing a page refresh.

MarqFJA87 commented 1 year ago

I think I found a lead that could help solve this mystery. Three new posts have been added to this Patreon gallery over the past 3 days, so I moved my previous batch of downloads into a temporary folder and redid the mass-download of the gallery. Lo and behold, the exact same number of posts were downloaded as the previous batch (673), only now the download was interrupted at the post that's third from last in the original batch.

It seems that gallery-dl can only download 673 posts before hitting some sort of "wall" that prevents it from accessing any further posts.

mikf commented 1 year ago

It might be some sort of captcha like in https://github.com/mikf/gallery-dl/issues/3466#issuecomment-1381988225 that gets triggered after a certain number of requests.

You could try your luck with --sleep-request and also check Patreon's responses with --write-pages

MarqFJA87 commented 1 year ago

About --write-pages... What does "write downloaded intermediary pages to files in the current directory" mean? Is it going to create additional text files for each downloaded file?

mikf commented 1 year ago

--write-pages writes the (mostly JSON) data that gets sent by Patreon to .txt files in the directory you run gallery-dl from. It is mostly useful for debugging purposes.

For example, running gallery-dl --write-pages https://www.patreon.com/posts/precious-metal-23563293 in /tmp/ creates two files containing the API and/or HTML response from Patreon.

In your case, it would write the full content of the 403 response to disk, so you can take a better look at it and know what it actually was that Patreon complained about.
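
With hundreds of dump files, it may help to sort them automatically before reading any by hand. The following is a rough Python sketch, assuming the dumps are the `.txt` files in one directory and that a normal API response parses as JSON. `classify_dumps` is a hypothetical helper, not something gallery-dl provides.

```python
import json
from pathlib import Path


def classify_dumps(directory="."):
    """Sort *.txt dump files into empty, JSON, and non-JSON groups."""
    groups = {"empty": [], "json": [], "other": []}
    for path in sorted(Path(directory).glob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="replace").strip()
        if not text:
            groups["empty"].append(path.name)
            continue
        try:
            json.loads(text)
        except ValueError:
            # Not JSON: possibly an HTML page such as an error or captcha page.
            groups["other"].append(path.name)
        else:
            groups["json"].append(path.name)
    return groups


if __name__ == "__main__":
    for kind, names in classify_dumps().items():
        print(kind, len(names))
```

Files that land in the `other` group are the ones worth opening first, since they are not the JSON the API normally returns.
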

MarqFJA87 commented 1 year ago

Okay, it's done. Attaching the generated files; 600+ were completely empty, so I've omitted them.

01_https_www.patreon.com_haganef_posts.txt 02_https_www.patreon.com_api_posts_include_campaign,access_rules,attachments,audio,images,media,native_video_insights,poll.choices,poll.current_user_responses.user,poll.current_user_responses.choice,poll.curre.txt 03_https_www.patreon.com_api_user_2941319.txt 18_https_www.patreon.com_api_posts_include_campaign%2Caccess_rules%2Cattachments%2Caudio%2Cimages%2Cmedia%2Cnative_video_insights%2Cpoll.choices%2Cpoll.current_user_responses.user%2Cpoll.current_user_responses.txt 32_https_www.patreon.com_api_posts_include_campaign%2Caccess_rules%2Cattachments%2Caudio%2Cimages%2Cmedia%2Cnative_video_insights%2Cpoll.choices%2Cpoll.current_user_responses.user%2Cpoll.current_user_responses.txt 48_https_www.patreon.com_api_posts_include_campaign%2Caccess_rules%2Cattachments%2Caudio%2Cimages%2Cmedia%2Cnative_video_insights%2Cpoll.choices%2Cpoll.current_user_responses.user%2Cpoll.current_user_responses.txt 63_https_www.patreon.com_api_posts_include_campaign%2Caccess_rules%2Cattachments%2Caudio%2Cimages%2Cmedia%2Cnative_video_insights%2Cpoll.choices%2Cpoll.current_user_responses.user%2Cpoll.current_user_responses.txt 76_https_www.patreon.com_api_posts_include_campaign%2Caccess_rules%2Cattachments%2Caudio%2Cimages%2Cmedia%2Cnative_video_insights%2Cpoll.choices%2Cpoll.current_user_responses.user%2Cpoll.current_user_responses.txt 92_https_www.patreon.com_api_posts_include_campaign%2Caccess_rules%2Cattachments%2Caudio%2Cimages%2Cmedia%2Cnative_video_insights%2Cpoll.choices%2Cpoll.current_user_responses.user%2Cpoll.current_user_responses.txt 108_https_www.patreon.com_api_posts_include_campaign%2Caccess_rules%2Cattachments%2Caudio%2Cimages%2Cmedia%2Cnative_video_insights%2Cpoll.choices%2Cpoll.current_user_responses.user%2Cpoll.current_user_response.txt 
123_https_www.patreon.com_api_posts_include_campaign%2Caccess_rules%2Cattachments%2Caudio%2Cimages%2Cmedia%2Cnative_video_insights%2Cpoll.choices%2Cpoll.current_user_responses.user%2Cpoll.current_user_response.txt 136_https_www.patreon.com_api_posts_include_campaign%2Caccess_rules%2Cattachments%2Caudio%2Cimages%2Cmedia%2Cnative_video_insights%2Cpoll.choices%2Cpoll.current_user_responses.user%2Cpoll.current_user_response.txt 152_https_www.patreon.com_api_posts_include_campaign%2Caccess_rules%2Cattachments%2Caudio%2Cimages%2Cmedia%2Cnative_video_insights%2Cpoll.choices%2Cpoll.current_user_responses.user%2Cpoll.current_user_response.txt 169_https_www.patreon.com_api_posts_include_campaign%2Caccess_rules%2Cattachments%2Caudio%2Cimages%2Cmedia%2Cnative_video_insights%2Cpoll.choices%2Cpoll.current_user_responses.user%2Cpoll.current_user_response.txt 183_https_www.patreon.com_api_posts_include_campaign%2Caccess_rules%2Cattachments%2Caudio%2Cimages%2Cmedia%2Cnative_video_insights%2Cpoll.choices%2Cpoll.current_user_responses.user%2Cpoll.current_user_response.txt 198_https_www.patreon.com_api_posts_include_campaign%2Caccess_rules%2Cattachments%2Caudio%2Cimages%2Cmedia%2Cnative_video_insights%2Cpoll.choices%2Cpoll.current_user_responses.user%2Cpoll.current_user_response.txt 213_https_www.patreon.com_api_posts_include_campaign%2Caccess_rules%2Cattachments%2Caudio%2Cimages%2Cmedia%2Cnative_video_insights%2Cpoll.choices%2Cpoll.current_user_responses.user%2Cpoll.current_user_response.txt 227_https_www.patreon.com_api_posts_include_campaign%2Caccess_rules%2Cattachments%2Caudio%2Cimages%2Cmedia%2Cnative_video_insights%2Cpoll.choices%2Cpoll.current_user_responses.user%2Cpoll.current_user_response.txt 243_https_www.patreon.com_api_posts_include_campaign%2Caccess_rules%2Cattachments%2Caudio%2Cimages%2Cmedia%2Cnative_video_insights%2Cpoll.choices%2Cpoll.current_user_responses.user%2Cpoll.current_user_response.txt 
259_https_www.patreon.com_api_posts_include_campaign%2Caccess_rules%2Cattachments%2Caudio%2Cimages%2Cmedia%2Cnative_video_insights%2Cpoll.choices%2Cpoll.current_user_responses.user%2Cpoll.current_user_response.txt 275_https_www.patreon.com_api_posts_include_campaign%2Caccess_rules%2Cattachments%2Caudio%2Cimages%2Cmedia%2Cnative_video_insights%2Cpoll.choices%2Cpoll.current_user_responses.user%2Cpoll.current_user_response.txt 290_https_www.patreon.com_api_posts_include_campaign%2Caccess_rules%2Cattachments%2Caudio%2Cimages%2Cmedia%2Cnative_video_insights%2Cpoll.choices%2Cpoll.current_user_responses.user%2Cpoll.current_user_response.txt 305_https_www.patreon.com_api_posts_include_campaign%2Caccess_rules%2Cattachments%2Caudio%2Cimages%2Cmedia%2Cnative_video_insights%2Cpoll.choices%2Cpoll.current_user_responses.user%2Cpoll.current_user_response.txt 320_https_www.patreon.com_api_posts_include_campaign%2Caccess_rules%2Cattachments%2Caudio%2Cimages%2Cmedia%2Cnative_video_insights%2Cpoll.choices%2Cpoll.current_user_responses.user%2Cpoll.current_user_response.txt 336_https_www.patreon.com_api_posts_include_campaign%2Caccess_rules%2Cattachments%2Caudio%2Cimages%2Cmedia%2Cnative_video_insights%2Cpoll.choices%2Cpoll.current_user_responses.user%2Cpoll.current_user_response.txt 351_https_www.patreon.com_api_posts_include_campaign%2Caccess_rules%2Cattachments%2Caudio%2Cimages%2Cmedia%2Cnative_video_insights%2Cpoll.choices%2Cpoll.current_user_responses.user%2Cpoll.current_user_response.txt 367_https_www.patreon.com_api_posts_include_campaign%2Caccess_rules%2Cattachments%2Caudio%2Cimages%2Cmedia%2Cnative_video_insights%2Cpoll.choices%2Cpoll.current_user_responses.user%2Cpoll.current_user_response.txt 382_https_www.patreon.com_api_posts_include_campaign%2Caccess_rules%2Cattachments%2Caudio%2Cimages%2Cmedia%2Cnative_video_insights%2Cpoll.choices%2Cpoll.current_user_responses.user%2Cpoll.current_user_response.txt 
398_https_www.patreon.com_api_posts_include_campaign%2Caccess_rules%2Cattachments%2Caudio%2Cimages%2Cmedia%2Cnative_video_insights%2Cpoll.choices%2Cpoll.current_user_responses.user%2Cpoll.current_user_response.txt 413_https_www.patreon.com_api_posts_include_campaign%2Caccess_rules%2Cattachments%2Caudio%2Cimages%2Cmedia%2Cnative_video_insights%2Cpoll.choices%2Cpoll.current_user_responses.user%2Cpoll.current_user_response.txt 427_https_www.patreon.com_api_posts_include_campaign%2Caccess_rules%2Cattachments%2Caudio%2Cimages%2Cmedia%2Cnative_video_insights%2Cpoll.choices%2Cpoll.current_user_responses.user%2Cpoll.current_user_response.txt 443_https_www.patreon.com_api_posts_include_campaign%2Caccess_rules%2Cattachments%2Caudio%2Cimages%2Cmedia%2Cnative_video_insights%2Cpoll.choices%2Cpoll.current_user_responses.user%2Cpoll.current_user_response.txt 457_https_www.patreon.com_api_posts_include_campaign%2Caccess_rules%2Cattachments%2Caudio%2Cimages%2Cmedia%2Cnative_video_insights%2Cpoll.choices%2Cpoll.current_user_responses.user%2Cpoll.current_user_response.txt 473_https_www.patreon.com_api_posts_include_campaign%2Caccess_rules%2Cattachments%2Caudio%2Cimages%2Cmedia%2Cnative_video_insights%2Cpoll.choices%2Cpoll.current_user_responses.user%2Cpoll.current_user_response.txt 486_https_www.patreon.com_api_posts_include_campaign%2Caccess_rules%2Cattachments%2Caudio%2Cimages%2Cmedia%2Cnative_video_insights%2Cpoll.choices%2Cpoll.current_user_responses.user%2Cpoll.current_user_response.txt 503_https_www.patreon.com_api_posts_include_campaign%2Caccess_rules%2Cattachments%2Caudio%2Cimages%2Cmedia%2Cnative_video_insights%2Cpoll.choices%2Cpoll.current_user_responses.user%2Cpoll.current_user_response.txt 517_https_www.patreon.com_api_posts_include_campaign%2Caccess_rules%2Cattachments%2Caudio%2Cimages%2Cmedia%2Cnative_video_insights%2Cpoll.choices%2Cpoll.current_user_responses.user%2Cpoll.current_user_response.txt 
532_https_www.patreon.com_api_posts_include_campaign%2Caccess_rules%2Cattachments%2Caudio%2Cimages%2Cmedia%2Cnative_video_insights%2Cpoll.choices%2Cpoll.current_user_responses.user%2Cpoll.current_user_response.txt 546_https_www.patreon.com_api_posts_include_campaign%2Caccess_rules%2Cattachments%2Caudio%2Cimages%2Cmedia%2Cnative_video_insights%2Cpoll.choices%2Cpoll.current_user_responses.user%2Cpoll.current_user_response.txt 562_https_www.patreon.com_api_posts_include_campaign%2Caccess_rules%2Cattachments%2Caudio%2Cimages%2Cmedia%2Cnative_video_insights%2Cpoll.choices%2Cpoll.current_user_responses.user%2Cpoll.current_user_response.txt 576_https_www.patreon.com_api_posts_include_campaign%2Caccess_rules%2Cattachments%2Caudio%2Cimages%2Cmedia%2Cnative_video_insights%2Cpoll.choices%2Cpoll.current_user_responses.user%2Cpoll.current_user_response.txt 590_https_www.patreon.com_api_posts_include_campaign%2Caccess_rules%2Cattachments%2Caudio%2Cimages%2Cmedia%2Cnative_video_insights%2Cpoll.choices%2Cpoll.current_user_responses.user%2Cpoll.current_user_response.txt 601_https_www.patreon.com_api_posts_include_campaign%2Caccess_rules%2Cattachments%2Caudio%2Cimages%2Cmedia%2Cnative_video_insights%2Cpoll.choices%2Cpoll.current_user_responses.user%2Cpoll.current_user_response.txt 611_https_www.patreon.com_api_posts_include_campaign%2Caccess_rules%2Cattachments%2Caudio%2Cimages%2Cmedia%2Cnative_video_insights%2Cpoll.choices%2Cpoll.current_user_responses.user%2Cpoll.current_user_response.txt 626_https_www.patreon.com_api_posts_include_campaign%2Caccess_rules%2Cattachments%2Caudio%2Cimages%2Cmedia%2Cnative_video_insights%2Cpoll.choices%2Cpoll.current_user_responses.user%2Cpoll.current_user_response.txt 640_https_www.patreon.com_api_posts_include_campaign%2Caccess_rules%2Cattachments%2Caudio%2Cimages%2Cmedia%2Cnative_video_insights%2Cpoll.choices%2Cpoll.current_user_responses.user%2Cpoll.current_user_response.txt 
652_https_www.patreon.com_api_posts_include_campaign%2Caccess_rules%2Cattachments%2Caudio%2Cimages%2Cmedia%2Cnative_video_insights%2Cpoll.choices%2Cpoll.current_user_responses.user%2Cpoll.current_user_response.txt 666_https_www.patreon.com_api_posts_include_campaign%2Caccess_rules%2Cattachments%2Caudio%2Cimages%2Cmedia%2Cnative_video_insights%2Cpoll.choices%2Cpoll.current_user_responses.user%2Cpoll.current_user_response.txt 678_https_www.patreon.com_api_posts_include_campaign%2Caccess_rules%2Cattachments%2Caudio%2Cimages%2Cmedia%2Cnative_video_insights%2Cpoll.choices%2Cpoll.current_user_responses.user%2Cpoll.current_user_response.txt 689_https_www.patreon.com_api_posts_include_campaign%2Caccess_rules%2Cattachments%2Caudio%2Cimages%2Cmedia%2Cnative_video_insights%2Cpoll.choices%2Cpoll.current_user_responses.user%2Cpoll.current_user_response.txt 700_https_www.patreon.com_api_posts_include_campaign%2Caccess_rules%2Cattachments%2Caudio%2Cimages%2Cmedia%2Cnative_video_insights%2Cpoll.choices%2Cpoll.current_user_responses.user%2Cpoll.current_user_response.txt 712_https_www.patreon.com_api_posts_include_campaign%2Caccess_rules%2Cattachments%2Caudio%2Cimages%2Cmedia%2Cnative_video_insights%2Cpoll.choices%2Cpoll.current_user_responses.user%2Cpoll.current_user_response.txt 726_https_www.patreon.com_api_posts_include_campaign%2Caccess_rules%2Cattachments%2Caudio%2Cimages%2Cmedia%2Cnative_video_insights%2Cpoll.choices%2Cpoll.current_user_responses.user%2Cpoll.current_user_response.txt

mikf commented 1 year ago

The last file (726_...) contains a (Cloudflare?) captcha page, most likely triggered by too many requests over time. As I said in https://github.com/mikf/gallery-dl/issues/3624#issuecomment-1424832312, try using --sleep-request with some random interval like 1-3 to force some pauses between HTTP requests, or maybe just plain --sleep.
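
To illustrate what a random interval like 1-3 means in practice, here is a small Python sketch of pausing for a random duration between two bounds. It only demonstrates the idea and is not gallery-dl's implementation; `parse_interval` and `pause` are hypothetical names.

```python
import random
import time


def parse_interval(spec):
    """Parse "2" or "1-3" into a (low, high) tuple of seconds."""
    low, sep, high = spec.partition("-")
    low = float(low)
    high = float(high) if sep else low
    if high < low:
        raise ValueError("upper bound is smaller than lower bound")
    return low, high


def pause(spec, sleep=time.sleep):
    """Sleep for a random duration within the interval and return it."""
    low, high = parse_interval(spec)
    seconds = random.uniform(low, high)
    sleep(seconds)
    return seconds
```

Calling `pause("1-3")` before each HTTP request would spread the requests out by one to three seconds each, which is the kind of pacing suggested here.
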

Visiting the last URL that triggered the 403 in your browser (https://www.patreon.com/api/posts?include=...), solving the captcha, and using exported cookies from that may also work.

HTML

```html
patreon.com

Please enable JS and disable any ad blocker
```

MarqFJA87 commented 1 year ago

Nope, both commands run into the exact same problem at pretty much the exact same "distance" from the most recent post. And trying to open the last URL just gives me an error page with the following message.

{"errors":[{"code":3,"code_name":"ParameterInvalid","detail":"Invalid parameter for 'page[cursor]': Invalid or expired cursor.","id":"59850448-d132-5788-b4bf-8c1d2e88e375","source":{"parameter":"page[cursor]"},"status":"400","title":"Invalid value for parameter 'page[cursor]'."}]}
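
Note that this body reports status 400 with an invalid or expired cursor, which is a different failure from the 403 in the original log, so opening the old URL later may not reproduce the captcha at all. A minimal Python sketch of pulling the relevant fields out of such an error body follows; `summarize_errors` is a hypothetical helper name.

```python
import json

# Error body copied from the message above.
body = '''{"errors":[{"code":3,"code_name":"ParameterInvalid","detail":"Invalid parameter for 'page[cursor]': Invalid or expired cursor.","id":"59850448-d132-5788-b4bf-8c1d2e88e375","source":{"parameter":"page[cursor]"},"status":"400","title":"Invalid value for parameter 'page[cursor]'."}]}'''


def summarize_errors(body):
    """Return (status, code_name, detail) for each entry in "errors"."""
    errors = json.loads(body).get("errors", [])
    return [(e.get("status"), e.get("code_name"), e.get("detail")) for e in errors]


for status, name, detail in summarize_errors(body):
    print(status, name, detail)
```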