mikf / gallery-dl

Command-line program to download image galleries and collections from several image hosting sites
GNU General Public License v2.0

[Patreon] 403s and 401s #6348

Open SpaceAceMonkey opened 5 days ago

SpaceAceMonkey commented 5 days ago

I am having an issue similar to those described in #5731, #6140, #6239, and #6241. Requests to https://www.patreon.com/CreatorName and https://www.patreon.com/CreatorName/posts always result in 403 errors after Cloudflare challenges.

None of the solutions suggested in those issues work for me. Changing browser options in order to modify the fingerprint doesn't work. Using different browsers doesn't work. Using the Patreon application user agent doesn't work.

I have tried using both the latest stable version and 1.27.7-dev.

The only thing that sort of works is pointing gallery-dl to https://www.patreon.com/home as mentioned in #6140, which seems to bypass the Cloudflare challenge, but which then results in a 401 response for each attempted download. I have tried using cookies from both Brave and Firefox, both of which are logged in to my Patreon account. Even if this worked, I'm not sure it would get all posts from the creator I am interested in.

[cookies][info] Extracted 2938 cookies from Firefox

[cookies][info] Extracted 2179 cookies from Brave

There are no reports of failure to decrypt any of the cookies with either browser. Initially, that was an issue with Brave, but once I specified the keyring, that problem disappeared.

I have also tried leaving 12+ hours between attempts as suggested by someone in one of the linked issues.

I don't know if it was mentioned in those other issues, but the Patreon URLs you see in your browser address bar include a /c/ portion that gallery-dl doesn't seem to expect. If you include the /c/ as in https://www.patreon.com/c/CreatorName/posts, gallery-dl reports that it cannot find that creator. Presumably, it thinks the creator is c. I imagine this is handled behind the scenes by 30x redirects, but it seemed worth mentioning.
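As a possible workaround on the user side, the /c/ segment can be stripped before the URL is passed to gallery-dl. The following is a minimal sketch, not anything gallery-dl does itself; the function name `strip_c_prefix` is hypothetical, and it assumes that a leading /c/ is always a routing prefix rather than a creator who is literally named "c".

```python
from urllib.parse import urlsplit, urlunsplit


def strip_c_prefix(url: str) -> str:
    """Drop a leading '/c/' path segment from a Patreon creator URL.

    Hypothetical helper: assumes '/c/<name>/...' is a routing prefix.
    """
    parts = urlsplit(url)
    # "/c/CreatorName/posts" splits into ["", "c", "CreatorName", "posts"]
    segments = parts.path.split("/")
    if len(segments) > 2 and segments[1] == "c" and segments[2]:
        segments.pop(1)
    return urlunsplit(parts._replace(path="/".join(segments)))


print(strip_c_prefix("https://www.patreon.com/c/CreatorName/posts"))
```

URLs without the /c/ segment are returned unchanged, so the helper can be applied to any address copied from the browser.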

I am using Ubuntu 24.04.

mikf commented 4 days ago

Since https://www.patreon.com/home works, maybe you are "allowed" to do API requests, but can't access the website itself. Could you try using https://www.patreon.com/id:CAMPAIGN_ID as URL (e.g. https://www.patreon.com/id:2367430)? You can find the CAMPAIGN_ID of a creator in the HTML source by searching for campaign.

> but which then results in a 401 response for each attempted download

I really don't know what to do about that or why it happens in the first place. Maybe --print-traffic could help with debugging.

SpaceAceMonkey commented 3 days ago

Hi. After trying the CAMPAIGN_ID solution, I went back to trying /creator/posts. The 403s seem to have disappeared when using either the campaign or creator routes, but the 401s remain.

I switched from having gallery-dl extract the cookies to specifying a cookie file in the Mozilla format, but I still get nothing but 401s.

Below is a partial example of the cookie file I am using.

.patreon.com	TRUE	/	FALSE	12345678	patreon_device_id	GUID_GOES_HERE
.patreon.com	TRUE	/	TRUE	12345678	session_id	SESSION_ID_GOES_HERE
...
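A quick way to rule out a malformed cookie file is to parse it independently and confirm that a session_id entry for patreon.com is present. The sketch below assumes the usual Netscape/Mozilla layout of seven tab-separated fields per line; `parse_cookie_text` is a hypothetical helper, not part of gallery-dl, and the sample values are the placeholders from the example above.

```python
def parse_cookie_text(text: str) -> dict[str, str]:
    """Return {name: value} for patreon.com cookies in Netscape-format text."""
    cookies: dict[str, str] = {}
    for line in text.splitlines():
        # Lines prefixed with "#HttpOnly_" are cookies; other "#" lines are comments.
        if line.startswith("#HttpOnly_"):
            line = line[len("#HttpOnly_"):]
        elif not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 7:
            # Not a well-formed cookie line (e.g. spaces instead of tabs).
            continue
        domain, _subdomains, _path, _secure, _expiry, name, value = fields
        if domain.lstrip(".").endswith("patreon.com"):
            cookies[name] = value
    return cookies


sample = (
    "# Netscape HTTP Cookie File\n"
    ".patreon.com\tTRUE\t/\tFALSE\t12345678\tpatreon_device_id\tGUID_GOES_HERE\n"
    ".patreon.com\tTRUE\t/\tTRUE\t12345678\tsession_id\tSESSION_ID_GOES_HERE\n"
)
found = parse_cookie_text(sample)
print("session_id present:", "session_id" in found)
```

Lines that use spaces instead of tabs are skipped by this parser, which makes it a reasonable check for whitespace damage introduced by copying and pasting.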

I have no problems using patreon-dl with the same cookies. With patreon-dl, the cookies are supplied as a string on the command-line.
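For comparing the two tools with identical input, the same name/value pairs can be joined into a single string. This is a sketch that assumes the common `name=value; name=value` form of an HTTP Cookie header; the exact format patreon-dl expects is not stated in this thread, and `cookie_header` is a hypothetical helper.

```python
def cookie_header(cookies: dict[str, str]) -> str:
    """Join cookie pairs into a 'name=value; name=value' string."""
    return "; ".join(f"{name}={value}" for name, value in cookies.items())


# Placeholder values, matching the partial cookie file above.
pairs = {
    "patreon_device_id": "GUID_GOES_HERE",
    "session_id": "SESSION_ID_GOES_HERE",
}
print(cookie_header(pairs))
```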

I am confident that gallery-dl is reading the cookies, because I don't get the "no 'session_id' cookie set" warning that appears when the cookies are not properly specified.

The --print-traffic flag isn't showing anything odd except possibly the Report-To: Cloudflare header. I am guessing that isn't the source of the problem. The rest is normal 30x responses leading to the 401s.