patrickkfkan / patreon-dl

Patreon Downloader

Cannot download initial page : Cloudflare captcha #25

Open lautriva opened 5 months ago

lautriva commented 5 months ago

Hi, patreon-dl was working fine until this week. Now I get an error (see below).

After dumping the resulting HTML, I got something like <!DOCTYPE html><html lang="en-US"><head><title>Just a moment...</title></body></html>, which makes me think the script is being blocked by the Cloudflare captcha.

Maybe it should use the provided cookie value to bypass the captcha?
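As a rough illustration of the diagnosis above, the check below flags a response that looks like the Cloudflare interstitial rather than a creator page. It is a minimal sketch: the function name is hypothetical (not part of patreon-dl), and the "Just a moment..." title is taken only from the HTML dump in this report, so Cloudflare may serve something different at any time.

```typescript
// Heuristic only: the title string comes from the HTML dump in this report.
// No 'g' flag, so test() carries no state between calls.
const CHALLENGE_TITLE = /<title>\s*Just a moment\.\.\.\s*<\/title>/i;

// Hypothetical helper, not part of patreon-dl.
function looksLikeCloudflareChallenge(html: string): boolean {
  return CHALLENGE_TITLE.test(html);
}

// The response body as dumped in this report.
const dumped =
  '<!DOCTYPE html><html lang="en-US"><head><title>Just a moment...</title></body></html>';

if (looksLikeCloudflareChallenge(dumped)) {
  console.log("Got a Cloudflare challenge page instead of the creator page");
}
```

A check like this would let a script report "blocked by Cloudflare" explicitly instead of failing later with a parsing error.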

Complete log

Jun 18 19:40:56: info: PostDownloader: Targeting posts by 'CREATOR_NAME'
Jun 18 19:40:56: debug: PostsFetcher: Fetch initial data from "https://www.patreon.com/CREATOR_NAME"
Jun 18 19:40:56: debug: PostsFetcher: next() requested (0)
Jun 18 19:40:56: debug: PageParser: Parse initial data from https://www.patreon.com/CREATOR_NAME
Jun 18 19:40:56: debug: PageParser: Trying pattern: /window\.patreon\s*?=\s*?({.+?});/gm
Jun 18 19:40:56: debug: PageParser: No match for pattern: /window\.patreon\s*?=\s*?({.+?});/gm
Jun 18 19:40:56: debug: PageParser: Trying pattern: /<script id="__NEXT_DATA__" type="application\/json">(.+)<\/script>/gm
Jun 18 19:40:56: debug: PageParser: No match for pattern: /<script id="__NEXT_DATA__" type="application\/json">(.+)<\/script>/gm
Jun 18 19:40:56: error: PostsFetcher: Error parsing initial data from "https://www.patreon.com/CREATOR_NAME": Initial data not found - no regex matches
Jun 18 19:40:56: debug: PostsFetcher: next() handled (0)
Jun 18 19:40:56: info: PostDownloader: Done downloading posts by 'CREATOR_NAME'
Jun 18 19:40:56: info: PostDownloader: Total 0 / undefined posts processed
Jun 18 19:40:56: info: PostDownloader end
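
The log shows two patterns being tried in turn, with the run failing once neither matches. The sketch below reproduces that fallback with equivalents of the two patterns (braces escaped, and the `gm` flags dropped so that `exec()` keeps no state between calls). The function name and sample pages are hypothetical, and this is not patreon-dl's actual parser.

```typescript
// Equivalents of the two patterns shown in the debug log.
const INITIAL_DATA_PATTERNS: RegExp[] = [
  /window\.patreon\s*?=\s*?(\{.+?\});/,
  /<script id="__NEXT_DATA__" type="application\/json">(.+)<\/script>/,
];

// Hypothetical helper: returns the first captured JSON text, or null
// when no pattern matches (the "no regex matches" case in the log).
function extractInitialData(html: string): string | null {
  for (const pattern of INITIAL_DATA_PATTERNS) {
    const match = pattern.exec(html);
    if (match !== null) {
      return match[1];
    }
  }
  return null;
}

// A challenge page like the one in this report contains neither marker.
const challengePage =
  '<!DOCTYPE html><html lang="en-US"><head><title>Just a moment...</title></body></html>';

// Made-up page that embeds data the way the second pattern expects.
const nextDataPage =
  '<html><body><script id="__NEXT_DATA__" type="application/json">{"props":{}}</script></body></html>';

console.log(extractInitialData(challengePage)); // null
console.log(extractInitialData(nextDataPage)); // {"props":{}}
```

Since the challenge page carries neither `window.patreon` nor a `__NEXT_DATA__` script, both patterns fail. That is consistent with the "Initial data not found" error in the log.
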

jmurchie88 commented 5 months ago

Ran into the same problem today. I initially tried running patreon-dl from a remote server, not the machine where the cookie was generated, and hit the issue @lautriva is describing. I went back to running the project from my local machine (same IP/geography the cookie was generated from) and everything worked as expected. It seems they may be locking cookies to an IP or geography in some way. I'm not sure this is a problem the project can solve, but I wanted to share my anecdote.

patrickkfkan commented 2 months ago

Thanks for the report and feedback. I don't think this is something patreon-dl can / should deal with. Perhaps having a script that runs a headless browser on the remote end to obtain a cookie from the Patreon website would suffice in circumventing the Cloudflare restriction, but I'm not prepared to go that route...

lautriva commented 2 months ago

Hello, sorry for the late response. My problem is definitely not related to any remote server. Everything is done from my home (same IP address).

I log in to Patreon and generate the cookie on my regular computer. Then I run patreon-dl from a local server (still in my home).

If it helps, here is how I start patreon-dl, followed by my config file:

patreon-dl -C creator_name.conf

[downloader]
# URL of content to download
# You can specify multiple URLs by separating them with a comma.
# Alternatively, you can use a file to supply URLs. In this case, you would
# provide the path to the file here. The file should contain a list of the
# target URLs, each in its own line, along with any target-specific 'include'
# config. See project documentation for example.
target.url = https://www.patreon.com/creator_name/posts

# Cookie to include in requests; required for accessing 
# patron-only content
# https://github.com/patrickkfkan/patreon-dl/wiki/How-to-obtain-Cookie
cookie = __cf_bm=[...REDACTED...]xxx

[output]
# Path to directory where content is saved
# Default: current working directory
out.dir = /path/to/patreon/posts
campaign.dir.name.format = {campaign.name}

[embed.downloader.youtube]

# Set the command to execute. Fields enclosed in curly braces will be
# replaced with actual values at runtime. Available fields:
#
# `post.id`: ID of the post containing the embedded video
# `embed.provider`: name of the provider
# `embed.provider.url`: link to the provider's site
# `embed.url`: link to the video page supplied by the provider
# `embed.subject`: subject of the video
# `embed.html`: the HTML code that embeds the video player on the Patreon page
# `dest.dir`: the directory where the video should be saved
# 
# So, here, yt-dlp will download the video at 'embed.url' and save it in
# 'dest.dir'. The filename will be determined by the format "%(title)s.%(ext)s"
# (see: https://github.com/yt-dlp/yt-dlp?tab=readme-ov-file#output-template).

exec = yt-dlp -o "{dest.dir}/%(title)s.%(ext)s" "{embed.url}"

# Example: Vimeo
# Out of the box, 'patreon-dl' does not support downloading Vimeo content.
# They are also slightly more complicated to handle than YouTube embeds, since
# 'embed.url' is not always accessible (depends on the embed method used).
# For this purpose, we have a script that you could use to ease the downloading
# process (beta - no guarantees).

[embed.downloader.vimeo]

# See project source 'bin/patreon-dl-vimeo.js' for full usage
exec = patreon-dl-vimeo -o "{dest.dir}/%(title)s.%(ext)s" --embed-html "{embed.html}" --embed-url "{embed.url}"
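
The comments in the config describe `exec` as a template whose curly-brace fields are replaced at runtime. The sketch below shows one way such substitution could work. It is an assumption for illustration only: the helper name and the sample field values are hypothetical, and patreon-dl's real implementation may differ (for example, in how it escapes values for the shell).

```typescript
// Hypothetical helper, not patreon-dl's actual substitution logic.
function fillExecTemplate(
  template: string,
  fields: { [name: string]: string }
): string {
  // Match {name} where name consists of lowercase letters and dots, e.g. {dest.dir}.
  return template.replace(/\{([a-z.]+)\}/g, (placeholder: string, name: string) => {
    const value = Object.prototype.hasOwnProperty.call(fields, name)
      ? fields[name]
      : undefined;
    // Unknown fields are left as they are.
    return value === undefined ? placeholder : value;
  });
}

const execTemplate = 'yt-dlp -o "{dest.dir}/%(title)s.%(ext)s" "{embed.url}"';

// Sample values are made up for illustration.
const command = fillExecTemplate(execTemplate, {
  "dest.dir": "/path/to/patreon/posts/videos",
  "embed.url": "https://www.youtube.com/watch?v=VIDEO_ID",
});

console.log(command);
// yt-dlp -o "/path/to/patreon/posts/videos/%(title)s.%(ext)s" "https://www.youtube.com/watch?v=VIDEO_ID"
```

The yt-dlp output template `%(title)s.%(ext)s` has no curly braces, so it passes through untouched for yt-dlp to interpret.
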

patrickkfkan commented 1 month ago

@lautriva, you commented out the cookie line with a #. Could this be the reason?
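
A quick way to rule this out is to scan the config text for an active `cookie` line. The sketch below is a hypothetical helper, not part of patreon-dl. It assumes `#` is the only comment marker (as in the config above) and that the option is written as `cookie = value`.

```typescript
type CookieSetting = "active" | "commented-out" | "missing";

// Hypothetical helper: classifies the 'cookie' option in patreon-dl style config text.
function checkCookieSetting(configText: string): CookieSetting {
  let sawCommented = false;
  for (const rawLine of configText.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (/^cookie\s*=\s*\S/.test(line)) {
      return "active"; // an uncommented line with a non-empty value
    }
    if (/^#\s*cookie\s*=/.test(line)) {
      sawCommented = true; // the option is present but disabled with '#'
    }
  }
  return sawCommented ? "commented-out" : "missing";
}

console.log(checkCookieSetting("[downloader]\n# cookie = abc")); // commented-out
console.log(checkCookieSetting("[downloader]\ncookie = abc")); // active
```

The descriptive comment line `# Cookie to include in requests; ...` is not mistaken for the option, because the check is case-sensitive and requires an `=` right after `cookie`.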

lautriva commented 1 month ago

> @lautriva, you commented out the cookie line with a #. Could this be the reason?

That's not the reason, sorry; it was just a copy-paste error.

I've updated the example file contents.

lautriva commented 5 hours ago

Hi, is there any news?