sarperavci / CloudflareBypassForScraping

A cloudflare verification bypass script for webscraping

Can you provide a Docker version #17

Open shaobeipan opened 1 month ago

frederik-uni commented 1 month ago

I can't test it right now, but this should work. In server.py, the browser path would need to be changed from google-chrome to chromium-browser.

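# Base image; "latest" is unpinned, so the Ubuntu release may change between builds
FROM ubuntu:latest

# Python, a virtual X server (Xvfb), and Chromium
RUN apt-get update && apt-get install -y \
    python3 \
    python3-pip \
    xvfb \
    chromium-browser \
    && apt-get clean

WORKDIR /app

COPY . .

# Python dependencies for the bypass script and for the HTTP server
RUN pip3 install --no-cache-dir -r requirements.txt

RUN pip3 install --no-cache-dir -r server_requirements.txt

# Port the server is reached on
EXPOSE 8000

CMD ["python3", "server.py", "--headless"]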

Does this work?
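
For anyone adapting server.py: rather than hard-coding one binary name, the path change described above could be handled by probing PATH for several common names. This is only a sketch; the candidate list and the `find_browser` helper are assumptions for illustration, not code from the repository.

```python
import shutil
from typing import Callable, Iterable, Optional

# Hypothetical candidate names; which ones exist depends on the base image.
CANDIDATES = ("chromium-browser", "chromium", "google-chrome", "google-chrome-stable")


def find_browser(
    candidates: Iterable[str] = CANDIDATES,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return the full path of the first candidate found on PATH, else None."""
    for name in candidates:
        path = which(name)
        if path:
            return path
    return None
```

Calling `find_browser()` inside the container returns the resolved path or `None`, which makes a missing browser easy to tell apart from a browser that fails to start.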

shaobeipan commented 1 month ago

After accessing http://localhost:8000/cookies?url=https://nopecha.com/demo/cloudflare, the following error message appears:

{"Details": "\n127.0.0.1:9222 The browser cannot be connected.\nPlease confirm:\n1. The port belongs to a browser.\n2. The '--remote-debugging-port=9222' startup option has been added.\n3. The user folder does not conflict with an already opened browser.\n4. On a system without a GUI, add the '--headless=new' parameter.\n5. On Linux, the '--no-sandbox' startup parameter may also be needed.\nThe port and user folder path can be set using ChromiumOptions."}

I tried adding these parameters, but the problem persists.
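
For reference, the flags named in the error message could be assembled and passed to DrissionPage roughly as follows. This is a sketch under assumptions: `build_launch_flags` and `make_options` are hypothetical helpers, `/tmp/chromium-profile` is a placeholder profile directory, and the `ChromiumOptions` method names (`set_local_port`, `set_user_data_path`, `set_argument`) reflect my understanding of DrissionPage 4.x and should be checked against the installed version.

```python
from typing import List


def build_launch_flags(port: int, headless: bool = True, in_container: bool = True) -> List[str]:
    """Assemble the Chromium flags listed in the error message."""
    flags = [f"--remote-debugging-port={port}"]
    if headless:
        flags.append("--headless=new")   # for systems without a display
    if in_container:
        flags.append("--no-sandbox")     # often needed when running as root on Linux
    return flags


def make_options(port: int = 9222, user_data_dir: str = "/tmp/chromium-profile"):
    """Build DrissionPage options (assumed API; requires DrissionPage installed)."""
    from DrissionPage import ChromiumOptions  # imported lazily so the helper above works without it

    options = ChromiumOptions()
    options.set_local_port(port)                # debugging port the library connects to
    options.set_user_data_path(user_data_dir)   # separate profile to avoid conflicts
    for flag in build_launch_flags(port)[1:]:   # skip the port flag, already set above
        options.set_argument(flag)
    return options
```

The returned object would then be passed as `ChromiumPage(addr_or_opts=options)`. Giving each run its own profile directory addresses point 3 of the message (a user folder clashing with an already running browser).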

frederik-uni commented 4 weeks ago

It seems this is a problem with DrissionPage. I suggest opening an issue there.

gandrunx commented 4 weeks ago

[Screenshot 2024-08-15 154714]

It could be due to my IP being flagged as spam or a potential configuration error.

HenryXiaoYang commented 3 weeks ago

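In a Docker environment I get an error in both headed and headless mode. It is probably related to how DrissionPage launches the browser. I really don't know how this project's Docker container is supposed to run, qwq.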

    self.driver = ChromiumPage(addr_or_opts=options)
  File "/usr/local/lib/python3.10/dist-packages/DrissionPage/_pages/chromium_page.py", line 38, in __new__
    is_exist, browser_id = run_browser(opt)
  File "/usr/local/lib/python3.10/dist-packages/DrissionPage/_pages/chromium_page.py", line 317, in run_browser
    is_exist = connect_browser(chromium_options)
  File "/usr/local/lib/python3.10/dist-packages/DrissionPage/_functions/browser.py", line 56, in connect_browser
    test_connect(ip, port)
  File "/usr/local/lib/python3.10/dist-packages/DrissionPage/_functions/browser.py", line 217, in test_connect
    raise BrowserConnectError(f'\n{ip}:{port}浏览器无法链接。\n请确认:\n1、该端口为浏览器\n'
DrissionPage.errors.BrowserConnectError: 
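127.0.0.1:12049 The browser cannot be connected.
Please confirm:
1. The port belongs to a browser.
2. The '--remote-debugging-port=12049' startup option has been added.
3. The user folder does not conflict with an already opened browser.
4. On a system without a GUI, add the '--headless=new' parameter.
5. On Linux, the '--no-sandbox' startup parameter may also be needed.
The port and user folder path can be set using ChromiumOptions.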
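
Since the traceback ends in `test_connect(ip, port)`, one way to narrow this down is to check from inside the container whether anything is listening on the debugging port at all. The helper below is a plain TCP probe using only the standard library. It is a hypothetical diagnostic, not DrissionPage's own check.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False
```

If `port_is_open("127.0.0.1", 12049)` stays `False` after launch, the browser process most likely never started or exited immediately. That points at the binary path or the launch flags rather than at the connection logic.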

HenryXiaoYang commented 3 weeks ago

Outside Docker, with a display attached, everything works. Inside Docker, with no display, it fails every time.
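
The difference described here (display present vs. absent) can be detected at startup so the script picks a mode instead of failing. A minimal sketch, assuming the usual `DISPLAY` / `WAYLAND_DISPLAY` environment variables; `has_display` and `choose_mode` are hypothetical helpers.

```python
import os
from typing import Mapping, Optional


def has_display(env: Optional[Mapping[str, str]] = None) -> bool:
    """True if an X11 or Wayland display is advertised in the environment."""
    env = os.environ if env is None else env
    return bool(env.get("DISPLAY") or env.get("WAYLAND_DISPLAY"))


def choose_mode(env: Optional[Mapping[str, str]] = None) -> str:
    """Pick 'headed' when a display exists, otherwise 'headless'."""
    return "headed" if has_display(env) else "headless"
```

Note that the Dockerfile earlier in the thread installs xvfb but never starts it. Launching the server through `xvfb-run` (shipped with the xvfb package on Debian/Ubuntu) is one way to provide a `DISPLAY` inside the container, though whether that is enough for this project is untested here.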

HenryXiaoYang commented 3 weeks ago

https://github.com/g1879/DrissionPage/issues/54#issuecomment-1773000503

This issue looks like it could help solve the problem.

shaobeipan commented 3 weeks ago

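g1879/DrissionPage#54 (comment)

This issue looks like it could help solve the problem.

The project now runs normally in Docker. It just cannot get past the "5-second shield" (Cloudflare's challenge page); it keeps looping, stuck at the click step.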