如何使用headless chrome进行采集？

platonai / PulsarRPA

Automate webpages at scale, scrape web data completely and accurately with high performance, distributed AI-RPA.

Apache License 2.0

778 stars 118 forks source link

Open Vickzhang opened 1 year ago

Vickzhang commented 1 year ago

疑问：

谢谢。

platonai commented 1 year ago

支持 linux 服务器版进行部署采集。

浏览器安装：

git clone https://github.com/platonai/pulsar.git
cd pulsar && bin/build-run.sh

浏览器设置：

用 BrowserSettings 设置，譬如：

BrowserSettings.privacy(3).maxTabs(10).headless()

这段代码告诉系统，

ScalaFirst commented 1 year ago

@platonai 无头模式爬取 www.temu.com 的数据失败，非无头模式才能访问。请问是否会被检测出无头模式？有没有类似selenium 给谷歌浏览器添加 excludeSwitches 参数的方式来规避检测？