PulsarRPAPro is the professional version of PulsarRPA, featuring an upgraded server, a collection of scraping examples for top e-commerce sites, and an advanced AI-powered applet for automatic data extraction.
Never write another web scraper. PulsarRPAPro learns from the website and delivers web data completely and accurately at scale.
Dozens of scraping examples for the most popular websites are already included, and we are constantly adding more.
Bilibili: https://www.bilibili.com/video/BV1kM2rYrEFC
Download the latest executable jar:
wget http://static.platonic.fun/repo/ai/platon/exotic/PulsarRPAPro.jar
# start MongoDB
docker-compose -f docker/docker-compose.yaml up
java -jar PulsarRPAPro.jar
java -jar PulsarRPAPro.jar harvest "https://www.amazon.com/b?node=1292115011" -diagnose -refresh
Add the following lines to your .m2/settings.xml:
<mirrors>
<mirror>
<id>maven-default-http-blocker</id>
<mirrorOf>dummy</mirrorOf>
<name>Dummy mirror to override default blocking mirror that blocks http</name>
<url>http://0.0.0.0/</url>
</mirror>
</mirrors>
git clone https://github.com/platonai/PulsarRPAPro.git
cd PulsarRPAPro
./mvnw clean && ./mvnw
cd PulsarRPAPro/target/
# Don't forget to start MongoDB
docker-compose -f docker/docker-compose.yaml up
For Chinese developers, we strongly suggest following this guide to accelerate the build process.
java -jar PulsarRPAPro.jar serve
If PulsarRPAPro is running in GUI mode, the web console should open within a few seconds, or you can open it manually at:
http://localhost:2718/exotic/crawl/
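If the server runs without a GUI, it can be handy to wait until the console answers before opening it. The following is a minimal sketch, not part of PulsarRPAPro: the function names `console_url` and `wait_for_console` are hypothetical, `curl` is assumed to be installed, and the sketch assumes the console answers a plain GET on the URL above with a non-error HTTP status.

```shell
# Build the console URL from a host and port.
# Defaults (localhost, 2718) are taken from the URL shown above.
console_url() {
  local host="${1:-localhost}"
  local port="${2:-2718}"
  printf 'http://%s:%s/exotic/crawl/\n' "$host" "$port"
}

# Poll the given URL once per second until it answers or the attempts run out.
# Returns 0 as soon as curl gets a non-error HTTP response, 1 otherwise.
wait_for_console() {
  local url="$1"
  local attempts="${2:-30}"
  local i=0
  while [ "$i" -lt "$attempts" ]; do
    if curl --silent --fail --output /dev/null "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}
```

A possible use, after starting the server in another terminal: `wait_for_console "$(console_url)" 60 && echo "console is up"`.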
You can use the harvest command to learn from a set of item pages using unsupervised machine learning.
java -jar PulsarRPAPro.jar harvest "https://www.amazon.com/b?node=1292115011" -diagnose -refresh
The URL in the command should be a portal URL, such as a product listing page URL.
PulsarRPAPro will visit the portal, identify the optimal set of links for item pages, retrieve those pages, and analyze them.
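When there are several portal URLs to try, the harvest command above can be generated once per URL. The sketch below only prints the commands so they can be reviewed before running. The function names `harvest_cmd` and `harvest_plan` and the file name `portals.txt` are hypothetical; the flags are copied from the example above, and nothing here is a built-in batch mode of PulsarRPAPro.

```shell
# Print the harvest command for one portal URL.
# The -diagnose and -refresh flags are copied from the example above.
harvest_cmd() {
  printf 'java -jar PulsarRPAPro.jar harvest "%s" -diagnose -refresh\n' "$1"
}

# Read portal URLs from stdin, one per line, and print one harvest
# command per URL. Blank lines and lines starting with # are skipped.
harvest_plan() {
  local url
  while IFS= read -r url || [ -n "$url" ]; do
    case "$url" in
      ''|'#'*) continue ;;
    esac
    harvest_cmd "$url"
  done
}
```

For example, `harvest_plan < portals.txt` prints one command per listed portal. Running them one at a time keeps each run's diagnostic output separate.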
Here is the full page of the auto extraction result in HTML format:
Auto Extraction Result of Amazon
Run the executable jar directly for help and to explore more features:
java -jar PulsarRPAPro.jar
This command will print the help message and some of the most useful examples.
Q: How do I use proxies?
A: Follow this guide for proxy rotation.