Closed crawlersgonnacrawl closed 2 years ago
Try some e-commerce site and run:
java -jar exotic-standalone.jar harvest a-product-list-url-of-you-e-comm-website
The URL in the command above should be a portal URL, for example the URL of a product list page. Exotic visits the portal URL, finds the best set of out-links to item pages, fetches the item pages, and then learns from them.
I tried running this; it ran for 30-40 seconds and then the program exited. During that time htop was full of processes, so I could see that it was working.
root@exotic-test:~# java -jar exotic-standalone.jar harvest https://www.trendyol.com/apple-cep-telefonu-x-b101470-c103498
How can I see the result? Where is it stored?
Here is the public link to my GUI: http://5.161.58.104:2718/exotic/crawl/ (I'll delete this later)
How can I see the result? Where is it stored?
Once the system successfully completes the task, a web page opens automatically to show the harvest result.
Unfortunately, it does not, as this is a remote machine. Is there any chance the CLI could return a remote link? If not, I can't run it from a remote machine.
There are three ways to run harvest and check the results:
Run the command in the CLI; the results are written to files in three different formats:
java -jar exotic-standalone.jar harvest https://www.trendyol.com/apple-cep-telefonu-x-b101470-c103498
less "/tmp/pulsar-$USER/report/harvest/corpus/last-page-tables.json"
Run X-SQL in the CLI; the results are returned in tabular form:
java -jar exotic-standalone.jar sql "select * from harvest('https://www.trendyol.com/apple-cep-telefonu-x-b101470-c103498')"
Access the REST API with X-SQL; the results are returned in JSON form:
curl -X POST --location "http://5.161.58.104:2718/exotic/x/e" -H "Content-Type: text/plain" -d "
select * from harvest('https://www.trendyol.com/apple-cep-telefonu-x-b101470-c103498')
"
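On a headless machine, where no browser page can open, the first option above can be wrapped in a small helper. This is a minimal sketch, assuming the report path is exactly the one quoted above; the function names `harvest_report_path` and `show_harvest_report` are hypothetical and not part of Exotic.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not part of Exotic.
# The report path is the one quoted in this thread.

# Print the expected report path for a given user (defaults to $USER).
harvest_report_path() {
  local user="${1:-$USER}"
  printf '/tmp/pulsar-%s/report/harvest/corpus/last-page-tables.json\n' "$user"
}

# Print the report if it exists and is non-empty; otherwise say where it was expected.
show_harvest_report() {
  local report
  report="$(harvest_report_path "$@")"
  if [ -s "$report" ]; then
    cat "$report"
  else
    echo "No harvest report found at $report" >&2
    return 1
  fi
}
```

After a harvest run finishes, calling `show_harvest_report` in the same shell prints the JSON report, or reports the expected location if nothing was written.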
The project is really promising, thanks for the hard work! I have finally got the app running.
My main interest is testing your auto-parse feature, as shown in the demo on your website: http://platonic.fun/i/ai?url=aHR0cHM6Ly93d3cuYW1hem9uLmNvbS9CZXN0LVNlbGxlcnMtQXV0b21vdGl2ZS96Z2JzL2F1dG9tb3RpdmUvcmVmPXpnX2JzX25hdl8w
I tried to create a project on a demo site, but the GUI asks me to provide SQL parsing rules that include selectors, whereas I just need harvest mode.
I tried to use this rule:
select * from harvest('https://ifconfig.me');
The project was created, but it is stuck on the status screen as "running".
Then I tried to run it from the CLI:
java -jar exotic-standalone.jar harvest https://ifconfig.me
The program completed successfully, but I never got any prompt from the CLI, and I can't see any data in the GUI.
How can I create a report like the one in the demo on your website? I can't code in Kotlin yet; I am only using bash and the GUI to run harvest mode, and I could not get any results.