spatie / crawler

An easy to use, powerful crawler implemented in PHP. Can execute Javascript.
https://freek.dev/308-building-a-crawler-in-php
MIT License
2.51k stars, 357 forks

Honeypot #380

Closed: mitmelon closed this issue 2 years ago

mitmelon commented 2 years ago

Can this crawler detect honeypot links?

freekmurze commented 2 years ago

No.

mitmelon commented 2 years ago

> No.

Can you explain how this crawler extracts links from a page for crawling? Maybe I could write a function that detects whether an extracted link is a honeypot before the crawler crawls it. Or just show me the function in the library that does that.

freekmurze commented 2 years ago

You get the content of a crawled link through the crawled method of an observer: https://github.com/spatie/crawler#usage

There, you can do whatever you want with the $response. You might find Symfony's DomCrawler component handy for getting the nodes you are looking for in the HTML of the response.
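For illustration, here is a minimal sketch of the kind of check being discussed: given the HTML of a crawled page, split its links into ones that look safe to follow and ones that look like honeypots. Everything in it is an assumption rather than part of spatie/crawler. The function names `looksLikeHoneypot` and `classifyLinks` are hypothetical, the "hidden via inline markup" rule is only one possible heuristic, and it uses PHP's built-in DOM extension instead of Symfony's DomCrawler so that it runs without extra packages.

```php
<?php

declare(strict_types=1);

/**
 * Heuristic only (an assumption, not library behaviour): treats a link as a
 * honeypot when it, or one of its ancestors, is hidden via inline markup.
 */
function looksLikeHoneypot(DOMElement $link): bool
{
    // Walk from the link up to the root element; the loop stops at the document node.
    for ($node = $link; $node instanceof DOMElement; $node = $node->parentNode) {
        if ($node->hasAttribute('hidden')) {
            return true;
        }

        // Normalise the inline style: lowercase, all whitespace removed.
        $style = (string) preg_replace('/\s+/', '', strtolower($node->getAttribute('style')));

        if (strpos($style, 'display:none') !== false || strpos($style, 'visibility:hidden') !== false) {
            return true;
        }
    }

    return false;
}

/**
 * Splits the href values found in $html into two buckets.
 *
 * @return array{crawl: string[], honeypot: string[]}
 */
function classifyLinks(string $html): array
{
    $result = ['crawl' => [], 'honeypot' => []];

    // DOMDocument::loadHTML() rejects empty input, so bail out early.
    if (trim($html) === '') {
        return $result;
    }

    $document = new DOMDocument();

    // Real-world HTML is rarely valid; collect libxml warnings instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $document->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ((new DOMXPath($document))->query('//a[@href]') as $link) {
        $bucket = looksLikeHoneypot($link) ? 'honeypot' : 'crawl';
        $result[$bucket][] = $link->getAttribute('href');
    }

    return $result;
}

// Example input: one visible link and one link inside a hidden container.
$html = '<a href="/about">About</a><div style="display: none"><a href="/trap">x</a></div>';
$links = classifyLinks($html);
// $links['crawl'] holds "/about"; $links['honeypot'] holds "/trap".
```

Inside an observer's crawled method, the HTML could be obtained with `(string) $response->getBody()`, assuming the response is a PSR-7 response object. Links hidden through external stylesheets or JavaScript are not caught by this sketch, since it only inspects inline attributes.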

mitmelon commented 2 years ago

> symfony's domcrawler component

Thank you very much for the answer. Now I get the whole idea.