spatie / crawler

An easy to use, powerful crawler implemented in PHP. Can execute Javascript.
https://freek.dev/308-building-a-crawler-in-php
MIT License

Custom/extendable `CrawlUrl` #429

Closed: rudiedirkx closed this 1 year ago

rudiedirkx commented 1 year ago

I want to keep detailed track of crawled pages: number of references, response content-type and HTTP status code, etc. I can keep those in my own list of crawled URL objects, but that's A LOT of redundancy. Even in the current queue every URL is saved 3 times: as the array key, in CrawlUrl->url, and in CrawlUrl->id. I don't want to add even more, but I do want to add a few stats per URL. With a custom/extendable CrawlUrl I could add those efficiently.

I haven't actually tried to keep track of references yet. Is that possible? I want to know how many pages link to /contact.html, or /help/bla.html, or /files/bestpdf.pdf etc.
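
For illustration, here is a minimal sketch of the kind of per-URL bookkeeping described above. Everything in it is hypothetical: `UrlStats`, `UrlStatsRegistry` and their methods are not part of spatie/crawler, and where the crawler would call them from is left open. It only shows that a reference count, status code and content type can sit next to each URL while the URL string itself is stored once, as the array key.

```php
<?php

declare(strict_types=1);

// Hypothetical value object, not a spatie/crawler class.
final class UrlStats
{
    public int $references = 0;          // how many links pointing at this URL were seen
    public ?int $statusCode = null;      // HTTP status code, once the URL was fetched
    public ?string $contentType = null;  // response Content-Type, once the URL was fetched
}

// Hypothetical registry: the URL is kept only as the array key.
final class UrlStatsRegistry
{
    /** @var array<string, UrlStats> */
    private array $stats = [];

    // Return the stats entry for a URL, creating it on first use.
    public function get(string $url): UrlStats
    {
        return $this->stats[$url] ??= new UrlStats();
    }

    // Call once for every link found that points at $url.
    public function addReference(string $url): void
    {
        $this->get($url)->references++;
    }

    // Call once when the response for $url is known.
    public function addResponse(string $url, int $statusCode, ?string $contentType): void
    {
        $entry = $this->get($url);
        $entry->statusCode = $statusCode;
        $entry->contentType = $contentType;
    }

    public function count(): int
    {
        return count($this->stats);
    }
}

// Usage: two pages link to /contact.html, which is then fetched.
$registry = new UrlStatsRegistry();
$registry->addReference('/contact.html');
$registry->addReference('/contact.html');
$registry->addResponse('/contact.html', 200, 'text/html');
$registry->addReference('/files/bestpdf.pdf');
```

This still lives outside the queue, so it does not remove the redundancy complained about above; an extendable CrawlUrl would let the same three fields sit on the queue entry itself.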


Extendable? Extendible? Extensible? You know.