spatie/crawler

An easy-to-use, powerful crawler implemented in PHP. Can execute JavaScript.
https://freek.dev/308-building-a-crawler-in-php
MIT License

Crawler stops after `www`-version redirect to non-`www` version #450

Closed. buismaarten closed this issue 8 months ago.

buismaarten commented 9 months ago

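When the crawler finds a link starting with `www` on a page, and the website redirects from the `www` version to the non-`www` version, the crawler simply stops. This is caused by the code below: unless the crawl profile is `CrawlSubdomains`, any URL whose host is not exactly equal to the base URL's host is skipped, so `www.example.com` and `example.com` are treated as different hosts.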

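```php
// This host check is bypassed only when the crawl profile is CrawlSubdomains.
if (! $this->crawler->getCrawlProfile() instanceof CrawlSubdomains) {
    // Strict string comparison of the two hosts: any difference, including a
    // "www." prefix, causes the URL to be dropped.
    if ($crawlUrl->url->getHost() !== $this->crawler->getBaseUrl()->getHost()) {
        return;
    }
}
```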
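Because the guard compares hosts with a strict `!==`, one possible direction for a fix is a comparison that treats the `www.` host and the bare host as equal. The following is a minimal plain-PHP sketch of such a comparison only. `normalizeHost()` and `sameHostIgnoringWww()` are hypothetical helpers, not part of spatie/crawler, and how (or whether) such a check should be wired into the crawler is an assumption the issue does not settle.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of spatie/crawler): extracts the host of a URL,
 * lowercases it and strips a single leading "www." label.
 * Returns null when the URL has no usable host.
 */
function normalizeHost(string $url): ?string
{
    // parse_url() returns null if there is no host component,
    // and false if the URL is seriously malformed.
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null;
    }

    $host = strtolower($host);

    // Treat "www.example.com" as "example.com".
    if (substr($host, 0, 4) === 'www.') {
        $host = substr($host, 4);
    }

    return $host === '' ? null : $host;
}

/**
 * Hypothetical helper: true when both URLs have the same host,
 * ignoring case and a leading "www." prefix.
 */
function sameHostIgnoringWww(string $urlA, string $urlB): bool
{
    $a = normalizeHost($urlA);
    $b = normalizeHost($urlB);

    return $a !== null && $a === $b;
}

var_dump(sameHostIgnoringWww('https://www.example.com/about', 'https://example.com/')); // bool(true)
var_dump(sameHostIgnoringWww('https://blog.example.com/', 'https://example.com/'));     // bool(false)
```

Only the `www.` label is stripped, so other subdomains such as `blog.example.com` still count as a different host, which keeps the behaviour close to the existing strict check.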

Note: the Guzzle option `allow_redirects` must be set to `true` for the crawler to follow redirects.
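A sketch of enabling redirects through the Guzzle client options passed to `Crawler::create()`, assuming a Composer project with spatie/crawler installed. The crawler may use this array as the complete set of client options rather than merging it with its defaults, so any other options you rely on (timeouts, cookies, headers) would need to be added back; check this against the version you use.

```php
<?php

use GuzzleHttp\RequestOptions;
use Spatie\Crawler\Crawler;

// Assumes dependencies were installed with Composer.
require __DIR__ . '/vendor/autoload.php';

// RequestOptions::ALLOW_REDIRECTS is the constant for 'allow_redirects';
// true enables Guzzle's default redirect handling.
$crawler = Crawler::create([
    RequestOptions::ALLOW_REDIRECTS => true,
]);
```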