zrashwani / arachnid

Crawl all unique internal links found on a given website and extract SEO-related information. Supports JavaScript-based sites.
MIT License

Improvement suggestions #7

Open wachterjohannes opened 9 years ago

wachterjohannes commented 9 years ago

We at sulu-cmf want to use your crawler to build an HTTP cache warmer and a website information extractor. I will start today integrating your class into a new Symfony bundle.

For that reason, I would like to ask whether you still have time to work on this class.

I will create a PR that includes some improvements we need (detailed in the comments below).

I hope you will be able to merge this PR, and thanks for your good work so far (= it has saved me a lot of time.

With best regards, sulu-cmf
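
For context, here is a minimal sketch of the cache-warmer idea described above, based on Arachnid's documented usage (Crawler, traverse(), getLinks()). The shape of the getLinks() result and the absolute_url key are assumptions that may vary between versions, and Guzzle is used here only as an example HTTP client:

require 'vendor/autoload.php';

use Arachnid\Crawler;
use GuzzleHttp\Client as HttpClient;

// Collect all unique internal links up to a depth of 3.
$crawler = new Crawler('http://www.example.com', 3);
$crawler->traverse();
$links = $crawler->getLinks();

// Warm the HTTP cache by requesting every crawled URL once.
$http = new HttpClient();
foreach ($links as $url => $info) {
    // Skip entries without a resolvable absolute URL (assumed key).
    if (empty($info['absolute_url'])) {
        continue;
    }
    $http->get($info['absolute_url']);
}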

wachterjohannes commented 9 years ago

Link to our homepage and GitHub account.

wachterjohannes commented 9 years ago

The crawler should also return the 301 status code, so that redirecting URLs can be identified.

This could be solved with:

// Assuming Goutte\Client, which Arachnid builds on.
use Goutte\Client;

$client = new Client();
$client->followRedirects(false);

$crawler = $client->request('GET', $url);
$status = $client->getResponse()->getStatus();

// 3xx status codes are redirects (400 is a client error and must
// not be treated as one), so follow the redirect manually.
if ($status >= 300 && $status < 400) {
    $crawler = $client->followRedirect();
}

Save the status code for the first URL (before the redirect is followed).
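
A sketch of how that might look, extending the snippet above; $statusCodes is a hypothetical array used here only for illustration:

$statusCodes = [];

$crawler = $client->request('GET', $url);
$status = $client->getResponse()->getStatus();

// Record the status of the originally requested URL before any
// redirect is followed, so 301/302 responses stay visible.
$statusCodes[$url] = $status;

if ($status >= 300 && $status < 400) {
    $crawler = $client->followRedirect();
    // Optionally record the status of the redirect target as well.
    $statusCodes[$client->getRequest()->getUri()] = $client->getResponse()->getStatus();
}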

wachterjohannes commented 9 years ago

Add the page on which the link was found to the result.
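
A sketch of what that could look like inside the crawl loop, assuming $crawler is the DomCrawler for the page currently being processed; $results and $parentUrl are hypothetical names:

// While processing $parentUrl, record for every discovered link the
// page it was found on, alongside any other per-URL metadata.
foreach ($crawler->filter('a')->links() as $link) {
    $linkUrl = $link->getUri();

    if (!isset($results[$linkUrl])) {
        $results[$linkUrl] = ['source_pages' => []];
    }

    // The same link can appear on several pages; keep all of them.
    $results[$linkUrl]['source_pages'][] = $parentUrl;
}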