zrashwani / arachnid

Crawl all unique internal links found on a given website and extract SEO-related information. Supports JavaScript-based sites.
MIT License

Improvement suggestions #7

Open wachterjohannes opened 9 years ago

wachterjohannes commented 9 years ago

We at sulu-cmf want to use your crawler to build an HTTP cache warmer and a website information extractor. I will start today integrating your class into a new Symfony bundle.

For that reason, I would like to ask whether you still have time to work on this class.

I will create a PR that includes some improvements we need (detailed in the comments below).

I hope you will be able to merge this PR, and thanks for your good work so far (= it has saved me a lot of time.

With best regards, sulu-cmf
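
For context, here is a minimal sketch of the cache-warmer idea described above, based on Arachnid's documented usage (Crawler, traverse(), getLinks()). The shape of the getLinks() result and the absolute_url key are assumptions that may vary between versions, and Guzzle is used here only as an example HTTP client:

require 'vendor/autoload.php';

use Arachnid\Crawler;
use GuzzleHttp\Client as HttpClient;

// Collect all unique internal links up to a depth of 3.
$crawler = new Crawler('http://www.example.com', 3);
$crawler->traverse();
$links = $crawler->getLinks();

// Warm the HTTP cache by requesting every crawled URL once.
$http = new HttpClient();
foreach ($links as $url => $info) {
    // Skip entries without a resolvable absolute URL (assumed key).
    if (empty($info['absolute_url'])) {
        continue;
    }
    $http->get($info['absolute_url']);
}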

wachterjohannes commented 9 years ago

Link to our homepage and GitHub account.

wachterjohannes commented 9 years ago

The crawler should also return the 301 status code, so that redirecting URLs can be identified.

This could be solved with:

// Assuming Goutte\Client, which Arachnid builds on.
use Goutte\Client;

$client = new Client();
$client->followRedirects(false);

$crawler = $client->request('GET', $url);
$status = $client->getResponse()->getStatus();

// 3xx status codes are redirects (400 is a client error and must
// not be treated as one), so follow the redirect manually.
if ($status >= 300 && $status < 400) {
    $crawler = $client->followRedirect();
}

Save the status code for the first URL (before the redirect is followed).
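
A sketch of how that might look, extending the snippet above; $statusCodes is a hypothetical array used here only for illustration:

$statusCodes = [];

$crawler = $client->request('GET', $url);
$status = $client->getResponse()->getStatus();

// Record the status of the originally requested URL before any
// redirect is followed, so 301/302 responses stay visible.
$statusCodes[$url] = $status;

if ($status >= 300 && $status < 400) {
    $crawler = $client->followRedirect();
    // Optionally record the status of the redirect target as well.
    $statusCodes[$client->getRequest()->getUri()] = $client->getResponse()->getStatus();
}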

wachterjohannes commented 9 years ago

Add the page on which the link was found to the result.
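
A sketch of what that could look like inside the crawl loop, assuming $crawler is the DomCrawler for the page currently being processed; $results and $parentUrl are hypothetical names:

// While processing $parentUrl, record for every discovered link the
// page it was found on, alongside any other per-URL metadata.
foreach ($crawler->filter('a')->links() as $link) {
    $linkUrl = $link->getUri();

    if (!isset($results[$linkUrl])) {
        $results[$linkUrl] = ['source_pages' => []];
    }

    // The same link can appear on several pages; keep all of them.
    $results[$linkUrl]['source_pages'][] = $parentUrl;
}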