zrashwani / arachnid

Crawl all unique internal links found on a given website and extract SEO-related information. Supports JavaScript-based sites.
MIT License

tel: links get crawled #16

Closed: staabm closed this issue 7 years ago

staabm commented 8 years ago

Despite being blacklisted in checkIfCrawlable, tel: links still get crawled.

Tested via:

$crawler = new \Arachnid\Crawler('https://www.handyflash.de/', 3);
$crawler->traverse();

The Apache access log then records a hit like this:

www.handyflash.de:443 213.XXX.YYY.ZZZ - - [10/Jun/2016:12:24:42 +0200] "GET /tel:+4923199778877 HTTP/1.1" 404 37312 "-" "Symfony2 BrowserKit" 0
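
The request path `/tel:+4923199778877` suggests the href was resolved as if it were a relative URL. As an illustration only, here is a minimal sketch of the kind of scheme check that would skip such links before they are requested. The function name `isCrawlableHref` and the list of blocked schemes are assumptions made for this example; this is not arachnid's actual `checkIfCrawlable` code.

```php
<?php
// Hypothetical helper for illustration; not part of arachnid.
// Returns false for hrefs that cannot be fetched as a web page.
function isCrawlableHref(string $href): bool
{
    $href = trim($href);

    // Empty hrefs and pure fragment links point nowhere new.
    if ($href === '' || $href[0] === '#') {
        return false;
    }

    // Assumed list of schemes that never resolve to a crawlable page.
    $blockedSchemes = ['tel', 'mailto', 'javascript', 'sms', 'skype'];

    // A URI scheme is a letter followed by letters, digits, "+", "-" or ".",
    // terminated by a colon (RFC 3986, section 3.1).
    if (preg_match('/^([a-z][a-z0-9+.\-]*):/i', $href, $matches) === 1) {
        return !in_array(strtolower($matches[1]), $blockedSchemes, true);
    }

    // No scheme: treat the href as a relative URL.
    return true;
}
```

With a check like this, `isCrawlableHref('tel:+4923199778877')` returns false, while `isCrawlableHref('https://www.handyflash.de/')` and a relative href such as `/kontakt` return true. The check has to run on the raw href, before it is joined with the base URL, because after joining the `tel:` prefix ends up inside the path.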

zrashwani commented 7 years ago

This bug is fixed now; the fix is verified by this test: https://github.com/codeguy/arachnid/blob/master/tests/src/CrawlerTest.php#L381

codeguy commented 7 years ago

@zrashwani I don't have time to maintain this project. Would you like me to transfer ownership of it to you?

zrashwani commented 7 years ago

@codeguy Yes, I am planning to continue maintaining the project, so please transfer ownership to me.