zrashwani / arachnid

Crawl all unique internal links found on a given website and extract SEO-related information; supports JavaScript-based sites
MIT License
253 stars 60 forks

filterLinks Issue #31

Closed asadrabbi closed 5 years ago

asadrabbi commented 5 years ago

<?php
require 'vendor/autoload.php';

$url = "https://www.google.com/";
$linkDepth = 2;
$crawler = new \Arachnid\Crawler($url, $linkDepth);
$links = $crawler->filterLinks(function ($link) {
                    return (bool) preg_match('/\/google\/(.*)/', $link);
                })
                ->traverse()
                ->getLinksArray();

print_r($links);

I have written this code to traverse only links that have google as the domain name, but it returns an empty array. What am I missing?

zrashwani commented 5 years ago

What is the base URL that you are trying to crawl? The library doesn't crawl external URLs whose host differs from the base host; that may be the reason.
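The host check described above can be sketched in plain PHP, without the crawler. This is a minimal illustration, not part of Arachnid's API: `keepLink()` is a hypothetical helper showing the kind of predicate you would pass to `filterLinks()` to keep only links on the base host, while still allowing relative links (which resolve against the base URL).

```php
<?php
// Hypothetical helper (not part of Arachnid): decide whether a link
// belongs to the base host. Relative links have no host component,
// so parse_url() returns null for them and they are kept.
function keepLink(string $link, string $baseHost): bool
{
    $host = parse_url($link, PHP_URL_HOST);
    return $host === null || $host === $baseHost;
}

var_dump(keepLink('/search?q=arachnid', 'www.google.com'));      // bool(true)
var_dump(keepLink('https://www.google.com/maps', 'www.google.com')); // bool(true)
var_dump(keepLink('https://example.com/', 'www.google.com'));    // bool(false)
```

A predicate like this, matching on the host instead of a `/google/` path segment, is closer to "only links on this domain" than the regex in the snippet above.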

Can you post the whole script so I can take a look?