zrashwani / arachnid

Crawl all unique internal links found on a given website and extract SEO-related information; supports JavaScript-based sites.
MIT License

Parent > Children > Grandchildren #32

Closed: asadrabbi closed this issue 5 years ago

asadrabbi commented 5 years ago

How can I get the URL and page title of a crawled link's parent page?

Suppose I have the URL "www.example.com" as the parent. It has a child, "www.example.com/pageone.html", and a grandchild, "www.example.com/pageone/pagetwo.html".

After traversing pagetwo.html, how do I get the page title and URL of the other two pages?

zrashwani commented 5 years ago

You can get that information by using the getLinks method, which returns an array of Link objects, and then traversing the link's parent levels as below:

    // initialize crawler
    $links = $crawler->getLinks(); // to get links as objects
    $linkUrl = 'http://myanimelist.net/character/95195/Kaori_Fujimiya'; // level 3 link
    $parentUrl = $links[$linkUrl]->getParentUrl();  // immediate parent (level 2)
    $topParentUrl = $links[$parentUrl]->getParentUrl();  // top parent (level 1)

    echo "Main Link: $linkUrl".PHP_EOL;
    echo "Parent Link: $parentUrl".PHP_EOL;
    echo "Top Parent Link: $topParentUrl".PHP_EOL;

You can also get the title in a similar manner by using $links[YOUR-LINK-URL]->getMetaInfo('title');
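The answer above hard-codes two getParentUrl() hops. If the depth is not known in advance, a loop can walk up the chain until no parent is recorded. The sketch below assumes that $links is what $crawler->getLinks() returns (indexable by URL) and that each Link object exposes getParentUrl() and getMetaInfo('title') as used in the answer; collectAncestors() is a hypothetical helper, not part of Arachnid, and it assumes the start page reports an empty/null parent URL.

```php
<?php

/**
 * Hypothetical helper (not part of Arachnid): walk up the parent chain of a
 * crawled link and collect each ancestor's URL and page title, nearest first.
 *
 * Assumes $links is indexable by URL (as returned by $crawler->getLinks())
 * and each entry has getParentUrl() and getMetaInfo('title').
 */
function collectAncestors($links, string $url): array
{
    $ancestors = [];
    $visited = [$url => true]; // guards against cycles, e.g. a page linking back to its parent

    while (isset($links[$url])) {
        $parentUrl = $links[$url]->getParentUrl();

        // Assumed: the start page has no parent URL recorded, so the walk ends there.
        if (empty($parentUrl) || isset($visited[$parentUrl])) {
            break;
        }

        $visited[$parentUrl] = true;
        $ancestors[] = [
            'url'   => $parentUrl,
            // The title is only available if the parent page itself was crawled.
            'title' => isset($links[$parentUrl]) ? $links[$parentUrl]->getMetaInfo('title') : null,
        ];
        $url = $parentUrl;
    }

    return $ancestors;
}
```

For the URLs in the question, calling collectAncestors($links, 'www.example.com/pageone/pagetwo.html') would return the pageone.html entry first and the www.example.com entry second, so the same code works for any crawl depth.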