salimk / Rcrawler

An R web crawler and scraper
http://www.sciencedirect.com/science/article/pii/S2352711017300110

Rcrawler() and LinkExtractor() do not collect 'external urls' from HTML>footer #69

Open · graskaas2014 opened this issue 5 years ago

graskaas2014 commented 5 years ago

Sometimes the LinkExtractor function does not collect external URLs from the footer of a webpage, even though the links are visible in the page's HTML. Is this a bug, or am I using it wrong?

For instance:

urls <- LinkExtractor("https://www.partou.nl", ExternalLInks = TRUE, Useragent = "Chrome/41.0.2228.0")

or

urls <- LinkExtractor("https://www.kinderopvangoosterhout.nl", ExternalLInks = TRUE, Useragent = "Chrome/41.0.2228.0")
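One way to separate "bug" from "wrong use" is to check whether the footer links exist in the static HTML at all. The sketch below uses the xml2 package directly rather than Rcrawler. `footer_external_links()` is a hypothetical helper written for illustration, and it assumes the links sit inside an HTML `<footer>` element.

```r
library(xml2)

# Hypothetical helper (not part of Rcrawler): return the absolute http(s) URLs
# of <a href> links inside <footer> whose host differs from page_url's host.
# It only sees the static HTML it is given; links injected later by
# JavaScript will not appear.
footer_external_links <- function(html, page_url) {
  doc <- read_html(html)
  hrefs <- xml_attr(xml_find_all(doc, "//footer//a[@href]"), "href")
  if (length(hrefs) == 0) return(character(0))
  # Resolve relative links (e.g. "/contact") against the page URL
  abs_urls <- url_absolute(hrefs, page_url)
  parts <- url_parse(abs_urls)
  page_host <- tolower(url_parse(page_url)$server)
  # Keep web links only (drops mailto:, tel:, ...) that point to another host
  keep <- parts$scheme %in% c("http", "https") &
    !(tolower(parts$server) %in% page_host)
  unique(abs_urls[keep])
}

# Offline example with a small inline page
html <- '<html><body>
<a href="https://www.example.org/about">About</a>
<footer>
  <a href="/contact">Contact</a>
  <a href="https://www.facebook.com/example">Facebook</a>
  <a href="mailto:info@example.org">Mail</a>
</footer>
</body></html>'
res <- footer_external_links(html, "https://www.example.org")
print(res)
```

Because `read_html()` also accepts a URL, the same helper can be pointed at a live page, e.g. `footer_external_links("https://www.partou.nl", "https://www.partou.nl")`. If it finds footer links that LinkExtractor misses, that points toward a bug; if it finds none, the footer is probably rendered by JavaScript, which a plain HTTP fetch does not execute.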

P.S. Awesome package, it makes scraping and parsing so much easier :)