Closed by chriscastille6 7 years ago
We invite you to try the latest release of Rcrawler, v0.1.3 (just uploaded to CRAN).
Rcrawler(Website = "http://www.example.com/", KeywordsFilter = c("keyword1", "keyword2"))
Crawl the website and collect only webpages containing keyword1 or keyword2 or both.
Rcrawler(Website = "http://www.example.com/", KeywordsFilter = c("keyword1", "keyword2"),
KeywordsAccuracy = 50)
Crawl the website and collect only webpages whose accuracy percentage for matching keyword1 and keyword2 is higher than 50%. You can use one or more search terms; the accuracy is calculated from how many of the keywords appear on the page and how often they occur.
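To give a feel for what "how many keywords are on the page plus their occurrence" could mean, here is a minimal base R sketch. It is not Rcrawler's actual scoring code, and the exact formula behind `KeywordsAccuracy` is not stated in this thread. The helper name `keyword_score` and the sample text are hypothetical, made up for illustration only.

```r
# Hypothetical helper, NOT part of Rcrawler: counts keyword occurrences in a
# page's text and reports the share of keywords found at least once.
keyword_score <- function(text, keywords) {
  text <- tolower(text)
  # Occurrences of each keyword (case-insensitive, literal string match)
  counts <- vapply(keywords, function(k) {
    m <- gregexpr(tolower(k), text, fixed = TRUE)[[1]]
    if (m[1] == -1L) 0L else length(m)
  }, integer(1))
  # Percentage of keywords that appear on the page at all
  coverage <- 100 * mean(counts > 0)
  list(counts = counts, coverage = coverage)
}

# Made-up sample text standing in for a scraped article
page <- "Volkswagen admitted the emissions software cheated. Volkswagen recalled cars."
res <- keyword_score(page, c("volkswagen", "emissions", "diesel"))
res$counts    # volkswagen = 2, emissions = 1, diesel = 0
res$coverage  # two of three keywords found, about 66.7
```

A page like this one would pass a 50% threshold under this simplified coverage measure, but Rcrawler's own score may weigh occurrences differently, so treat the numbers as illustrative.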
We look forward to your feedback.
I'd like to apply Rcrawler to various major news outlets (e.g., BBC, NBC, FOX, etc.) but only scrape articles that are relevant to my topic (e.g., the Volkswagen emissions scandal). Is it possible for me to do this with Rcrawler?