salimk / Rcrawler

An R web crawler and scraper
http://www.sciencedirect.com/science/article/pii/S2352711017300110

Scraping articles from a news source filtered by a search term #10

Closed chriscastille6 closed 7 years ago

chriscastille6 commented 7 years ago

I'd like to apply Rcrawler to various major news outlets (e.g., BBS, NBC, FOX, etc.) but only scrape articles that are relevant to my topic (e.g., the Volkswagen emissions scandal). Is it possible for me to do this with Rcrawler?

salimk commented 7 years ago

We invite you to try the latest release, Rcrawler v0.1.3 (just uploaded to CRAN):

Rcrawler(Website = "http://www.example.com/", KeywordsFilter = c("keyword1", "keyword2"))

Crawl the website and collect only webpages containing keyword1 or keyword2 or both.

Rcrawler(Website = "http://www.example.com/", KeywordsFilter = c("keyword1", "keyword2"),
         KeywordsAccuracy = 50)

Crawl the website and collect only webpages whose accuracy percentage for matching keyword1 and keyword2 is higher than 50%. You can use one or more search terms; the accuracy is calculated from how many of the keywords appear on the page and how often they occur.
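To make the two ingredients of that accuracy value concrete, here is a minimal base R sketch that counts how many keywords appear in a page's text and how often each occurs. It is not Rcrawler's actual formula, which is not spelled out in this thread: `keyword_score()`, the sample text, and the `percent_matched` value (the share of keywords found at least once) are all assumptions made for illustration.

```r
# Illustrative only: NOT Rcrawler's internal accuracy calculation.
# keyword_score() is a hypothetical helper written for this example.
keyword_score <- function(text, keywords) {
  text <- tolower(text)
  # Count case-insensitive, literal occurrences of each keyword
  counts <- vapply(keywords, function(k) {
    hits <- gregexpr(tolower(k), text, fixed = TRUE)[[1]]
    if (hits[1] == -1L) 0L else length(hits)
  }, integer(1))
  matched <- sum(counts > 0)
  list(
    counts = counts,                                   # occurrences per keyword
    percent_matched = 100 * matched / length(keywords) # assumed summary measure
  )
}

# Made-up sample text standing in for a crawled news article
page <- "Volkswagen admitted the emissions tests were rigged. Volkswagen shares fell."
score <- keyword_score(page, c("Volkswagen", "emissions", "diesel"))
print(score$counts)
print(score$percent_matched)
```

In this sample, two of the three keywords are present, so a page like this would sit at roughly 67% under the assumed measure. Rcrawler's own percentage may weight occurrences differently, so treat `KeywordsAccuracy` thresholds as something to tune by experiment.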

Looking forward to your feedback.