salimk / Rcrawler

An R web crawler and scraper
http://www.sciencedirect.com/science/article/pii/S2352711017300110

Rcrawler aborts when crawling and scraping #52

Open jkrss opened 5 years ago

jkrss commented 5 years ago

Hi,

I would like to crawl and scrape the content of a whole website. This is the code:

```r
Rcrawler(Website = URL, no_cores = 4, no_conn = 4,
         ExtractXpathPat = c("//./div[@class='bodytext']//p",
                             "//./h1[@class='blogtitle']",
                             "//./div[@id='kommentare']//p"),
         PatternsNames = c("article", "title", "comments"),
         ManyPerPattern = TRUE)
```

After approximately 19% of the site has been crawled, I get the following error message:

```
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE,  :
  arguments imply differing number of rows: 1, 0
```

It always fails at the same point. The DATA and INDEX objects are created correctly and contain all entries crawled up to the error.
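The argument list in the error, `(function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, ...`, is the signature of base R's `data.frame()`, which refuses to combine columns whose lengths cannot be recycled to a common length. The "1, 0" suggests that on some page one field produced one value while another produced none. A plausible cause, which is an assumption and not confirmed against Rcrawler's source, is that one of the three XPath patterns (for example the `kommentare` one) matches nothing on a particular page. The minimal sketch below uses made-up values to reproduce the same message with plain `data.frame()` and shows one possible workaround, padding empty matches with `NA`; `pad_empty()` is a hypothetical helper, not part of Rcrawler.

```r
# Hypothetical values for one page where the "comments" XPath matched nothing.
article  <- "Some article text"
title    <- "Some title"
comments <- character(0)  # zero matches -> zero-length vector

# Combining them directly reproduces the reported message.
res <- tryCatch(
  data.frame(article = article, title = title, comments = comments),
  error = function(e) conditionMessage(e)
)
print(res)

# Hypothetical helper: replace an empty match with NA so that every
# column has at least one row and data.frame() can be built.
pad_empty <- function(x) if (length(x) == 0) NA_character_ else x

row <- data.frame(
  article  = pad_empty(article),
  title    = pad_empty(title),
  comments = pad_empty(comments)
)
print(row)
```

If this is the cause, checking a failing page by hand for the `kommentare` block would confirm it.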

Am I doing something wrong, or is the problem with the website I am trying to crawl? I am using Rcrawler version 0.1.9-1.

Thanks for helping me out!