KnutJaegersberg closed this issue 6 years ago.
Actually, we are already working on this feature: we have implemented it using the R webdriver package (PhantomJS webdriver). You will be notified when it is released. Thank you.
Rcrawler v0.1.9 has been released with many new features. Subscribe to our mailing list to stay updated: http://eepurl.com/dMv_7s
I found that by changing a single line in linkextractor.R, replacing `read_html()` with `render_html()` from the splashr package, Rcrawler can apparently crawl sites that require JavaScript as well.
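A minimal sketch of that swap. This is not Rcrawler's actual code. `fetch_page()` and `extract_links()` are hypothetical helpers, and the example assumes a Splash instance is reachable through splashr's default `splash_local` connection (localhost:8050, e.g. the Splash Docker image). Without Splash, the static xml2 path still works.

```r
library(xml2)

# Hypothetical helper: fetch a page statically with xml2::read_html(),
# or render it (including JavaScript) through a running Splash server.
fetch_page <- function(url, use_splash = FALSE,
                       splash_obj = splashr::splash_local, wait = 2) {
  if (use_splash) {
    # splashr::render_html() returns an xml2 document, so the rest of
    # the link-extraction code does not need to change.
    splashr::render_html(splash_obj, url, wait = wait)
  } else {
    read_html(url)
  }
}

# Hypothetical helper: collect unique href values from <a> tags,
# optionally resolving relative links against a base URL.
extract_links <- function(doc, base = NULL) {
  hrefs <- xml_attr(xml_find_all(doc, "//a[@href]"), "href")
  if (!is.null(base)) {
    hrefs <- url_absolute(hrefs, base)
  }
  unique(hrefs)
}
```

Because both branches return the same xml2 document type, switching between static and rendered fetching is a single argument, which is what makes an optional flag in a future version plausible.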
This is especially interesting in combination with the following Docker image, which also makes crawling over Tor optional: https://github.com/TeamHG-Memex/aquarium
It would be nice to see this as an option in a future version. Even better would be combining the framework with the interactivity options provided by RSelenium, though I guess that would require larger changes. Anyway, as far as I can see, this is the most advanced Scrapy competitor in the R language, and it would be nice to see it grow. Much better than rharvest.