Open hrbrmstr opened 7 years ago
(keeping running notes here)
Did a bit more investigation and I think the R DSL makes the most sense.
Am likely going to wrap https://github.com/dhbaird/easywsclient and see if I can't bang out something half-usable in short order.
Basic tests with wscat show it's super-easy to create the proper JSON DevTools websocket function calls that return immediate responses/JSON values. For the core "data gathering" tasks that would be the primary purpose of this pkg in R, such functionality is pretty straightforward.
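A rough sketch of what one of those JSON calls looks like on the wire, in Python just to keep it independent of whatever the R API ends up being. `Runtime.evaluate` and `Page.navigate` are real DevTools protocol methods; the `devtools_command` helper and the id counter are made-up names for illustration, not anything from this pkg.

```python
import json
from itertools import count

# Hypothetical per-connection counter: the protocol expects each
# command to carry an integer id that the reply echoes back.
_ids = count(1)

def devtools_command(method, **params):
    """Serialize one DevTools protocol call as a JSON text frame."""
    return json.dumps({"id": next(_ids), "method": method, "params": params})

# The string below is what would be typed into wscat or sent over the socket.
frame = devtools_command("Runtime.evaluate", expression="document.title")
print(frame)
```

The reply to such a command comes back as a JSON object with the same `id`, which is what makes the simple request/response "data gathering" case easy.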
With proper "headless chrome" being "a thing" now — https://developers.google.com/web/updates/2017/04/headless-chrome — Chrome 59+ on anyone's system can be either instrumented at the cmdline or via the devtools protocol. Note that:
is on the linked web page so I'm expecting the chrome team to provide direct "webdriver" support or a higher-level JS API like phantomjs has.
Enabling individual R users to "just use" their own instance of Chrome removes obstacles like Docker (tho this is a gd image https://github.com/ebidel/lighthouse-ci/blob/master/builder/Dockerfile) or virtual machines from the equation, so I'm unlikely to go down that route. I'm also not keen on building a version of chrome with "R" in it or R hooks in it since that means One More Thing to download.
Once/if webdriver support is added, this pkg might be moot. There's no guarantee for webdriver support tho.
Shorter-term goals are:
Longer-term goal is:
Depending on how much time I have (or if others want to pile on!) getting the Chrome DevTools protocol working for instrumentation is a goal. It looks event-oriented and may mean dealing with C[++] or C-wrapped R callbacks OR making an R orchestration DSL that translates into DevTools protocol "commands" and then just getting the result.
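To make the "orchestration DSL" option a bit more concrete, here is a minimal sketch (again in Python, purely as an illustration of the idea, not a proposed API). The `Session` class, its `go`/`evaluate` verbs, and `result_for` are all hypothetical names. The protocol methods (`Page.navigate`, `Runtime.evaluate`) and the `Page.loadEventFired` event are real; the reply frames in the usage part are hand-written stand-ins, not captured output.

```python
import json

class Session:
    """Hypothetical orchestration object: each verb queues one protocol command."""

    def __init__(self):
        self._next_id = 0
        self.commands = []

    def _queue(self, method, **params):
        self._next_id += 1
        self.commands.append(
            {"id": self._next_id, "method": method, "params": params}
        )
        return self  # returning self lets the verbs be chained

    def go(self, url):
        return self._queue("Page.navigate", url=url)

    def evaluate(self, expression):
        return self._queue(
            "Runtime.evaluate", expression=expression, returnByValue=True
        )

def result_for(command_id, frames):
    """Return the 'result' of the reply matching command_id, skipping events."""
    for frame in frames:
        message = json.loads(frame)
        # Events carry a "method" but no "id"; only replies echo the id.
        if message.get("id") == command_id:
            return message.get("result")
    return None

# Usage: build a two-step plan, then pick the answer out of a mixed stream.
plan = Session().go("https://example.com").evaluate("document.title")

# Hand-written stand-ins for what the socket might deliver (assumed values).
frames = [
    '{"id": 1, "result": {"frameId": "F1"}}',
    '{"method": "Page.loadEventFired", "params": {"timestamp": 1.0}}',
    '{"id": 2, "result": {"result": {"type": "string", "value": "Example Domain"}}}',
]
answer = result_for(plan.commands[-1]["id"], frames)
print(answer["result"]["value"])
```

The point of the design is that the event-oriented part stays hidden: the caller describes steps, the layer underneath matches replies by `id`, and unrelated events are simply ignored unless someone needs them.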
I personally only care about getting content back out, so unless someone who cares more about detailed instrumentation for creating, say, a test framework for htmlwidgets jumps on, I'm solely focused on enabling easier JS-based web-scraping (like I did with the splashr pkg).