ropensci / RSelenium

An R client for Selenium Remote WebDriver
https://docs.ropensci.org/RSelenium

Possible to use RSelenium within ShinyApp? #231

Open JackLich10 opened 4 years ago

JackLich10 commented 4 years ago

I have an R script that uses RSelenium to collect data (I cannot use rvest because the webpage uses JavaScript). I included this script as a function in my Shiny app (the scraping runs interactively after some user input in the app). When I run it locally on my machine, everything works as expected. The trouble occurs when it is published to shinyapps.io (RStudio's hosting service). Basically, it cannot connect to the local port.

[1] "Connecting to remote server" 2020-03-14T23:14:25.090285+00:00 shinyapps[1626004]: Warning: Error in checkError: Undefined error in httr call. httr output: Failed to connect to localhost port 4445: Connection refused

I am not very familiar with RSelenium, Docker, etc., but in general, is it possible to run RSelenium-based scripts within a Shiny app on shinyapps.io (assuming it works in the Shiny app locally)? If so, how do I do it? (I think I may have to set up Docker to allow access to other ports.)

Here is some example code to provide more information. After installing Docker and running `docker run -d -p 4445:4444 selenium/standalone-chrome` in a terminal, the following should work on your local machine (it does on mine):

```r
library(shiny)
library(RSelenium)

ui <- fluidPage(
  actionButton("go", "Scrape")
)

server <- function(input, output, session) {
  observeEvent(input$go, {
    remDr <- RSelenium::remoteDriver(
      remoteServerAddr = "localhost",
      port = 4445L,
      browserName = "chrome"
    )

    # Open browser session
    remDr$open()

    remDr$navigate("https://www.google.com/") # or any website
    # Click on links and scrape some stuff

    # Close browser session
    remDr$close()
  })
}

shinyApp(ui, server)
```

The problem is that this does not work when you publish to shinyapps.io. It must have something to do with the way I set up Docker, `remoteServerAddr = "localhost"`, the port, etc., but I do not know how to solve this so that users can interact with my Shiny app to scrape websites that use JavaScript. Thanks for any help!

helgasoft commented 4 years ago

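Are you expecting to be able to open a port on shinyapps.io? It is not your server: inside the hosted app, `localhost` is the shinyapps.io machine, where no Selenium server is listening on port 4445, hence the 'Connection refused' error.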
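To confirm that the failure is a plain network problem and not something specific to RSelenium, a quick reachability check can help. The following is only a minimal sketch in base R; `port_is_open` is a hypothetical helper name, not part of RSelenium or Shiny.

```r
# Hypothetical helper (not part of RSelenium): returns TRUE if a TCP
# connection to host:port can be opened within `timeout` seconds,
# and FALSE if the connection attempt fails.
port_is_open <- function(host, port, timeout = 2) {
  con <- tryCatch(
    suppressWarnings(
      socketConnection(host = host, port = port, open = "r+",
                       blocking = TRUE, timeout = timeout)
    ),
    error = function(e) NULL
  )
  if (is.null(con)) {
    return(FALSE)
  }
  close(con)
  TRUE
}

# Check the address the app is trying to reach.
port_is_open("localhost", 4445L)
```

Run locally with the Docker container up, this should report that the port is reachable. Run from inside the deployed app, it would be expected to fail for `localhost`, which matches the error in the log.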

JackLich10 commented 4 years ago

Thank you for the response! Do you think there is a way around this? Basically, do you have any ideas on how to scrape a JavaScript webpage where one must click on links and dropdown menus, using either RSelenium or another package, so that users can still interact with my Shiny app to scrape the website? For example, can I set up a separate server for the app on shinyapps.io to access? Any help is greatly appreciated!

helgasoft commented 4 years ago

Yes, the solution is to run and maintain your own Selenium server and connect to it from inside your Shiny app on shinyapps.io. So instead of having

```r
remDr <- RSelenium::remoteDriver(remoteServerAddr = "localhost", port = 4445L, browserName = "chrome")
```

the Shiny app command will be

```r
remDr <- RSelenium::remoteDriver(remoteServerAddr = "<remote IP>", port = 4445L, browserName = "chrome")
```

where `<remote IP>` is the IP address of your Selenium server machine. This machine should have port 4445 opened through the firewall. Run your Docker command on it to get started. You may also use commercial (paid) Selenium servers - check this article.
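One way to avoid hard-coding the server address is to read it from environment variables, so the same app code runs locally and when deployed. This is a rough sketch under assumptions: the variable names `SELENIUM_HOST`, `SELENIUM_PORT` and `SELENIUM_BROWSER`, and the helpers `selenium_config` and `with_remote_browser`, are made up for illustration and are not part of RSelenium.

```r
# Hypothetical environment variable names. Set them wherever the app is
# deployed (for example in an .Renviron file); the defaults match the
# local Docker setup from the original post.
selenium_config <- function() {
  list(
    host = Sys.getenv("SELENIUM_HOST", unset = "localhost"),
    port = as.integer(Sys.getenv("SELENIUM_PORT", unset = "4445")),
    browser = Sys.getenv("SELENIUM_BROWSER", unset = "chrome")
  )
}

# Opens a browser session, runs `scrape_fun(remDr)`, and closes the session
# afterwards even if scraping fails. Needs the RSelenium package and a
# reachable Selenium server when it is actually called.
with_remote_browser <- function(scrape_fun, cfg = selenium_config()) {
  remDr <- RSelenium::remoteDriver(
    remoteServerAddr = cfg$host,
    port = cfg$port,
    browserName = cfg$browser
  )
  remDr$open(silent = TRUE)
  on.exit(remDr$close(), add = TRUE)
  scrape_fun(remDr)
}

# Example use inside observeEvent(input$go, { ... }):
# title <- with_remote_browser(function(remDr) {
#   remDr$navigate("https://www.google.com/")
#   remDr$getTitle()[[1]]
# })
```

A Selenium server started this way has no authentication of its own, so it is probably wise to restrict in the firewall which addresses may reach the port instead of opening it to everyone.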

fverkroost commented 1 year ago

@JackLich10 Three years later, but given that this issue is still open I was wondering if you ever managed to get it to work (and if so, how)? I'm facing the same issue today.