ropensci / RSelenium

An R client for Selenium Remote WebDriver
https://docs.ropensci.org/RSelenium

Why does google search page source results crash RSelenium webdriver? #209

Open nbarsch opened 5 years ago

nbarsch commented 5 years ago

On Linux, why does fetching the page source of any Google search results page crash my browser?

The example below reproduces the error on a Linux Google Compute Engine VM:

When I read the href results of any Google search with findElements, it works fine. But when I call remDr$getPageSource()[[1]] (to parse the page with rvest instead), IT ALWAYS CRASHES MY WEBDRIVER.
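For reference, the rvest route described above can be sketched offline: parse a page-source string and pull out the same links that the findElements call returns. The HTML here is a hypothetical stand-in for remDr$getPageSource()[[1]], and the `div` with class `r` is taken from the XPath in this report, not verified against Google's current markup.

```r
library(rvest)

# Hypothetical stand-in for remDr$getPageSource()[[1]]; the class "r"
# mirrors the XPath used in this issue and may not match live Google pages.
page_html <- '
<html><body>
  <div class="r"><a href="https://example.com/dogs">Dogs</a></div>
  <div class="r"><a href="https://example.org/puppies">Puppies</a></div>
  <div class="other"><a href="https://example.net/ad">Ad</a></div>
</body></html>'

# read_html() treats a string containing markup as literal HTML
doc <- read_html(page_html)

# Same XPath as the findElements call; keep only the href attributes
links <- html_attr(html_nodes(doc, xpath = '//div[@class="r"]/a'), "href")
print(links)
```

Only the two anchors inside `div.r` are returned; the third link is skipped because its parent has a different class.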

To reproduce, start the webdriver:

    library(RSelenium)
    library(rvest)
    library(httr)

    # Sleep for a random interval between `low` and `high` seconds
    randsleep <- function(low = 1, high = 2) {
      Sys.sleep(sample(seq(low, high, by = 0.001), 1))
    }

    # Kill anything still listening on the Selenium ports from a previous run
    system("sudo kill -9 $(lsof -t -i:4444)")
    system("sudo kill -9 $(lsof -t -i:4445)")

    # Run Chrome headless with a fixed window size
    eCaps <- list(chromeOptions = list(
      args = c('--headless', '--disable-gpu', '--window-size=1280,800')
    ))
    randsleep()
    rD <- rsDriver(port = 4445L, extraCapabilities = eCaps, browser = c("chrome"),
                   chromever = "76.0.3809.68")
    randsleep()
    remDr <- rD$client

Run a search:

    searchstring <- "dogs"
    remDr$navigate("https://www.google.com")
    randsleep(4, 6)
    # Type the query into Google's search box and submit it
    googsearch <- remDr$findElement(using = 'xpath', '//input[@name="q"]')
    randsleep(2, 3)
    googsearch$sendKeysToElement(list(searchstring))
    randsleep(2, 3)
    googsearch$submitElement()
    randsleep(10, 15)

WORKS FINE:


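    # Collect the result links directly through the webdriver
    googres <- remDr$findElements(using = 'xpath', '//div[@class="r"]/a')
    # getElementAttribute() returns a list, so take its first element to get a character vector
    reslinks <- sapply(googres, function(el) el$getElementAttribute("href")[[1]])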

CRASHES THE WEBDRIVER:

    ps1 <- remDr$getPageSource()
    # Error:
    # Undefined error in httr call. httr output: Failed to connect to localhost port 4445: Connection refused

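While the cause is being tracked down, one way to keep a long scraping script from dying at this step is to wrap the call in `tryCatch()`. This is a minimal sketch, assuming RSelenium signals the failure as an ordinary R error (the "Undefined error in httr call" message suggests it does); the helper name `safely_call` is hypothetical. It does not prevent the crash, it only lets the script detect it and, for example, restart the driver.

```r
# Hypothetical helper: run a webdriver call and return NULL (with a message)
# instead of aborting the whole script when it raises an error.
safely_call <- function(f, ...) {
  tryCatch(
    f(...),
    error = function(e) {
      message("webdriver call failed: ", conditionMessage(e))
      NULL
    }
  )
}

# Intended use with the objects from this report (not run here):
# ps1 <- safely_call(function() remDr$getPageSource()[[1]])
# if (is.null(ps1)) { ...restart rsDriver() and retry... }

# Offline demonstration with stand-in functions
ok  <- safely_call(function() "<html></html>")
bad <- safely_call(function() stop("Failed to connect to localhost port 4445: Connection refused"))
```

A successful call passes its value through unchanged; a failing one yields `NULL`, which the caller can test for before parsing.
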
juyeongkim commented 5 years ago

Hi @nbarsch, I wasn't able to reproduce your example. I tried it on Ubuntu 18.04 with the same Chrome browser and driver version as yours (76.0.3809.68).

Have you tried using Docker? https://github.com/SeleniumHQ/docker-selenium https://ropensci.github.io/RSelenium/articles/docker.html