ropensci / RSelenium

An R client for Selenium Remote WebDriver
https://docs.ropensci.org/RSelenium

headless chrome queryRD #211

Open Danny-dK opened 5 years ago

Danny-dK commented 5 years ago

Operating System

Windows 10

Selenium Server version (selenium-server-standalone-3.0.1.jar etc.)

4.0.0-alpha-2

Browser version (firefox 50.1.0, chrome 54.0.2840.100 (64-bit) etc.)

Chrome 76.0.3809.132

Other driver version (chromedriver 2.27, geckodriver v0.11.1, iedriver x64_3.0.0, PhantomJS 2.1.1 etc.)

Chromedrivers: "2.44" "2.45" "70.0.3538.97" "71.0.3578.137" "71.0.3578.33" "71.0.3578.80" "72.0.3626.69" "72.0.3626.7"
"73.0.3683.20" "73.0.3683.68" "74.0.3729.6" "75.0.3770.140" "75.0.3770.8" "75.0.3770.90" "76.0.3809.126" "76.0.3809.25" "76.0.3809.68" "77.0.3865.10" "77.0.3865.40"

Expected behaviour

Headless Chrome should download files to a location of my own choosing, using queryRD as described in issue #138.

ecap <- list(chromeOptions = list(args = c('--headless', '--disable-gpu')))

library(RSelenium)
rD1 <- rsDriver(browser = "chrome", port=4567L, geckover=NULL, chromever="74.0.3729.6", iedrver=NULL, phantomver=NULL, extraCapabilities = ecap)
remDr1 <- rD1[["client"]]  

remDr1$queryRD(ipAddr = paste0(remDr1$serverURL, "/session/", remDr1$sessionInfo[["id"]], "/chromium/send_command"),
              method = "POST", qdata = list(cmd = "Page.setDownloadBehavior", params = list(behavior = "allow",
              downloadPath = "D:\\")))

Actual behaviour

Stopping with message:

Selenium message:POST /session/4b92e3b3ba1fd29adbb164186884158a/chromium/send_command
Build info: version: '4.0.0-alpha-2', revision: 'f148142cf8', time: '2019-07-01T21:30:10'
System info: host: '[host]', ip: '[address]', os.name: 'Windows 10', os.arch: 'x86', os.version: '10.0', java.version: '1.8.0_221'
Driver info: driver.version: unknown

Error:   Summary: UnknownCommand
     Detail: The requested resource could not be found, or a request was received using an HTTP method that is not supported by the mapped resource.
     Further Details: run errorDetails method

Steps to reproduce the behaviour

I had a script that used the queryRD call to make headless Chrome download to a location of my choosing. I had not used the script for some time; when I ran it this week, it failed. The first error came from using the wrong chromedriver version. This tends to happen, so I looked up the available chromedrivers with binman::list_versions("chromedriver"). The latest one compatible with my Chrome version 76 was 76.0.3809.126. With it I could open headless Chrome, but the failure now came when I ran the queryRD part:

remDr1$queryRD(ipAddr = paste0(remDr1$serverURL, "/session/", remDr1$sessionInfo[["id"]], "/chromium/send_command"), method = "POST", qdata = list(cmd = "Page.setDownloadBehavior", params = list(behavior = "allow", downloadPath = "D:\\")))

This resulted in an error with a not very helpful message: UnknownCommand. After looking everywhere for possible solutions, out of desperation I tried every chromedriver version I had available. It turns out that some drivers run the queryRD call in headless Chrome without problems and others do not (although those still start headless Chrome when the queryRD call is left out). My Chrome version is 76, but the queryRD call only works with one of the chromedrivers "2.44", "70.0.3538.97", "71.0.3578.33", "71.0.3578.80", "71.0.3578.137", "72.0.3626.69", "72.0.3626.7". The 73 to 76 driver versions do not recognize the queryRD call. They do let me start headless Chrome if I skip the queryRD call, but then headless Chrome does not download anything.

Do newer chromedriver versions implement a different command that the queryRD call should use instead? (NOTE again that only the queryRD part fails; all compatible drivers run headless Chrome fine without it.)
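The trial-and-error described above (start a session with each installed chromedriver and see whether the send_command request goes through) can be scripted. What follows is only a minimal sketch under stated assumptions: find_working_versions() and try_send_command() are hypothetical helper names, not part of RSelenium; the port and download path are the ones from this report; and it assumes binman::list_versions("chromedriver") returns a list of version strings keyed by platform.

```r
# Hypothetical helper: run `probe` on each version string and keep the
# versions for which it returns TRUE. Any error counts as a failure.
find_working_versions <- function(versions, probe) {
  ok <- vapply(versions, function(v) {
    isTRUE(tryCatch(probe(v), error = function(e) FALSE))
  }, logical(1))
  versions[ok]
}

# Hypothetical probe: start headless Chrome with one chromedriver version and
# send Page.setDownloadBehavior, exactly as in the report. It needs RSelenium,
# a local Chrome and the drivers already downloaded, so it is only defined here.
try_send_command <- function(chromever, port = 4567L, download_path = "D:\\") {
  ecap <- list(chromeOptions = list(args = c("--headless", "--disable-gpu")))
  rD <- RSelenium::rsDriver(browser = "chrome", port = port,
                            chromever = chromever, geckover = NULL,
                            iedrver = NULL, phantomver = NULL,
                            extraCapabilities = ecap, verbose = FALSE)
  # Always shut down the browser session and the server for this attempt.
  on.exit({
    try(rD[["client"]]$close(), silent = TRUE)
    try(rD[["server"]]$stop(), silent = TRUE)
  }, add = TRUE)
  remDr <- rD[["client"]]
  remDr$queryRD(
    ipAddr = paste0(remDr$serverURL, "/session/",
                    remDr$sessionInfo[["id"]], "/chromium/send_command"),
    method = "POST",
    qdata = list(cmd = "Page.setDownloadBehavior",
                 params = list(behavior = "allow", downloadPath = download_path))
  )
  TRUE  # reached only if queryRD did not stop with an error
}

# Possible usage (not run):
# versions <- unlist(binman::list_versions("chromedriver"))
# find_working_versions(versions, try_send_command)
```

Keeping the loop separate from the probe means the loop can be checked with a stand-in function before any browser is started.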

mlamias commented 4 years ago

I have this same problem.

QizhiSu commented 3 years ago

Same problem, any update?

Danny-dK commented 3 years ago

@QizhiSu No update. Since the RSelenium GitHub repository looks rather dead (no activity from the devs for quite some time), I'm assuming there won't be any further help. By now I also believe this is a Google Chrome issue, where they may have changed options in their configuration. In any case, you can still use it without headless Chrome, and you can specify the window size (to make it as small as you like).

ecap <- list(chromeOptions = list(prefs = list(
                                            "profile.default_content_settings.popups" = 0L,
                                            "download.prompt_for_download" = FALSE,
                                            "download.default_directory" = path),
                             args = c('--window-size=600,800', '--disable-gpu')))

There probably also is an option to make it start minimized (but I don't know).

QizhiSu commented 3 years ago

@Danny-dK Thanks a lot for the information. By the way, I could not change the download directory using your code (with the path set to getwd()). The code runs, but nothing is downloaded.

Danny-dK commented 3 years ago

@QizhiSu I'm assuming you're on Windows. getwd() returns a path with forward slashes, but the Chrome options are passed to Chrome on Windows, which expects backslashes, so the path from getwd() is not recognized. This works for me, where I have a folder called 'Required_files' in my working directory:

# Set the path used in the ecap object, replacing all forward slashes with backslashes.
# Leave out the second line if you don't have another folder to download to in the
# working directory (and of course remove the pipe operator).

library(magrittr)  # provides the %>% pipe used below

path <- gsub('/', '\\\\', getwd()) %>%
            paste0(., '\\Required_files')

# Specify extra capabilities object to run chrome.

ecap <- list(chromeOptions = list(prefs = list(
                                            "profile.default_content_settings.popups" = 0L,
                                            "download.prompt_for_download" = FALSE,
                                            "download.default_directory" = path),
                             args = c('--window-size=600,800', '--disable-gpu')))
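
The slash replacement above can also be wrapped in a small function so that it works with or without a subfolder and without the pipe. This is just a sketch: windows_download_path() is a hypothetical name made up for illustration, and the example directory is invented.

```r
# Hypothetical helper: join `base` and an optional `subdir`, then return the
# result with backslash separators, as Chrome on Windows expects in its prefs.
windows_download_path <- function(base = getwd(), subdir = NULL) {
  parts <- c(base, subdir)               # a NULL subdir is dropped by c()
  path <- paste(parts, collapse = "/")
  gsub("/", "\\\\", path)                # every "/" becomes a single backslash
}

# Example with an invented project directory.
path <- windows_download_path("C:/Users/me/project", "Required_files")

# Same extra capabilities object as above, using the converted path.
ecap <- list(chromeOptions = list(
  prefs = list(
    "profile.default_content_settings.popups" = 0L,
    "download.prompt_for_download" = FALSE,
    "download.default_directory" = path
  ),
  args = c("--window-size=600,800", "--disable-gpu")
))
```

Calling windows_download_path() with no arguments converts getwd() itself, which covers the case of downloading straight into the working directory.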

QizhiSu commented 3 years ago

Got it! Huge thanks @Danny-dK