rstudio / chromote

Chrome Remote Interface for R
https://rstudio.github.io/chromote/

Problem with chromote package #154

Open saleforecast1 opened 6 months ago

saleforecast1 commented 6 months ago

Dear community,

I have created a Shiny app that scrapes the title of the Google page ("https://google.com") using the R chromote package. The app works fine when I run it on my desktop. However, once it is hosted on shinyapps.io, two users cannot use it concurrently. The code for the app is below:

library(shiny)
library(curl)
library(chromote)
library(pagedown)

ui <- fluidPage(
  textOutput("result")
)

server <- function(session, input, output) {
  driver <- ChromoteSession$new()
  driver$Page$navigate("https://google.com") # open Google page
  Sys.sleep(7)

  output$result <- renderText(
    # scrape Google title 
    driver$Runtime$evaluate('document.querySelector("title").innerText')$result$value
  ) 
}

shinyApp(ui = ui, server = server)

Output:

  1. Click on: https://sale4cast.shinyapps.io/findGoogleTitle/
  2. Wait 5 seconds.
  3. Get the google title "Google"

Question: How can two users access the app concurrently on shinyapps.io?

Best Regards,

SaleForecast

wch commented 6 months ago

I think the problem is the use of Sys.sleep(). That will block the entire R process.

You should do something like this:

library(shiny)
library(chromote)

ui <- fluidPage(
  textOutput("result")
)

server <- function(session, input, output) {
  driver <- ChromoteSession$new()
  p <- driver$Page$loadEventFired(wait_ = FALSE)
  driver$Page$navigate("https://google.com", wait_ = FALSE)

  output$result <- renderText({
    p$then(function(value) {
      # scrape Google title 
      driver$Runtime$evaluate('document.querySelector("title").innerText')$result$value
    })
  }) 
}

shinyApp(ui, server)

To properly navigate to a page and wait for it to load without blocking the R process, see this section of the README: https://github.com/rstudio/chromote?tab=readme-ov-file#taking-a-screenshot-of-a-web-page

The example above also makes use of promises in Shiny. See here for more information: https://rstudio.github.io/promises/articles/promises_06_shiny.html

saleforecast1 commented 6 months ago

Thanks for your feedback. Does it work on shinyapps.io?

Best Regards, SaleForecast

gadenbuie commented 6 months ago

@saleforecast1 Your app currently does work on shinyapps.io. Maybe your example doesn't completely reproduce your issue or I don't understand what you mean by "work" or "two users can not use it concurrently". But if I open https://sale4cast.shinyapps.io/findGoogleTitle/ in two different tabs or browsers, they both eventually (after about 7 seconds) show me the word "Google".

aronatkins commented 6 months ago

This question was also cross-posted to https://forum.posit.co/t/problem-with-chromote-package/186346

saleforecast1 commented 6 months ago

If I open this link in two different tabs or browsers simultaneously, I expect it to display the output concurrently. However, in my app, I notice that the second instance starts running only after the first one has completed.

wch commented 6 months ago

Have you tried the code that I provided? The problem is that your Sys.sleep() blocks the entire process.

gadenbuie commented 6 months ago

Oh, in that case, what Winston said is exactly right:

I think the problem is the use of Sys.sleep(). That will block the entire R process.

If you put Sys.sleep(7) in your app, it causes your app to wait 7 seconds. Sys.sleep() blocks R from doing anything until it finishes. If you open a second tab with the app while the first tab is processing, the second tab has to wait for the first user's app to finish loading, and then has to wait 7 more seconds.

Here's a simple diagram outlining the interaction.

sequenceDiagram
    User 1->>+Shiny: Opens app
    User 2-->Shiny: Opens app
    Shiny-->>-User 1: responds after 7s
    activate Shiny
    Note over Shiny: starts user 2 request
    Shiny-->>-User 2: responds after 7+s 
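The same effect can be reproduced without Shiny or Chrome. Below is a minimal sketch using the `promises` and `later` packages, which Shiny's async support is built on. The helper `fake_request()` is hypothetical: it stands in for one user's request that has to wait. Two blocking waits with `Sys.sleep()` run one after the other, while two non-blocking waits scheduled with `later()` overlap.

```r
library(promises)
library(later)

# Hypothetical stand-in for one user's request: resolves with `label`
# after `secs` seconds without blocking the R process.
fake_request <- function(label, secs) {
  force(label)  # capture the value now, not when the timer fires
  promise(function(resolve, reject) {
    later(function() resolve(label), delay = secs)
  })
}

users <- c("user 1", "user 2")

# Blocking: each request sleeps in turn, like Sys.sleep(7) in the app.
t0 <- Sys.time()
for (user in users) Sys.sleep(0.5)
blocking_secs <- as.numeric(Sys.time() - t0, units = "secs")

# Non-blocking: both waits are scheduled on the event loop at once.
t0 <- Sys.time()
finished <- character()
for (user in users) {
  fake_request(user, 0.5)$then(function(label) {
    finished <<- c(finished, label)
  })
}
# Outside Shiny, the event loop has to be run by hand.
deadline <- Sys.time() + 5
while (length(finished) < length(users) && Sys.time() < deadline) run_now(0.05)
async_secs <- as.numeric(Sys.time() - t0, units = "secs")

cat(sprintf("blocking: %.2fs, non-blocking: %.2fs\n", blocking_secs, async_secs))
```

Run as a plain R script, the blocking version should take roughly twice as long as the non-blocking one; exact timings will vary by machine.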

To fix it, please follow Winston's guidance:

To properly navigate to a page and wait for it to load without blocking the R process, see this section of the README: rstudio/chromote#taking-a-screenshot-of-a-web-page

The example above also makes use of promises in Shiny. See here for more information: rstudio.github.io/promises/articles/promises_06_shiny.html

saleforecast1 commented 6 months ago

library(shiny)
library(curl)
library(chromote)
library(pagedown)

ui <- fluidPage(
  textOutput("result")
)

server <- function(session, input, output) {
  driver <- ChromoteSession$new()
  p <- driver$Page$loadEventFired(wait_ = FALSE)
  driver$Page$navigate("https://google.com", wait_ = FALSE)

  p$then(function(value){
    googleSearchText <- "4 star hotel in barcelona"
    driver$Runtime$evaluate(paste0('document.querySelector("textarea").value = "', googleSearchText,'"'))
    driver$Runtime$evaluate('document.querySelector("input[aria-label=\'Google Search\']").click()')
  })$then(function(value){
    print(driver$Runtime$evaluate('document.querySelector("title").innerText'))
  })
}

shinyApp(ui, server)

@wch, can you please explain why this code doesn't return the title? It returns the error "TypeError: Cannot read properties of null (reading 'innerText')\n at :1:32"

wch commented 5 months ago

It sounds like the document.querySelector('title') isn't returning anything.

I think the problem is that clicking on the search button causes another page load, and when you grab the <title> in the middle of that page load, it might be happening too early, before there is a <title> element.

I believe that you'll have to wait for another loadEventFired inside of the promise chain.

library(shiny)
library(chromote)

ui <- fluidPage(
  textOutput("result")
)

server <- function(session, input, output) {
  driver <- ChromoteSession$new()
  p <- driver$Page$loadEventFired(wait_ = FALSE)
  driver$Page$navigate("https://google.com", wait_ = FALSE)

  p$then(function(value){
    googleSearchText <- "4 star hotel in barcelona"
    p2 <- driver$Page$loadEventFired(wait_ = FALSE)
    driver$Runtime$evaluate(paste0('document.querySelector("textarea").value = "', googleSearchText,'"'))
    driver$Runtime$evaluate('document.querySelector("input[aria-label=\'Google Search\']").click()')
    p2
  })$then(function(value){
    v <- driver$Runtime$evaluate('document.querySelector("title").innerText')
    print(v)
  })
}

shinyApp(ui, server)

Note that p2 is created inside the first $then() function, and then it is returned from that function. The way that promises work, this means that the next function that's chained with $then() will wait until that promise resolves before it runs. See the docs for the promises package for more information on how promises work. The API is very similar to JavaScript promises.
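The chaining rule can be checked without a browser. In this sketch, `fake_load_event()` is a hypothetical stand-in for `driver$Page$loadEventFired(wait_ = FALSE)`. Because the first `$then()` callback returns the second promise, the final callback runs only after "page 2" has loaded. Outside Shiny nothing services the event loop automatically, so the script drives it with `later::run_now()`.

```r
library(promises)
library(later)

events <- character()

# Hypothetical stand-in for a load event: records the load and resolves
# with `name` after a short, non-blocking delay.
fake_load_event <- function(name, secs = 0.2) {
  force(name)
  promise(function(resolve, reject) {
    later(function() {
      events <<- c(events, paste(name, "loaded"))
      resolve(name)
    }, delay = secs)
  })
}

first_load <- fake_load_event("page 1")
chain <- first_load$then(function(value) {
  events <<- c(events, "clicked search")
  # Returning this promise makes the next $then() wait for it.
  fake_load_event("page 2")
})$then(function(value) {
  events <<- c(events, paste("read title of", value))
  value
})

# Drive the event loop until all four steps have run (or 5 s pass).
deadline <- Sys.time() + 5
while (length(events) < 4 && Sys.time() < deadline) run_now(0.05)
print(events)
```

If the first callback did not return the promise, the last step could run before the second load finished, which is the same kind of race that made `document.querySelector("title")` return `null`.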

One other thing I want to mention: the code you started with mixes sync and async programming, calling synchronous Chromote functions inside asynchronous callbacks. It works in this case, but it might do unexpected things in more complicated code. For complex tasks it's probably best to stick to async code only, but that requires a good understanding of how these promises work.
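As a sketch of the all-async style: chromote commands accept `wait_ = FALSE` and then return a promise, as `Page$navigate()` and `Page$loadEventFired()` do in the examples above. `get_title_async()` is a hypothetical helper that makes no synchronous Chromote calls, so R is never blocked waiting on Chrome. It reads `document.title` instead of querying the `<title>` element, and closing the tab in `session$onSessionEnded()` is an optional extra that does not come from this thread.

```r
library(shiny)
library(promises)
library(chromote)

# Hypothetical helper: load `url` in an existing ChromoteSession and return a
# promise for the page title. Every chromote call uses wait_ = FALSE.
get_title_async <- function(driver, url) {
  # Subscribe to the load event before navigating so it cannot be missed.
  loaded <- driver$Page$loadEventFired(wait_ = FALSE)
  driver$Page$navigate(url, wait_ = FALSE)
  loaded$then(function(value) {
    # Returning this promise makes the next $then() wait for the evaluation.
    driver$Runtime$evaluate("document.title", wait_ = FALSE)
  })$then(function(res) {
    res$result$value
  })
}

ui <- fluidPage(
  textOutput("result")
)

server <- function(input, output, session) {
  driver <- ChromoteSession$new()
  # Optional: close this user's Chrome tab when their Shiny session ends.
  session$onSessionEnded(function() driver$close())
  # renderText() accepts a promise and fills in the output once it resolves.
  output$result <- renderText(get_title_async(driver, "https://google.com"))
}

# Start with runApp(app); assigning avoids launching the app on source().
app <- shinyApp(ui, server)
```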

saleforecast1 commented 5 months ago

Thanks for your response, @wch. I really appreciate your answer, and it works great. However, I still get an error when I run this app from multiple devices via shinyapps.io.

Errors:
"Unhandled promise error: Chromote: timed out waiting for response to command Page.disable"
"Unhandled promise error: Chromote: timed out waiting for event Page.loadEventFired"

Code:

library(shiny)
library(curl)
library(chromote)
library(pagedown)

ui <- fluidPage(
  tableOutput("result")
)

server <- function(session, input, output) {
  driver <- ChromoteSession$new()

  p <- driver$Page$loadEventFired(wait_ = FALSE)
  driver$Page$navigate("https://google.com", wait_ = FALSE)

  p$then(function(value){
    googleSearchText <- "4 star hotel in barcelona"
    p2 <- driver$Page$loadEventFired(wait_ = FALSE)

    driver$Runtime$evaluate(paste0('document.querySelector("textarea").value = "', googleSearchText,'"'))
    driver$Runtime$evaluate('document.querySelector("input[aria-label=\'Google Search\']").click()')
    p2
  })$then(function(value){
    p3 <- driver$Page$loadEventFired(wait_ = FALSE)
    driver$Runtime$evaluate('document.querySelector("div.R2w7Jd").click()')
    driver$Runtime$evaluate('document.querySelector("div.JWXKNd").click()')
    p3
  })$then(function(value){
    priceElement <- driver$Runtime$evaluate(
      'var elements = document.querySelectorAll(".K1smNd > c-wiz[jsrenderer=\'hAbFdb\'] .PwV1Ac");
                 var elementPrices = [];
                 elements.forEach(function(element) {
                   elementPrices.push(element.innerText);
                 });
                 elementPrices.join("@");'
    )
    print(priceElement)
  })
}

shinyApp(ui, server)

Can you please help me sort out the problem?

wch commented 5 months ago

I don't know for sure, but my guess would be that there's not enough time between the two click() commands in the block with p3.
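One way to add time between the two clicks without going back to `Sys.sleep()` is to put a non-blocking pause into the promise chain. This is a sketch under assumptions: `delay_promise()` and `click_with_pause()` are hypothetical helpers, the one-second pause is a guess rather than a known-good value, and the CSS selectors are the ones from the post above, which Google may change at any time.

```r
library(promises)
library(later)

# Hypothetical helper: a promise that resolves with `value` after `secs`
# seconds without blocking the R process (unlike Sys.sleep()).
delay_promise <- function(secs, value = NULL) {
  force(value)
  promise(function(resolve, reject) {
    later(function() resolve(value), delay = secs)
  })
}

# Hypothetical replacement for the p3 step above, with a pause between clicks.
# `driver` is assumed to be a ChromoteSession already showing the results page.
click_with_pause <- function(driver, pause = 1) {
  p3 <- driver$Page$loadEventFired(wait_ = FALSE)
  driver$Runtime$evaluate('document.querySelector("div.R2w7Jd").click()')
  delay_promise(pause)$then(function(value) {
    driver$Runtime$evaluate('document.querySelector("div.JWXKNd").click()')
    # Returning p3 keeps the rest of the chain waiting for the page load.
    p3
  })
}
```

In the app above, the body of the second `$then()` would become `click_with_pause(driver)`, and the rest of the chain stays the same.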

saleforecast1 commented 5 months ago

Thanks for your response, @wch. Can you please explain how to ensure there is enough time between the two click() events?