tidyverse / rvest

Simple web scraping for R
https://rvest.tidyverse.org

LiveHTML $click(): JS-like click functionality #431

Open davidrsch opened 2 weeks ago

davidrsch commented 2 weeks ago

Hello! As far as I understand, $click() searches for the element on the page and then simulates a mouse click. This works well in some scenarios, but not all. For example, I am trying to scrape a map where clicking a marker displays its corresponding information. When I try to click markers that are not visible, nothing happens, whereas using JS in the browser console to simulate the click does work. So I think it would be nice for $click() to work like the JS click() function.

hadley commented 2 weeks ago

Can you provide more details?

davidrsch commented 2 weeks ago

I will try my best, but I'm not an expert on this. To illustrate, I have created an example app that can be found here. The app has two buttons, one visible and one hidden. While scraping with rvest I can extract both buttons' information, but I cannot interact with the hidden button, as the following reprex shows:

library(rvest)

link <- "https://0192da39-a400-842a-ff9b-e42f3004808e.share.connect.posit.cloud/"
page <- read_html_live(link)
page$view()
page |> html_elements("button")
#> {xml_nodeset (2)}
#> [1] <button class="btn btn-default action-button custom_class shiny-bound-inp ...
#> [2] <button class="btn btn-default action-button custom_class shinyjs-hide sh ...
page |> html_element("button:nth-of-type(1)")
#> {html_node}
#> <button class="btn btn-default action-button custom_class shiny-bound-input" id="button1" type="button">
page |> html_element("button:nth-of-type(2)")
#> {html_node}
#> <button class="btn btn-default action-button custom_class shinyjs-hide shiny-bound-input" id="button2" type="button">

# Both buttons can be accessed using CSS selectors
page$click("button:nth-of-type(1)")
# In the browser, check that the modal is displayed, then close it
page$click("button:nth-of-type(2)")
#> Error in onRejected(reason): code: -32000
#>   message: Node does not have a layout object
# An error is returned instead

Using JS in the browser console, I am able to interact with both buttons. (Screenshots: button1, button2.)
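The console approach can, in principle, be reproduced from R by evaluating JavaScript in the page behind the LiveHTML object. This is a minimal sketch, not rvest API: click_js_expr() and js_click() are hypothetical helpers, and the sketch assumes the LiveHTML object exposes its underlying chromote session as $session, so that the Chrome DevTools Protocol method Runtime.evaluate can be called on it. The element is clicked with the DOM's own HTMLElement.click(), which does not need the element to be visible on screen.

```r
# Hypothetical helper: build a JavaScript expression that finds the first
# element matching `css` and calls HTMLElement.click() on it. The expression
# evaluates to true if an element was found, false otherwise.
# encodeString() turns the selector into an escaped, double-quoted literal.
click_js_expr <- function(css) {
  sprintf(
    "(() => { const el = document.querySelector(%s); if (!el) return false; el.click(); return true; })()",
    encodeString(css, quote = '"')
  )
}

# Hypothetical helper: run that expression in the page behind a LiveHTML
# object, assuming its chromote session is available as `live$session`.
js_click <- function(live, css) {
  res <- live$session$Runtime$evaluate(
    expression = click_js_expr(css),
    returnByValue = TRUE
  )
  # An invalid selector makes querySelector() throw; surface that as an R error.
  if (!is.null(res$exceptionDetails)) {
    stop("JavaScript error while clicking `", css, "`")
  }
  isTRUE(res$result$value)
}
```

With the reprex above, js_click(page, "button:nth-of-type(2)") would be expected to click the hidden button the same way the console does; this has not been run against the example app.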

As far as I understand, this is due to how JS methods interact with the DOM: they let you trigger actions directly on an element. This could also help when simulating events on elements that overlap each other, which can be troublesome with the current approach in $click(), since it simulates mouse movements to produce a click at the position where the element is displayed.
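The difference can be made concrete with a small diagnostic sketch. A mouse-style click has to aim at a point on screen, so it can only reach an element that has a layout box and is the topmost element at that point. The error "Node does not have a layout object" for the hidden button fits the first case; the button carries the shinyjs-hide class, which presumably applies display: none. The helpers mouse_hit_js_expr() and is_mouse_clickable() are hypothetical (not part of rvest), and the sketch again assumes the LiveHTML object exposes its chromote session as $session.

```r
# Hypothetical helper: build a JavaScript expression that reports whether a
# mouse click aimed at the centre of the first element matching `css` would
# land on that element (or one of its descendants).
mouse_hit_js_expr <- function(css) {
  paste0(
    "(() => {",
    " const el = document.querySelector(", encodeString(css, quote = '"'), ");",
    # An element hidden with display: none has no layout boxes at all.
    " if (!el || el.getClientRects().length === 0) return false;",
    " const r = el.getBoundingClientRect();",
    # elementFromPoint() returns the topmost element at a viewport point, or
    # null if the point lies outside the viewport.
    " const hit = document.elementFromPoint(r.left + r.width / 2, r.top + r.height / 2);",
    " return hit !== null && el.contains(hit);",
    " })()"
  )
}

# Hypothetical helper: evaluate the check in the page behind a LiveHTML
# object, assuming its chromote session is available as `live$session`.
is_mouse_clickable <- function(live, css) {
  res <- live$session$Runtime$evaluate(
    expression = mouse_hit_js_expr(css),
    returnByValue = TRUE
  )
  isTRUE(res$result$value)
}
```

In the reprex, is_mouse_clickable(page, "#button2") would be expected to return FALSE for the hidden button, and an element covered by another one would also return FALSE, which is the overlap case described above. An element scrolled outside the viewport returns FALSE too, so it would need to be scrolled into view before checking.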

hadley commented 1 week ago

Oh, gotcha. I avoided just calling click() directly in JS because, from the browser's perspective, that doesn't look like a human performing the action. But it seems like it would be useful to offer it as an option/alternative.