tidyverse / rvest

Simple web scraping for R
https://rvest.tidyverse.org

LiveHTML object corrupted after `$click()` #405

Open · mccarthy-m-g opened this issue 6 months ago

mccarthy-m-g commented 6 months ago

Brief description of the problem

I'm trying to extract data from a paged table, but after calling `$click()` the LiveHTML object becomes "corrupted", and even printing it fails. This may be an edge case specific to this website, since I was able to use `$click()` without this problem when running other code examples from this repo's Issues.

library(rvest)

sess <- read_html_live("https://www.cicic.ca/869/results.canada?search=&sect=2&int=3")
sess$click(".rgPageNext", n_clicks = 1)

sess
#> Error in onRejected(reason) : code: -32000
#>   message: Could not find node with given id

If you run this interactively with `sess$view()`, you can see that the page loads and the click succeeds, but afterwards `sess` seems to lose track of the page's elements (judging by the "Could not find node with given id" error).

Additional information

I asked on Mastodon and others were able to reproduce this error. I mention this because I initially suspected the problem was my browser, which is out of date due to OS restrictions, but that doesn't seem to be the cause.

Software:

hadley commented 6 months ago

Oh, I bet this is because it loads a new page, and I haven't updated the ID of the root node. I think this will be a reasonably simple fix when I'm next working on rvest.
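Until a fix lands, one possible workaround, sketched here under assumptions and not an official rvest API, is to sidestep the stale root node ID entirely: ask the underlying chromote session (exposed as `sess$session`, documented as "for expert use only") for the current document's HTML and parse that with the static `read_html()`. The two-second wait for the new page to load and the `.rgMasterTable` selector (the class a Telerik RadGrid usually puts on its data table, which the `.rgPageNext` button suggests this page uses) are both assumptions.

```r
library(rvest)

# Same session and click as in the reprex above.
sess <- read_html_live("https://www.cicic.ca/869/results.canada?search=&sect=2&int=3")
sess$click(".rgPageNext", n_clicks = 1)

# Assumption: two seconds is enough for the postback to load the next page.
Sys.sleep(2)

# Workaround sketch: evaluate JavaScript in the page via the chromote session,
# which does not rely on the LiveHTML object's cached root node ID.
res <- sess$session$Runtime$evaluate("document.documentElement.outerHTML")
page <- read_html(res$result$value)

# Assumption: the paged data lives in a table with class "rgMasterTable".
# html_table() on a node set returns a (possibly empty) list of tibbles.
tables <- page |> html_elements(".rgMasterTable") |> html_table()
```

Because `page` is an ordinary static document, the rest of the familiar rvest API works on it; the trade-off is that it is a snapshot, so you would need to repeat the click-wait-read cycle for each page of results.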