rstudio / chromote

Chrome Remote Interface for R
https://rstudio.github.io/chromote/

ERR_HTTP2_PROTOCOL_ERROR when trying to navigate to a website #166

Closed nclsbarreto closed 2 months ago

nclsbarreto commented 3 months ago

I am trying to learn to use chromote and am generally doing pretty well. But I have run into an issue with this website.

library(chromote)
url <- "https://health.usnews.com/best-hospitals/search"

tab <- ChromoteSession$new()

tab$Page$navigate("https://www.google.com")

tab$Page$navigate(url)

There is no problem navigating to Google, but when I try to navigate to usnews I get:

$errorText
[1] "net::ERR_HTTP2_PROTOCOL_ERROR"

Any help would be appreciated.

R version 4.1.1 (2021-08-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] RPostgreSQL_0.7-5 tmap_3.3-4        odbc_1.4.2        logger_0.3.0      DBI_1.2.2         glue_1.7.0        httr2_1.0.0       jsonlite_1.8.4    xml2_1.3.3       
[10] chromote_0.2.0    openxlsx_4.2.5.2  dbplyr_2.4.0      rvest_1.0.4       lubridate_1.9.2   forcats_1.0.0     stringr_1.5.1     dplyr_1.1.2       purrr_1.0.1      
[19] readr_2.1.5       tidyr_1.3.0       tibble_3.2.1      ggplot2_3.5.0     tidyverse_2.0.0   pacman_0.5.1     

loaded via a namespace (and not attached):
 [1] sf_1.0-12           bit64_4.0.5         RColorBrewer_1.1-3  httr_1.4.7          tools_4.1.1         utf8_1.2.4          R6_2.5.1            KernSmooth_2.23-20 
 [9] colorspace_2.1-0    raster_3.6-20       withr_3.0.1         sp_1.6-0            tidyselect_1.2.1    processx_3.8.1      leaflet_2.2.1       curl_5.2.1         
[17] bit_4.0.5           compiler_4.1.1      leafem_0.2.3        cli_3.6.3           scales_1.3.0        classInt_0.4-9      proxy_0.4-27        rappdirs_0.3.3     
[25] digest_0.6.31       base64enc_0.1-3     dichromat_2.0-0.1   pkgconfig_2.0.3     htmltools_0.5.7     fastmap_1.1.1       htmlwidgets_1.6.4   rlang_1.1.4        
[33] rstudioapi_0.15.0   generics_0.1.3      crosstalk_1.2.1     zip_2.3.0           magrittr_2.0.3      Rcpp_1.0.10         munsell_0.5.0       fansi_1.0.6        
[41] abind_1.4-5         lifecycle_1.0.4     terra_1.7-23        stringi_1.7.12      leafsync_0.1.0      tmaptools_3.1-1     grid_4.1.1          blob_1.2.4         
[49] parallel_4.1.1      promises_1.2.1      lattice_0.20-44     stars_0.6-4         hms_1.1.3           ps_1.7.5            pillar_1.9.0        codetools_0.2-18   
[57] XML_3.99-0.14       BiocManager_1.30.22 vctrs_0.6.5         png_0.1-8           tzdb_0.4.0          gtable_0.3.4        lwgeom_0.2-11       e1071_1.7-13       
[65] later_1.3.2         class_7.3-19        viridisLite_0.4.2   websocket_1.4.1     units_0.8-1         timechange_0.2.0   
gadenbuie commented 2 months ago

I'm pretty certain that the website you're trying to open inspects the User-Agent string of the request, sees "HeadlessChrome" in that field, and blocks access. Clearly they are trying to discourage web scraping efforts.
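If that's the cause, one thing worth trying (a sketch, not something I've tested against this particular site) is overriding the User-Agent before navigating, via the Chrome DevTools Protocol method `Network.setUserAgentOverride` that chromote exposes on the session. The User-Agent string below is illustrative, not a recommendation:

library(chromote)

tab <- ChromoteSession$new()

# Replace the default headless User-Agent (which contains "HeadlessChrome")
# with an ordinary desktop Chrome string before navigating.
tab$Network$setUserAgentOverride(
  userAgent = paste0(
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ",
    "AppleWebKit/537.36 (KHTML, like Gecko) ",
    "Chrome/122.0.0.0 Safari/537.36"
  )
)

tab$Page$navigate("https://health.usnews.com/best-hospitals/search")

Note that sites doing more aggressive bot detection (TLS fingerprinting, JavaScript challenges, etc.) may still block the request even with a spoofed User-Agent.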

nclsbarreto commented 2 months ago

Fantastic. That is what I had concluded as well, but I'm not exactly a pro (particularly at HTML), so I wanted to confirm. Thank you.