Open dendrit1 opened 1 year ago
Is this a pagedown question about how to pass arguments in extra_args? Or did you try something that is not working?
If this is a question about which user agent to set to bypass the restriction on the website you want to reach, I bet you'll find an answer on the web.
Also, if a website has a policy and has put restrictions in place, you should read its rules carefully to make sure you are still authorized to do what you want to do. Security measures such as preventing headless access are usually there for a reason.
Anyhow, happy to help with any pagedown bug; for the rest, we are not experts in headless Chrome usage. pagedown::chrome_print() is made in the first place to print documents produced by pagedown to PDF, and is not a special headless Chrome tool. Note that there are R packages dedicated to this, like chromote, and webshot2, which uses it.
Hope it helps
@cderv
Yes, I tried something, and it is not working, i.e., I added:
extra_args = c('--disable-gpu', '--user-agent ="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"')
The browser is still recognized as headless on this test-page. Do I have to escape something, or is perhaps the syntax wrong?
Anyway, I have meanwhile solved the problem using chromote.
But I would still like to know if and how this could also be done using pagedown::chrome_print().
Thanks a lot !
You could try without the space after --user-agent, i.e. writing --user-agent= directly, maybe?
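For illustration, here is a minimal sketch of that suggestion. It assumes that chrome_print() hands each element of extra_args to Chrome as one command-line argument without going through a shell; I have not verified this on every platform. Under that assumption the inner double quotes around the user agent are not needed, and there is no space on either side of the = sign. The output file name is hypothetical, and nothing in this thread confirms that the test page then stops reporting a headless browser.

```r
# User agent string taken from the attempt posted above.
ua <- paste(
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
  "AppleWebKit/537.36 (KHTML, like Gecko)",
  "Chrome/117.0.0.0 Safari/537.36"
)

# One vector element per Chrome switch: no space around "=" and
# no extra quotes inside the string (assumption: no shell is involved).
extra_args <- c("--disable-gpu", paste0("--user-agent=", ua))

# The actual print needs pagedown, a local Chrome and network access,
# so it is only attempted in an interactive session.
if (interactive() && requireNamespace("pagedown", quietly = TRUE)) {
  pagedown::chrome_print(
    input = "https://www.whatsmybrowser.org/",
    output = "whatsmybrowser.pdf",  # hypothetical file name
    extra_args = extra_args
  )
}
```

If the page still reports Headless Chrome after this, the user agent switch is probably not the deciding factor, and a tool that drives the browser directly, such as chromote, may be the better fit.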
Hello,
I use R version 4.3.1 and RStudio 2023.09.0 Build 463 under Windows 10 Enterprise.
The following R code downloads a website as PDF:
When executing this code with the test webpage https://www.whatsmybrowser.org/, the page correctly reports the browser as "You are using Headless Chrome 117."
PROBLEM: Some webpages block headless browsers, which then results in the error "HTTP status code: 404".
I know that "--user-agent" has to be added to extra_args somehow in order to mimic a normal, i.e. non-headless, browser; see also https://useragentstring.com/
Could anyone help me code this? The goal is that this test website no longer returns "You are using Headless Chrome 117." but, e.g., "You are using Chrome 117".
Thanks a lot!