simonw / shot-scraper

A command-line utility for taking automated screenshots of websites
https://shot-scraper.datasette.io
Apache License 2.0

`--user-agent` option for sending different user agent #63

Closed (simonw closed this 2 years ago)

simonw commented 2 years ago

The Google SSO web interface doesn't work with the default Playwright Chromium, as described (and worked around) in:

[Image: CleanShot 2022-04-08 at 12 36 48@2x]

I did an experiment and it turns out this is just down to the user agent string. So I'm going to add a --user-agent option to the auth command and various others.

https://playwright.dev/python/docs/emulation#user-agent shows how to do this in Playwright:

context = browser.new_context(
    user_agent='My user agent'
)
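To show how a command-line value could reach that call, here is a minimal sketch. It is not shot-scraper's actual code: the helper names `context_kwargs` and `report_user_agent` are hypothetical, and only `sync_playwright`, `chromium.launch`, `new_context(user_agent=...)`, `new_page`, `goto` and `evaluate` come from the Playwright Python API.

```python
from typing import Optional


def context_kwargs(user_agent: Optional[str] = None) -> dict:
    """Build keyword arguments for browser.new_context() (hypothetical helper)."""
    kwargs = {}
    # Only override the user agent when one was supplied; otherwise
    # Playwright keeps the browser's default string.
    if user_agent is not None:
        kwargs["user_agent"] = user_agent
    return kwargs


def report_user_agent(url: str, user_agent: Optional[str] = None) -> str:
    """Load url and return the navigator.userAgent value the page sees."""
    # Imported here so context_kwargs() works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        context = browser.new_context(**context_kwargs(user_agent))
        page = context.new_page()
        page.goto(url)
        seen = page.evaluate("navigator.userAgent")
        browser.close()
    return seen
```

Assuming Playwright and its Chromium build are installed, `report_user_agent("https://www.example.com/", "My user agent")` should return the overridden string, and omitting the second argument should return the browser default.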
simonw commented 2 years ago

I'm going to include some shortcut options for popular browsers, so that, for example, --user-agent chrome sends Mozilla/5.0 (Macintosh; Intel Mac OS X 12_3_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.75 Safari/537.36 (taken from https://www.whatismybrowser.com/guides/the-latest-user-agent/chrome).

simonw commented 2 years ago

I decided not to ship shortcuts for user agents, because I don't want to have to maintain that list myself.
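
Anyone who still wants shortcuts can keep their own mapping outside the tool. This is a minimal sketch, assuming a hypothetical `resolve_user_agent` helper and a user-maintained `SHORTCUTS` table. The Chrome string is the one quoted earlier in this thread and will go stale over time.

```python
# User-maintained shortcut table (hypothetical; shot-scraper does not ship one).
SHORTCUTS = {
    "chrome": (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 12_3_1) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/100.0.4896.75 Safari/537.36"
    ),
}


def resolve_user_agent(value: str) -> str:
    """Expand a known shortcut name; pass any other string through unchanged."""
    return SHORTCUTS.get(value.lower(), value)
```

The resolved string can then be passed as the value of --user-agent.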

simonw commented 2 years ago

Manual testing:

shot-scraper 'https://www.whatismybrowser.com/detect/what-is-my-user-agent/' \
  --user-agent "Hello there" --height 400

[Screenshot: www-whatismybrowser-com-detect-what-is-my-user-agent]

echo '- url: https://www.whatismybrowser.com/detect/what-is-my-user-agent/\n  height: 400' | \
  shot-scraper multi - --user-agent "Hello multi"

[Screenshot: www-whatismybrowser-com-detect-what-is-my-user-agent]
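
The echo pipeline above relies on echo expanding `\n` inside single quotes, which zsh's echo does but bash's builtin echo does not by default. Here is a shell-independent sketch in Python, assuming shot-scraper is on the PATH. The names `multi_command` and `run_multi` are hypothetical; `multi -` and `--user-agent` are the invocation shown above.

```python
import subprocess

# Same YAML as the echo command above, written with real newlines.
CONFIG = (
    "- url: https://www.whatismybrowser.com/detect/what-is-my-user-agent/\n"
    "  height: 400\n"
)


def multi_command(user_agent: str) -> list:
    """Argument list that reads the multi config from standard input."""
    return ["shot-scraper", "multi", "-", "--user-agent", user_agent]


def run_multi(user_agent: str) -> None:
    """Pipe CONFIG to shot-scraper multi; raises if the command fails."""
    subprocess.run(multi_command(user_agent), input=CONFIG, text=True, check=True)
```

Calling `run_multi("Hello multi")` should be equivalent to the piped command, without depending on how the shell's echo treats escapes.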

And for shot-scraper javascript:

% shot-scraper javascript https://www.example.com/ 'navigator.userAgent'
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/100.0.4863.0 Safari/537.36"
% shot-scraper javascript https://www.example.com/ 'navigator.userAgent' --user-agent "hello"
"hello"