Can the browser be shown while shot-scraper is working?

simonw / shot-scraper

A command-line utility for taking automated screenshots of websites

https://shot-scraper.datasette.io

Apache License 2.0


Can the browser be shown while shot-scraper is working? #149

Open b-a0 opened 7 months ago

b-a0 commented 7 months ago

It would help with debugging to see the browser window while shot-scraper is working, and perhaps keep it open when an error occurs (or a screenshot is taken).

This would make debugging easier as I could use the exact state that shot-scraper sees. I tried to achieve this by passing --disable-headless-mode as a browser argument, but that did not show the browser window.

Is there another way of viewing the browser window?

I have seen the --interactive and --devtools options for a single screenshot, but they open the browser window before any JavaScript is run.

simonw commented 7 months ago

Are you talking about the default command that takes a screenshot or the shot-scraper javascript command that runs JavaScript?

Assuming you mean shot-scraper javascript, you're right: it doesn't have an option similar to --interactive or --devtools yet. I wonder if adding those options there would help?

Current options: https://shot-scraper.datasette.io/en/stable/javascript.html#shot-scraper-javascript-help

b-a0 commented 7 months ago

I meant the shot-scraper shot or shot-scraper multi command. I currently write the JavaScript for these commands in my own browser (Firefox) and then use it in the shot-scraper commands to take a screenshot. However, I noticed that this is very hit-and-miss, as a page might be displayed differently in my own browser (e.g. due to cookies, ad-block, default permissions) than in the shot-scraper browser. If the screenshotting commands could show the browser while shot-scraper is running, I would see what shot-scraper "sees", and I thought it would be easier to quickly catch things like a cookie consent banner, an ad that first needs to be dismissed, or a login that needs to be performed.

I was hoping this would be achievable by a certain flag, but if this would require development I would say this is not worth it.

nmstoker commented 6 months ago

Hi @b-a0 - I may have misread your intent, but I believe you can interact with the browser in the manner you wish by using the shot-scraper auth command, which opens a browser window on your computer showing the page you specified.

It's intended to allow completion of login steps but it sounds like the kind of checking you're trying to do would be possible with it also, since it'll show the site in the browser as it appears for shot-scraper and you can then figure out adjustments "live", which you'd later re-use without the auth option.

https://shot-scraper.datasette.io/en/stable/authentication.html

b-a0 commented 6 months ago

Thanks for that helpful suggestion. It works, but it is not entirely what I am looking for: I cannot combine auth with --javascript or multi, and that combination is what would let me debug automations that require clicks.

In the past I've used Selenium to automate my browser from Python, and there I could specify a headless = False option. Running my script with that option showed a browser window where all automated steps were visible (e.g. a text field being filled in, a click on a menu, a form submission). If the script crashed, I could see quite easily what the reason was (e.g. an unexpected dialog or a greyed-out field).
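For comparison, the same "visible browser" workflow can be sketched directly against Playwright, the library shot-scraper is built on. This is only an illustration, not something shot-scraper exposes: the helper names `launch_options` and `run_visible`, the 250 ms slow-motion delay, and the `debug.png` output path are all assumptions made for the example.

```python
def launch_options(show_browser: bool = True, slow_mo_ms: int = 250) -> dict:
    """Build keyword arguments for Playwright's browser launch call.

    headless=False shows the window; slow_mo delays each Playwright
    operation so the automated steps are easier to follow by eye.
    """
    return {"headless": not show_browser, "slow_mo": slow_mo_ms}


def run_visible(url: str, script: str, output: str = "debug.png") -> None:
    """Open a visible Chromium window, run a script, take a screenshot."""
    # Imported lazily so this module loads even without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(**launch_options())
        page = browser.new_page()
        page.goto(url)
        try:
            page.evaluate(script)          # the JavaScript being debugged
            page.screenshot(path=output)
        except Exception:
            # Hold the window open for inspection before re-raising.
            page.pause()
            raise
        finally:
            browser.close()
```

Calling `run_visible("https://example.com", "document.title")` should open a Chromium window, run the expression, and save a screenshot; if a step raises, `page.pause()` holds the window open until execution is resumed, so the failing state can be inspected.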

But I fully understand this tool has a different focus than generic browser automation and that this feature might not be high on the priority list. Thanks for thinking along!

davidbgk commented 6 months ago

I have a similar use case: being able to set Playwright's debug option from shot-scraper would be a huge time saver. --interactive does help, but you cannot check that --javascript or --wait-for are executed as you expect.
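One thing that might be worth trying in the meantime is Playwright's own `PWDEBUG=1` environment variable, which Playwright documents as running browsers headed and opening the Playwright Inspector. Whether shot-scraper picks it up as expected is untested here, so treat the following as a sketch under that assumption; `debug_command` and `run_debug` are hypothetical helper names.

```python
import os
import subprocess


def debug_command(url: str, javascript: str, output: str = "debug.png"):
    """Return the shot-scraper command and an environment with PWDEBUG set."""
    cmd = ["shot-scraper", "shot", url, "--javascript", javascript, "-o", output]
    # Copy the current environment and add Playwright's debug switch.
    env = {**os.environ, "PWDEBUG": "1"}
    return cmd, env


def run_debug(url: str, javascript: str) -> int:
    """Run shot-scraper with PWDEBUG=1 and return its exit code."""
    cmd, env = debug_command(url, javascript)
    return subprocess.run(cmd, env=env, check=False).returncode
```

If the variable is honored, `run_debug("https://example.com", "document.title")` would step through the run in a visible window; if not, it behaves like a normal headless shot-scraper call.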