simonw / shot-scraper

A command-line utility for taking automated screenshots of websites
https://shot-scraper.datasette.io
Apache License 2.0

Interactive screenshots revert to original page state before saving #125

Closed: mhalle closed this issue 8 months ago

mhalle commented 8 months ago

The --interactive flag lets the user interact with the page before the screenshot is saved. However, shot-scraper appears to reload the URL after Enter is pressed, reverting the page to its original state. This defect reduces the value of --interactive.

For example, load https://maps.google.com/ with --interactive. Drag the map around, then hit Enter. The saved screenshot shows the map in its original state.

The same happens with page controls. For example, try https://mui.com/material-ui/react-slider/ and drag some sliders around in interactive mode, then save. The screenshot shows the sliders in their default state.

If the user navigates to a different URL during the interactive session, the original URL is reloaded and captured instead.

simonw commented 8 months ago

Huh... yeah I just recreated that myself using shot-scraper 'https://maps.google.com/' --interactive - how weird, that's certainly not intended!

Here's what's supposed to happen:

https://github.com/simonw/shot-scraper/blob/6d340ad95174fb7e0416655e50bfd5ecbb1cf8b9/shot_scraper/cli.py#L285-L317

Note that in interactive mode it opens up the browser and then waits for the user to hit <enter> (that input() line)... but then still calls the take_shot() function. BUT it does call that with use_existing_page=True.

And here's where that takes effect:

https://github.com/simonw/shot-scraper/blob/6d340ad95174fb7e0416655e50bfd5ecbb1cf8b9/shot_scraper/cli.py#L952-L977

Reading the code it looks to me like it should be doing the right thing - taking the screenshot based on the state of the page after the user has interacted with it, rather than creating a new page.

Needs more digging.

simonw commented 8 months ago

Spotted it! Further down that take_shot() function:

https://github.com/simonw/shot-scraper/blob/6d340ad95174fb7e0416655e50bfd5ecbb1cf8b9/shot_scraper/cli.py#L989-L1002

That response = page.goto(url) line is throwing away our previous state. That shouldn't happen if use_existing_page is set.
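To make the problem concrete, here is a minimal sketch of the guard described above. It is not the actual shot-scraper code: FakePage is a hypothetical stand-in for a Playwright page so the example runs without a browser, and take_shot_sketch is an illustrative name with a simplified signature. Only page.goto(url) and use_existing_page come from the code linked above.

```python
class FakePage:
    """Hypothetical stand-in for a browser page; it only records state."""

    def __init__(self):
        self.url = None
        self.state = "default"
        self.goto_calls = 0

    def goto(self, url):
        # Navigating loads the document afresh, so interactive changes are lost.
        self.url = url
        self.state = "default"
        self.goto_calls += 1
        return "response"

    def screenshot(self):
        # Return a description of what would be captured.
        return f"{self.url} [{self.state}]"


def take_shot_sketch(page, url, use_existing_page=False):
    # Assumed fix: only navigate when we were NOT handed a page
    # that the user has already interacted with.
    if not use_existing_page:
        page.goto(url)
    return page.screenshot()


page = FakePage()
page.goto("https://example.com/")  # initial load done by interactive mode
page.state = "slider dragged"      # simulate the user's interaction
shot = take_shot_sketch(page, "https://example.com/", use_existing_page=True)
print(shot)
```

With the guard in place the page is navigated only once (the initial load), so the captured state is the one the user left behind. Without it, the second goto() would reset state to "default", which matches the behaviour reported in this issue.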

simonw commented 8 months ago

Tested manually, and skipping that page.goto(url) call when use_existing_page is set fixes the issue.