internetarchive / wayback-machine-webextension

A web browser extension for Chrome, Firefox, Edge, and Safari 14.
GNU Affero General Public License v3.0

Autosave prevents saving page with outlinks #941

Open mtae opened 2 years ago

mtae commented 2 years ago

Describe the bug When the autosave feature is enabled, it is no longer possible to manually save a page with outlinks or as a snapshot.

To Reproduce Steps to reproduce the behavior:

  1. Enable the autosave feature.
  2. Go to a URL that hasn't been saved in the archive and isn't in your Auto Save exclude list.
  3. Select outlinks in the extension popup.
  4. Click "Save Page Now".
  5. See error "The same snapshot had been made ... ago. You can make new capture of this url after 45 minutes."

Expected behavior Since the webpage has been recently archived, only the outlinks should be archived, excluding the original URL.

cgorringe commented 2 years ago

Thank you for this report and observation.

One way to solve this might be to have Auto Save also save Outlinks when that option is checked. I'm not sure, though: turning this on uses a lot of resources and takes a while to complete, which may prevent additional Auto Saves from taking place until prior saves complete. It's a tricky balance to consider.

mtae commented 2 years ago

Thanks for your response! I was wondering along those lines as well, but as you say, it could be difficult performance-wise.

Would it also cause performance issues to check whether each of the outlinks has been recently saved? What does the extension check against when returning the "this webpage has been saved x minutes ago, try again in 45 minutes" message? Is this check made against a Wayback Machine API, or against a local database used by the extension?
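
To make the question concrete, here is a minimal sketch of what a per-outlink recency check could look like. Everything in it is an assumption for illustration: the function name `outlinksNeedingSave`, the `lastSaved` map, and the idea that capture times are available locally are all hypothetical and are not taken from the extension or from any Wayback Machine API.

```typescript
// Hypothetical sketch: none of these names come from the extension or the server API.
// 45 minutes is the window quoted in the error message above.
const COOLDOWN_MS = 45 * 60 * 1000;

// lastSaved maps a URL to the time (ms since epoch) of its most recent capture, if known.
function outlinksNeedingSave(
  outlinks: string[],
  lastSaved: Map<string, number>,
  now: number
): string[] {
  return outlinks.filter((url) => {
    const savedAt = lastSaved.get(url);
    // Never saved, or saved at least one cooldown period ago: eligible for a new capture.
    return savedAt === undefined || now - savedAt >= COOLDOWN_MS;
  });
}
```

The filter itself is cheap; the cost would be in obtaining the last-capture time for every outlink, which this sketch simply assumes is already known.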

cgorringe commented 2 years ago

The server API itself returns the "45 minutes" message and enforces the limit to prevent resaving. The API also processes the list of Outlinks, rather than the extension doing it. Trying to save just the Outlinks again would therefore be a change in how we process them, since right now the API handles it all in the background.
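
As a rough illustration of why a manual save right after an Auto Save is rejected, a resave limit of this kind can be sketched as below. This is not the server's actual logic: `checkResave`, `ResaveDecision`, and the rounding of the wait time are assumptions made for the example, and only the 45-minute figure comes from the error message.

```typescript
// Hypothetical sketch of a resave limit; not the actual server implementation.
const RESAVE_LIMIT_MINUTES = 45;

interface ResaveDecision {
  allowed: boolean;
  waitMinutes: number; // whole minutes left before a new capture is allowed
}

// lastCaptureMs is null when the URL has never been captured.
function checkResave(lastCaptureMs: number | null, nowMs: number): ResaveDecision {
  if (lastCaptureMs === null) {
    return { allowed: true, waitMinutes: 0 };
  }
  const elapsedMinutes = (nowMs - lastCaptureMs) / 60000;
  const remaining = RESAVE_LIMIT_MINUTES - elapsedMinutes;
  return remaining <= 0
    ? { allowed: true, waitMinutes: 0 }
    : { allowed: false, waitMinutes: Math.ceil(remaining) };
}
```

Under this model, a page that Auto Save captured moments ago fails the check for the original URL, and because the Outlinks are handled as part of the same request, they are not saved either.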