internetarchive / wayback-machine-webextension

A web browser extension for Chrome, Firefox, Edge, and Safari 14.
GNU Affero General Public License v3.0
667 stars 207 forks

Upload Page Now Feature #836

Open cgorringe opened 2 years ago

cgorringe commented 2 years ago

The Problem

The idea for this feature stems from noticing that the Wayback Machine can't always save pages that the user is able to view, because of measures that websites take to block web crawlers.

For example, suppose someone reading Bloomberg News wants to archive the page they are viewing. As the following screenshots show, after running Save Page Now, the Wayback Machine didn't actually save the correct page!

Original: Link to Bloomberg Article. (screenshot: bloomberg1)
Wayback Machine: Link to Wayback Machine, which failed to obtain a copy and instead reports an HTTP 307 Temporary Redirect. (screenshot: bloomberg2)

The Solution - Upload Page Now

What I propose is for the webextension to save the page that is currently viewed in the browser, then upload that page to the Wayback Machine.

Since a user-submitted upload is subject to manipulation, we could first call SPN so that WM initially downloads the website itself and sends an id back to the webextension. The webextension would then call a new API, referencing that id, to upload its local copy of the website. WM's website could provide a link to the uploaded version to complement the official SPN version, perhaps similar to how it stores snapshot images.
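To make the proposed flow concrete, here is a rough sketch of the two steps from the webextension's side. Everything about the wire format is an assumption: the form-encoded POST to `https://web.archive.org/save` and the `job_id` response field are modeled loosely on the public Save Page Now API (which may also require authentication), and `UPLOAD_ENDPOINT`, its form fields, and the `uploadPageNow` name are purely hypothetical, since the upload API does not exist yet.

```javascript
// Sketch of the proposed two-step flow. All endpoint details are assumptions.
const SPN_ENDPOINT = 'https://web.archive.org/save';           // assumed SPN entry point
const UPLOAD_ENDPOINT = 'https://example.invalid/upload-page'; // hypothetical: no such API exists yet

// `fetchImpl` is injectable only so the flow can be exercised without a network.
async function uploadPageNow(pageUrl, pageBlob, fetchImpl = globalThis.fetch) {
  // Step 1: ask SPN to capture the page itself and hand back an id.
  const spnResponse = await fetchImpl(SPN_ENDPOINT, {
    method: 'POST',
    headers: {
      Accept: 'application/json',
      'Content-Type': 'application/x-www-form-urlencoded',
    },
    body: new URLSearchParams({ url: pageUrl }).toString(),
  });
  if (!spnResponse.ok) {
    throw new Error(`SPN request failed: ${spnResponse.status}`);
  }
  const { job_id: jobId } = await spnResponse.json(); // field name is an assumption
  if (!jobId) {
    throw new Error('SPN response did not include an id');
  }

  // Step 2: upload the locally saved copy, referencing the id from step 1.
  const form = new FormData();
  form.append('job_id', jobId);                // hypothetical field names
  form.append('url', pageUrl);
  form.append('page', pageBlob, 'page.mhtml');
  const uploadResponse = await fetchImpl(UPLOAD_ENDPOINT, { method: 'POST', body: form });
  if (!uploadResponse.ok) {
    throw new Error(`Upload failed: ${uploadResponse.status}`);
  }
  return jobId;
}
```

Tying the upload to an id issued by SPN means WM always has its own crawl of the same URL to compare against the user-submitted copy.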

Chrome provides a function that will save an entire page of HTML with images and pack it up into a single MHTML file, which can then be uploaded using some additional code.

chrome.pageCapture.saveAsMHTML() https://developer.chrome.com/docs/extensions/reference/pageCapture/
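As a minimal sketch of how the capture step might look, the callback form of `chrome.pageCapture.saveAsMHTML()` can be wrapped in a Promise. The `captureTabAsMHTML` name and the injectable `chromeApi` parameter are illustrative assumptions, not part of the extension; the extension would also need the `pageCapture` permission in its manifest.

```javascript
// Sketch: capture a tab as a single MHTML blob.
// Requires the "pageCapture" permission in manifest.json.
// `chromeApi` is injectable only so the function can be exercised outside a browser.
function captureTabAsMHTML(tabId, chromeApi = globalThis.chrome) {
  return new Promise((resolve, reject) => {
    chromeApi.pageCapture.saveAsMHTML({ tabId }, (mhtmlBlob) => {
      // Chrome reports failures through runtime.lastError rather than by throwing.
      const err = chromeApi.runtime && chromeApi.runtime.lastError;
      if (err || !mhtmlBlob) {
        reject(new Error(err ? err.message : 'MHTML capture returned no data'));
        return;
      }
      resolve(mhtmlBlob);
    });
  });
}

// Possible usage inside the extension (not run here):
// chrome.tabs.query({ active: true, currentWindow: true }, async ([tab]) => {
//   const blob = await captureTabAsMHTML(tab.id);
//   console.log('Captured', blob.size, 'bytes of MHTML');
// });
```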

Other browsers currently do not support this or a similar function, as far as I can tell, and it isn't even documented on MDN.

There's currently a bug report suggesting that implementing this function in Firefox isn't a high priority.
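Given the uneven browser support, the feature would probably need to be gated behind a runtime check. A small sketch of such a check follows; the `supportsMHTMLCapture` name is hypothetical. Note that `chrome.pageCapture` is also undefined in Chrome when the `pageCapture` permission has not been granted.

```javascript
// Returns true only when the running browser exposes chrome.pageCapture.saveAsMHTML.
// `chromeApi` is injectable only so the check can be exercised outside a browser.
function supportsMHTMLCapture(chromeApi = globalThis.chrome) {
  return Boolean(
    chromeApi &&
    chromeApi.pageCapture &&
    typeof chromeApi.pageCapture.saveAsMHTML === 'function'
  );
}

// Possible usage: hide the "Upload Page Now" button where capture is unavailable.
// if (!supportsMHTMLCapture()) { uploadButton.hidden = true; }
```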

Each browser seems to have its own standard for saving whole websites:

Related Links:

cgorringe commented 2 years ago

Related:

cgorringe commented 1 year ago

More Related: