internetarchive / wayback-machine-webextension

A web browser extension for Chrome, Firefox, Edge, and Safari 14.
GNU Affero General Public License v3.0
663 stars, 207 forks

Blacklist for Auto-Save #709

Closed s-crypt closed 2 years ago

s-crypt commented 3 years ago

Is your feature request related to a problem/ issue? Please describe.

Describe the solution you'd like

Describe alternatives you've considered

Additional context

Edit: Some things that should be blacklisted:

cgorringe commented 3 years ago

This is a good point, particularly when using search engines, and for URLs that carry tracking parameters such as ?utm_source or the one that Facebook uses.

s-crypt commented 3 years ago

The ClearURLs extension project actually has a list of regex rules for detecting tracking elements and producing clean URLs:
https://docs.clearurls.xyz/latest/specs/rules/
https://rules2.clearurls.xyz/data.minify.json
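
As a rough illustration of the idea (not the ClearURLs rule format, and not the extension's code), regex-based cleaning of query parameters might look like the sketch below. The pattern list and the function name `strip_tracking_params` are hypothetical, chosen only for this example; `fbclid` is assumed here to be the Facebook parameter mentioned above.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical patterns for illustration only; not taken from the ClearURLs rule set.
TRACKING_PARAM_PATTERNS = [
    re.compile(r"utm_\w+"),  # utm_source, utm_medium, ...
    re.compile(r"fbclid"),   # assumed to be the Facebook click identifier
]


def strip_tracking_params(url: str) -> str:
    """Return the URL with query parameters matching a tracking pattern removed."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        # fullmatch so that a parameter like "myfbclidx" is not removed by accident
        if not any(p.fullmatch(name) for p in TRACKING_PARAM_PATTERNS)
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

Note that `urlencode` re-encodes the remaining values, so the cleaned query string may not be byte-for-byte identical to the original even for parameters that are kept.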

cgorringe commented 3 years ago

@s-crypt Thanks for the info!

cgorringe commented 3 years ago

@s-crypt Really appreciate the suggestion and info on ClearURLs! I've created a new feature request at #722 - Thanks!

cgorringe commented 3 years ago

Adding some notes on terminology for future reference:

"whitelist to exclude auto saving sites" actually sounds like the opposite of what is meant. Maybe we can call this an "Exclude URL Patterns" list of regexes, or how about "Filtered URL Patterns"?

s-crypt commented 3 years ago

Sorry, I mixed up blacklist and whitelist when I originally wrote this.

SpongebobSquamirez commented 3 years ago

I WANT THIS!!!

SpongebobSquamirez commented 3 years ago

I forgot I had already posted on this, but I'm coming back to see if this feature has been implemented (guess not) and to reiterate that I really want this, especially for things like Gmail and Facebook.

cgorringe commented 2 years ago

Finally implemented this in PR #845 if anyone wants to test it out. I should probably add some more URL patterns to the Default List, so I welcome any suggestions!
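
For anyone thinking about which patterns to suggest, here is a minimal sketch of how an exclude list of regexes could be checked before auto-saving. It is not the code from PR #845; the patterns, the name `EXCLUDE_URL_PATTERNS`, and the function `is_excluded` are all hypothetical, and the extension's real Default List may look quite different.

```python
import re

# Hypothetical example patterns; the extension's actual Default List may differ.
EXCLUDE_URL_PATTERNS = [
    r"^https?://mail\.google\.com/",      # webmail
    r"^https?://(www\.)?facebook\.com/",  # social feed behind a login
    r"[?&]utm_source=",                   # URLs carrying a tracking parameter
]

_COMPILED = [re.compile(pattern) for pattern in EXCLUDE_URL_PATTERNS]


def is_excluded(url: str) -> bool:
    """Return True if the URL matches any exclude pattern and should not be auto-saved."""
    return any(pattern.search(url) for pattern in _COMPILED)
```

Using `search` rather than `fullmatch` lets a pattern match anywhere in the URL, so host patterns need an explicit `^` anchor to avoid matching a hostname that merely appears inside another URL's query string.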

contextnerror commented 2 years ago

Is there a way to have an "Include List"? That is, only save a URL if it matches the user-defined list?

cgorringe commented 2 years ago

Hi @contextnerror, although possible, I'd think it wouldn't be as necessary since you can always explicitly tap Save Page Now.

We used to have a Bulk Save feature where you could import your bookmarks. This could be an alternative as a bookmark list is like an "include list", although it would take some extra steps. The Bulk Save feature has been temporarily removed but we plan to bring it back in the future.

DesertStars commented 1 year ago

> Hi @contextnerror, although possible, I'd think it wouldn't be as necessary since you can always explicitly tap Save Page Now.
>
> We used to have a Bulk Save feature where you could import your bookmarks. This could be an alternative as a bookmark list is like an "include list", although it would take some extra steps. The Bulk Save feature has been temporarily removed but we plan to bring it back in the future.

While I get that "Save Page Now" is useful for quickly archiving a webpage, it's still a bit of a chore to do that for every link that hasn't been archived, so I think adding an "Include List" as an option would be extremely helpful, especially when it's used together with "Auto Save Page".

To explain my point further, I believe the way "Auto Save Page" works right now is rather inconvenient: if you turn it on, you have to add all kinds of URLs to your "Exclude List" to keep private webpages, which would be pointless or even harmful to save, out of the Wayback Machine. On the other hand, a whitelist would be much more logical, since it would require minimal action from the user and carry a much lower risk of spamming the Wayback Machine than a blacklist.

For example, I've been using the extension for a while to make sure the YouTube videos I've watched are archived on the Wayback Machine, but I have to do it manually for every link that hasn't been saved before. With a whitelist option, I could just add youtube.com to the whitelist, check the "Auto Save Page & if not archived: previously" option, and be done with it. Unfortunately, this is currently impossible to automate.
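
To make the request concrete, the decision I have in mind could be sketched roughly as follows. This is only an illustration of the proposed behavior, not anything the extension does today: `should_auto_save`, `host_matches`, and the `is_archived` callback (standing in for whatever check the extension uses for "not archived previously") are hypothetical names.

```python
from typing import Callable, Iterable
from urllib.parse import urlsplit


def host_matches(host: str, domain: str) -> bool:
    """True if host is the domain itself or one of its subdomains."""
    host = host.lower().rstrip(".")
    domain = domain.lower()
    return host == domain or host.endswith("." + domain)


def should_auto_save(
    url: str,
    include_domains: Iterable[str],
    is_archived: Callable[[str], bool],
) -> bool:
    """Auto-save only URLs on an included domain that are not archived yet."""
    host = urlsplit(url).hostname or ""
    if not any(host_matches(host, domain) for domain in include_domains):
        return False  # not on the include list: never auto-save
    # is_archived is a hypothetical stand-in for the "not archived previously" check
    return not is_archived(url)
```

With `include_domains = ["youtube.com"]`, a video page on www.youtube.com that has no existing capture would be saved, while anything on other hosts would be skipped without the archive check ever being called.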