Closed s-crypt closed 2 years ago
This is a good point, particularly for search engine results and for tracking parameters such as ?utm_source or the one that Facebook uses.
The ClearURLs extension project actually maintains a list of regex rules for detecting tracking parameters and cleaning URLs: https://docs.clearurls.xyz/latest/specs/rules/ and https://rules2.clearurls.xyz/data.minify.json
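As a rough illustration of the URL cleaning idea mentioned above, here is a minimal Python sketch that drops tracking parameters from a query string. The parameter names (`utm_` prefix, `fbclid`) are a small hand-picked assumption for the example and are not taken from the ClearURLs rule list; `strip_tracking` is a hypothetical helper, not part of the extension.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Illustrative assumptions only: NOT the ClearURLs rule set.
TRACKING_PREFIXES = ("utm_",)
TRACKING_NAMES = {"fbclid"}


def strip_tracking(url: str) -> str:
    """Return the URL with known tracking query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if not name.lower().startswith(TRACKING_PREFIXES)
        and name.lower() not in TRACKING_NAMES
    ]
    # Rebuild the URL; path and fragment are left untouched.
    return urlunsplit(parts._replace(query=urlencode(kept)))


print(strip_tracking("https://example.com/page?id=7&utm_source=news&fbclid=abc"))
```

A real implementation would more likely load the published ClearURLs rules rather than hard-code names like this.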
@s-crypt Thanks for the info!
@s-crypt Really appreciate the suggestion and info on ClearURLs! I've created a new feature request at #722 - Thanks!
Adding some notes on terminology for future reference:
"whitelist to exclude auto saving sites" actually sounds like the opposite. Maybe we can call this an "Exclude URL Patterns" list of regexes, or perhaps "Filtered URL Patterns".
Sorry, I mixed up blacklist and whitelist when I originally wrote this.
I WANT THIS!!!
Forgot I already posted on this, but I'm coming back to see if this feature has been implemented (guess not) and to reiterate that I really want this, especially for things like Gmail and Facebook.
Finally implemented this in PR #845 if anyone wants to test it out. I should probably add some more URL patterns to the Default List so I welcome any suggestions!
Is there a way to have an "Include List"? That is, only save a URL if it matches the user-defined list?
Hi @contextnerror, although possible, I'd think it wouldn't be as necessary since you can always explicitly tap Save Page Now.
We used to have a Bulk Save feature where you could import your bookmarks. This could be an alternative as a bookmark list is like an "include list", although it would take some extra steps. The Bulk Save feature has been temporarily removed but we plan to bring it back in the future.
While I get that "Save Page Now" is useful for quickly archiving a webpage, it's still a bit of a chore to do that for every link that hasn't been archived, so I think adding an "Include List" option would be extremely helpful, especially when used together with "Auto Save Page".
To explain my point further, I believe the way "Auto Save Page" works right now is rather inconvenient: once you turn it on, you have to add all kinds of URLs to your "Exclude List" to keep out private webpages that would be pointless or even harmful to save to the Wayback Machine. On the other hand, a whitelist would be much more logical, since it would require minimal action from the user and carry a much lower risk of spamming the Wayback Machine than a blacklist.
For example, I've been using the extension for a while to make sure the YouTube videos I've watched are archived on the Wayback Machine, but I have no choice but to save every unarchived link manually. With a whitelist option I could just add youtube.com to the whitelist, check the "Auto Save Page & if not archived: previously" option, and be done with it. Unfortunately, this is currently impossible to automate.
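To make the request above concrete, here is a minimal Python sketch of the decision being asked for: auto-save only when the host is on an include list and the page has not been archived before. Everything here is hypothetical (`host_matches`, `should_auto_save`, and the injected `is_archived` callback are names made up for this example), and it is not how the extension is implemented.

```python
from typing import Callable, Iterable
from urllib.parse import urlsplit


def host_matches(url: str, domains: Iterable[str]) -> bool:
    """True if the URL's host equals, or is a subdomain of, a listed domain."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in domains)


def should_auto_save(
    url: str,
    include_domains: Iterable[str],
    is_archived: Callable[[str], bool],
) -> bool:
    """Auto-save only included hosts that have no previous capture."""
    if not host_matches(url, include_domains):
        return False
    # is_archived is supplied by the caller (assumption: some lookup exists).
    return not is_archived(url)


never_archived = lambda _url: False
print(should_auto_save("https://www.youtube.com/watch?v=abc", ["youtube.com"], never_archived))
print(should_auto_save("https://mail.google.com/", ["youtube.com"], never_archived))
```

Passing the archive lookup in as a callback keeps the sketch independent of any particular API; private sites like webmail are skipped simply because they are never on the list.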
Edit: Some things that should be blacklisted:
search?, ?s, s?, ?searchterm, ?search, ?q, ?query, search.cfm, search?q, result?, ?st, search.aspx?, */search/*, *.amazon.com/s/, search#, ?f, ?t - used for search terms, can still be manually archived
?utm_source - used for attribution. Remove anything after to get just the pure URL
# - used for document locations
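One way the patterns above could be applied is sketched below in Python. It assumes `*` means "any run of characters", that every other character (including `?`) is literal, and that a pattern may match anywhere in the URL; those matching rules are my assumptions, not something this thread specifies. `pattern_to_regex`, `is_excluded`, and the shortened `EXCLUDE_PATTERNS` list are hypothetical.

```python
import re
from urllib.parse import urldefrag

# A few of the suggested patterns, copied from the list above.
EXCLUDE_PATTERNS = ["search?", "?q", "?query", "*/search/*", "*.amazon.com/s/"]


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Escape the pattern literally, then turn each '*' into '.*'."""
    escaped = re.escape(pattern).replace(r"\*", ".*")
    return re.compile(escaped, re.IGNORECASE)


EXCLUDE_REGEXES = [pattern_to_regex(p) for p in EXCLUDE_PATTERNS]


def is_excluded(url: str) -> bool:
    """True if any exclude pattern occurs somewhere in the URL."""
    return any(rx.search(url) for rx in EXCLUDE_REGEXES)


def drop_fragment(url: str) -> str:
    """Remove the '#...' document location, keeping the rest of the URL."""
    return urldefrag(url).url


print(is_excluded("https://www.google.com/search?q=cats"))
print(is_excluded("https://example.com/articles/42"))
print(drop_fragment("https://example.com/doc#section-2"))
```

Note that a short substring pattern like `?q` would also catch unrelated parameters such as `?quality=`, so tighter patterns may be worth considering.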