mirrorweb / pywb

Core Python Web Archiving Toolkit for replay and recording of web archives
https://pypi.python.org/pypi/pywb
GNU General Public License v3.0

fix: sanitise outlook safelinks #76

Open calbon2702 opened 1 year ago

calbon2702 commented 1 year ago

Description

The Python regex library is used to identify Outlook Safe Links in the URL string. The string is then split to remove the Safe Links prefix and suffix, leaving only the wrapped URL.

From: https://gbr01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fassets.publishing.service.gov.uk%2Fgovernment%2Fuploads%2Fsystem%2Fuploads%2Fattachment_data%2Ffile%2F955642%2FFlexible_Working_and_You.pdf&data=05%7C01%7COliver.Tillard101%40mod.gov.uk%7C734bf9940b2e4677d00d08dad83e6cf0%7Cbe7760ed5953484bae95d0a16dfa09e5%7C0%7C1%7C638060059881265012%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=8VcZC%2FGr6iJzzn5st9o1yv8DcDHBLPJXDtOaIcognxM%3D&reserved=0

To: https%3A%2F%2Fassets.publishing.service.gov.uk%2Fgovernment%2Fuploads%2Fsystem%2Fuploads%2Fattachment_data%2Ffile%2F955642%2FFlexible_Working_and_You.pdf
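For readers without access to the diff, the approach described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function name `strip_safelink`, the regex, and the choice to cut the suffix at the first `&` are all assumptions. As in the From/To example, the result is left percent-encoded.

```python
import re

# Assumed pattern: any regional Safe Links host (e.g. gbr01) followed by "/?url=".
SAFELINK_PREFIX = re.compile(
    r"https?://[a-z0-9-]+\.safelinks\.protection\.outlook\.com/\?url=",
    re.IGNORECASE,
)


def strip_safelink(url: str) -> str:
    """Return url with an Outlook Safe Links wrapper removed (hypothetical helper).

    URLs that do not contain a Safe Links prefix are returned unchanged.
    """
    match = SAFELINK_PREFIX.search(url)
    if match is None:
        return url
    # Everything after "?url=" is the wrapped URL plus the tracking suffix.
    remainder = url[match.end():]
    # The wrapped URL is percent-encoded, so a literal "&" marks the start
    # of the suffix ("&data=...&sdata=...&reserved=0").
    wrapped, _, _ = remainder.partition("&")
    # Keep anything that preceded the wrapper, such as a replay path prefix.
    return url[:match.start()] + wrapped


# Shortened, made-up example in the same shape as the URL above.
example = (
    "https://gbr01.safelinks.protection.outlook.com/"
    "?url=https%3A%2F%2Fexample.com%2Fa.pdf&data=05%7C01&sdata=abc%3D&reserved=0"
)
cleaned = strip_safelink(example)
```

Cutting at the first `&` relies on the wrapped URL being fully percent-encoded. If the decoded target is needed, `urllib.parse.unquote` can be applied to the result.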

The fix works in my pywb-troubleshooting environment: https://github.com/mirrorweb/pywb-troubleshooting

Motivation and Context

TNA have asked for this here: https://mirrorweb.zendesk.com/agent/tickets/5754

Screenshots (if appropriate):

Live: image

pywb-troubleshooting wr: image

pywb-troubleshooting tna: image

Types of changes

Checklist: