adamhalesworth closed this issue 4 years ago
There's a backslash preceding each double quote, which results in the " character becoming part of the URL. This means the URL doesn't have the https scheme, so the attribute gets removed. RemovingAttribute does fire for me.
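To illustrate why the stray characters matter, here is a minimal sketch using Python's standard URL parser rather than the sanitizer itself. The URL `https://example.com/page` and the helper `has_allowed_scheme` are made up for illustration; the sanitizer's own scheme check may differ in detail, but the effect is comparable: a value that starts with a backslash and a quote no longer parses as having an https scheme.

```python
from urllib.parse import urlsplit

# Hypothetical attribute values; example.com is a placeholder.
clean_href = 'https://example.com/page'
# What an HTML parser sees for href=\"https://example.com/page\":
# the backslashes and quotes are part of the (unquoted) attribute value.
escaped_href = '\\"https://example.com/page\\"'


def has_allowed_scheme(value, allowed=("http", "https")):
    """Return True if the value parses with one of the allowed schemes."""
    return urlsplit(value).scheme in allowed


print(has_allowed_scheme(clean_href))    # True
print(has_allowed_scheme(escaped_href))  # False
```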
Aha, for some reason I didn't think that would matter! The HTML is coming in as part of a JSON payload, so I'll add a step to clean it up first before sanitizing.
Thanks for taking the time to respond. Can we keep this open a bit longer while I get it working?
Are you sure you're parsing the JSON correctly? The backslash is an escape character in JSON, so it looks like the string might not have been properly decoded. If it has been properly decoded, then there might be a double encoding issue at the other end. The latter would mean the raw JSON has two backslashes before each double quote (href=\\" etc.).
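For anyone hitting the same thing, the two cases can be sketched roughly as below. This is only an illustration in Python with a made-up payload (the `html` key and the example.com link are assumptions, not the actual data from this issue), and the double-encoded case is simulated by encoding the HTML string twice.

```python
import json

# Hypothetical HTML fragment; not the markup from the original report.
html = '<a href="https://example.com">link</a>'

# Case 1: encoded once. The raw JSON text contains \" but a proper
# JSON decode removes the backslashes again.
single = json.dumps({"html": html})
decoded_single = json.loads(single)["html"]

# Case 2: encoded twice at the other end (simulated here). One decode
# still leaves backslash-escaped quotes in the string.
double = json.dumps({"html": json.dumps(html)})
decoded_once = json.loads(double)["html"]
decoded_twice = json.loads(decoded_once)

print(decoded_single == html)   # True
print('\\"' in decoded_once)    # True: backslashes survive a single decode
print(decoded_twice == html)    # True
```

If the backslashes are still there after one proper decode, fixing the encoding at the source is usually cleaner than stripping them by hand before sanitizing.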
Sure, we can leave this issue open for a while.
I've managed to get this working successfully and can confirm that RemovingAttribute now fires as expected. Thanks for pointing me in the right direction. I wouldn't have considered those escape characters to be an issue, but I've learned my lesson 👍
Using the default configuration, href attributes are being removed from a tags:
Becomes:
The URL uses a scheme from the default schemes (https), and during sanitization RemovingAttribute doesn't get fired either, so I'm not sure why this is occurring.