It does not seem possible to strip all URLs with Bleach.
For example, the closest we can get from the docs is:
import bleach

def remove_it(attrs, new=False):
    return None

payloads = (
    'a <a href="http://example.com/outer">https://example.com/inner</a> b',
    "a https://example.com/bare b",
)
for payload in payloads:
    print("=====")
    result = bleach.linkify(payload, callbacks=[remove_it])
    print(result)
    result = bleach.clean(payload, protocols=[])
    print(result)
However, the result is:
=====
a https://example.com/inner b
a <a>https://example.com/inner</a> b
=====
a https://example.com/bare b
a https://example.com/bare b
While the desired result is simply:
=====
a b
a b
=====
a b
a b
In many situations involving user-generated content, it is desirable to prevent any URLs whatsoever, even ones rendered as plain text. Currently, this must be handled outside of Bleach in a separate processing step. Being able to filter URLs out within Bleach would be preferable, as the URLs have already been parsed.
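For reference, the separate processing step mentioned above might look roughly like the sketch below. It uses only the standard library so that it is self-contained; it is not part of Bleach. The helper name `strip_all_urls` and the pattern `URL_RE` are hypothetical, and the pattern is a deliberately rough assumption about what counts as a URL (bare `http://` and `https://` runs only).

```python
import re
from html.parser import HTMLParser

# Rough pattern for bare http(s) URLs. This is an assumption for
# illustration, not a complete URL grammar.
URL_RE = re.compile(r"https?://\S+", re.IGNORECASE)


class _TextOnly(HTMLParser):
    """Collect text nodes and discard every tag (and so every href)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_all_urls(html):
    """Hypothetical helper: drop all tags, then drop URL-looking text."""
    parser = _TextOnly()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    text = "".join(parser.parts)
    text = URL_RE.sub("", text)
    # Collapse the whitespace left behind where URLs were removed.
    return " ".join(text.split())


payloads = (
    'a <a href="http://example.com/outer">https://example.com/inner</a> b',
    "a https://example.com/bare b",
)
for payload in payloads:
    print(strip_all_urls(payload))
```

Note that this sketch returns plain text rather than sanitized HTML, so the result would still need escaping (for example with `html.escape`) before being rendered. It also duplicates the URL detection that Bleach has already performed, which is the overhead this issue is about.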