silk-bahamut closed this issue 1 year ago.
This isn't an issue with the Cleaner. Your input HTML is missing a `"` in the `<a href>` attribute, which turns most of the content into an attribute value. If you fix that, the clean works:
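```java
import org.jsoup.Jsoup;
import org.jsoup.safety.Safelist;

public class CleanExample {
    public static void main(String[] args) {
        // Input with the href attribute correctly quoted (text blocks need Java 15+)
        String html = """
            <p>
            <a href="http://google.fr">should be removed</a>
            <div>not allowed<span>allowed be inside</span></div>
            <span style="background-color: #ba372a;">should be kept with style</span>
            </p>
            """;
        // Allow a fixed set of formatting tags, plus the style attribute on every allowed tag
        Safelist allowStyle = new Safelist()
            .addTags("p", "b", "em", "i", "strong", "u", "span", "ul", "ol", "li", "pre", "h1", "h2", "h3", "h4", "h5", "h6")
            .addAttributes(":all", "style");
        String clean = Jsoup.clean(html, allowStyle);
        System.out.println(clean);
    }
}
```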
Gives:
```html
<p>should be removed</p>not allowed<span>allowed be inside</span> <span style="background-color: #ba372a;">should be kept with style</span>
<p></p>
```
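The original broken input isn't shown here, so the exact position of the missing `"` is an assumption: this minimal sketch places it right after the URL. It models, in simplified form, the HTML tokenizer's double-quoted attribute value state, where everything up to the next `"` becomes the attribute value. `QuotedAttrSketch` and `readDoubleQuotedValue` are hypothetical names for illustration, not part of jsoup, and character references (`&...`) are ignored.

```java
public class QuotedAttrSketch {
    /**
     * Hypothetical, simplified model of the "attribute value (double-quoted)"
     * tokenizer state: starting just after an opening quote, everything up to
     * the next '"' becomes the value. Character references are not handled.
     */
    public static String readDoubleQuotedValue(String input, int start) {
        int end = input.indexOf('"', start);
        // No closing quote at all: the value runs to the end of the input
        return end < 0 ? input.substring(start) : input.substring(start, end);
    }

    public static void main(String[] args) {
        // Assumed broken input: the quote after the URL is missing
        String broken = "<a href=\"http://google.fr>should be removed</a>"
                + "<div>not allowed<span>allowed be inside</span></div>"
                + "<span style=\"background-color: #ba372a;\">should be kept with style</span>";
        int start = broken.indexOf("href=\"") + "href=\"".length();
        // The href value swallows </a>, the whole <div>, and the start of the styled <span>
        System.out.println(readDoubleQuotedValue(broken, start));
    }
}
```

With that input, the `href` value ends only at the quote after `style=`. A real parser then keeps going and, roughly, reads `background-color:` and what follows as further malformed attribute names on the `<a>`, so the styled text ends up inside the `<a>` with no `style` attribute left for the Cleaner to keep.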
I would like to clean some HTML that has styling: remove some tags but keep the styling of the text nodes. But it seems that if the node is a `TextNode`, the tag and all of its styling are lost.