Closed mvera closed 1 year ago
The problem you are facing is that you are still parsing the input as HTML, which means jsoup will treat tags like script as HTML-specific. In particular, entities are not escaped inside script. Changing the output syntax to xml but leaving the input as html is intended to create XHTML output.
If you parse the input with the jsoup XML parser instead, your tests will pass.
org.jsoup.nodes.Document document = Jsoup.parse(htmlDoc, Parser.xmlParser());
As in #1942, it may be simpler to use the W3CDom here.
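To illustrate why unescaped script content matters for XML consumers, here is a minimal sketch that uses only the JDK's built-in XML parser; jsoup is not involved. The class name WellFormedCheck and the helper isWellFormed are hypothetical names chosen for this example, and the sample script body is an assumption, not the markup from the reported test. It shows that a raw ampersand inside script is rejected by an XML parser, while the escaped and CDATA-wrapped forms are accepted.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for illustration; not part of jsoup.
class WellFormedCheck {

    /** Returns true if the given markup parses as well-formed XML. */
    public static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Well-formedness violations surface as SAX (fatal) errors.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Raw ampersands, as HTML serialization leaves them inside script.
        String raw = "<script>if (a && b) { run(); }</script>";
        // The same script with ampersands escaped as XML requires.
        String escaped = "<script>if (a &amp;&amp; b) { run(); }</script>";
        // The same script wrapped in a CDATA section.
        String cdata = "<script><![CDATA[if (a && b) { run(); }]]></script>";

        System.out.println(isWellFormed(raw));     // false
        System.out.println(isWellFormed(escaped)); // true
        System.out.println(isWellFormed(cdata));   // true
    }
}
```

A check like this can be useful in a test to confirm that whatever serialization route is chosen (the XML parser or W3CDom) actually yields output that an XML parser accepts.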
Jsoup doesn't correctly handle ampersands in scripts when converting to XML. It also doesn't correctly handle an entity when converting to XML. In both cases the generated XML is invalid.
Below is a JUnit test that reproduces the bugs. The test is standalone and only requires Java.
Java 17, jsoup 1.15.4.
Tests 1 and 3 fail. Test 2 passes.