Closed Reeniya closed 1 year ago
Can you provide actual test code that shows the behavior (the code, what you get vs. what you want)? I'm not clear on which parser (HTML or XML) you're using, what you're appending, which serializer you're using, etc.
Hi @jhy
I was trying to write a simple test to re-create this scenario. I was not able to re-create this exact case, but I found a different behavior when I call append(String html).
Input string = <p><br /> this is a sample text</p>
HTML string output from jsoup xml parser:
<html><head></head><body><p><br /> this is a sample text</p></body></html>
HTML string after append is called:
<div> <p><br> this is a sample text</p> </div>
Test code:
public void TestAppend() {
    String inputHTML = "<p><br /> this is a sample text</p>";
    Document xmlDoc = Jsoup.parse(inputHTML, Parser.xmlParser());
    Document container = Document.createShell("");
    int nodesCount = xmlDoc.childNodeSize();
    for (int i = 0; i < nodesCount; i++) {
        // appendChild moves the node, so index 0 is always the next one
        container.body().appendChild(xmlDoc.childNode(0));
    }
    xmlDoc = container;
    xmlDoc.outputSettings().syntax(Document.OutputSettings.Syntax.xml);
    xmlDoc.outputSettings().escapeMode(Entities.EscapeMode.xhtml);
    System.out.println("HTML string output from jsoup xml parser:\n" + xmlDoc.toString());
    Element sectionElement = new Element(Tag.valueOf("div"), "");
    sectionElement.empty();
    sectionElement.append(xmlDoc.outerHtml());
    System.out.println("HTML string after append is called: \n" + sectionElement.toString());
}
In the output from the XML parser, <br /> is retained as is. But as soon as we call sectionElement.append(xmlDoc.outerHtml()), <br /> gets converted to <br>.
If we call append(String html) multiple times to create subsections, <br> gets converted to <br></br>. I am not able to re-create this case with a simple test.
How can I ensure that sectionElement.append(xmlDoc.outerHtml()) does not convert <br /> to <br>?
Is there a reason why append(String html) converts <br /> to <br>?
Thanks
I see that when I call append(String html) it uses HtmlTreeBuilder, so I think append(String html) uses the HTML parser to parse the HTML string provided as input.
No, it uses the same parser as the one that produced the Element you are appending to. See: https://github.com/jhy/jsoup/blob/da23af85f39df9f0df732029ed26c34764811009/src/main/java/org/jsoup/nodes/Element.java#L732-L737
In your example code, you are creating a new Element outside of a configured parser, and so that defaults to using the HTML parser:
Element sectionElement = new Element(Tag.valueOf("div"), "");
And then you are re-parsing the XML doc as HTML (in the context of the HTML Element sectionElement). Hence, the self-closing br tag is emitted as <br>.
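To illustrate the point about parser context, here is a minimal sketch in which the div is created by the XML parser instead of with new Element(...), so that append(String html) parses the fragment in that XML context. The class and method names (AppendInXmlContext, appendInXmlContext) are made up for this example; the jsoup calls are the same ones used elsewhere in this thread.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;

public class AppendInXmlContext {
    // Hypothetical helper: appends a fragment to a <div> that lives in an
    // XML-parsed document, then returns the serialized document.
    static String appendInXmlContext(String fragment) {
        // The div is produced by the XML parser, so it belongs to a document
        // that is associated with that parser.
        Document doc = Jsoup.parse("<div></div>", Parser.xmlParser());
        Element div = doc.selectFirst("div");

        // append(String) parses the fragment in the context of div's document.
        div.append(fragment);
        return doc.outerHtml();
    }

    public static void main(String[] args) {
        System.out.println(appendInXmlContext("<p><br />Text</p>"));
    }
}
```

The only difference from the test code above is where the div comes from: it is part of the XML-parsed document rather than a standalone Element.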
Also, you might like to use the wrap(html) method to wrap a div or other content around existing content. It works more efficiently and simply than serialising and re-parsing as you're doing now.
Here's an example:
String xml = "<p><br />Text</p>";
Document doc = Jsoup.parse(xml, Parser.xmlParser());
Element p = doc.selectFirst("p");
Element div = p.wrap("<div>");
p.append("<br />");
System.out.println("XML:\n" + doc.outerHtml());
Produces:
XML:
<div><p><br />Text<br /></p></div>
Thanks @jhy, this resolved the issue I was facing.
I am upgrading jsoup from 1.11.3 to 1.15.3 in my project. I see a difference in the way self-closing tags are handled by public Element append(String html) (https://github.com/jhy/jsoup/blob/master/src/main/java/org/jsoup/nodes/Element.java#L732).
With jsoup 1.11.3, when the input contains a <br> tag and I call append(String html), the function replaces <br> with <br />.
With jsoup 1.15.3, when the input contains a <br> tag and I call append(String html), the function replaces <br> with <br></br>.
I see that when I call append(String html) it uses HtmlTreeBuilder, so I think append(String html) uses the HTML parser to parse the HTML string provided as input.
I want append(String html) to return the <br> tag as <br /> after I upgrade jsoup to 1.15.3. How can this be achieved? Is this an issue with the latest version?
Can someone please suggest how I can handle this case in my code? I need to use append(String html) to append a subsection within a section.
For example, if my subsection is <p> this is <br> text</p>, then when I call append(String html) to append it to the <div> tag, it should return <div><p> this is <br /> text</p></div>. (Note: this is the behavior in jsoup 1.11.3, but not in 1.15.3.)
Thank you in advance.
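For the case where the section really is parsed as HTML, one option that may be worth trying is to switch the owning document's output syntax to XML, so that empty tags such as br are serialized in self-closing form. This is only a sketch under that assumption and was not confirmed in this thread; the class and method names (XmlSyntaxAppend, appendWithXmlSyntax) are made up for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class XmlSyntaxAppend {
    // Hypothetical helper: appends an HTML fragment to a <div> in an
    // HTML-parsed document and serializes the div using XML output syntax.
    static String appendWithXmlSyntax(String fragment) {
        Document doc = Jsoup.parse("<div></div>"); // default HTML parser
        doc.outputSettings()
           .syntax(Document.OutputSettings.Syntax.xml) // serialize as XML
           .prettyPrint(false);                        // no added indentation

        Element div = doc.selectFirst("div");
        div.append(fragment); // parsed as HTML, in the context of div
        return div.outerHtml();
    }

    public static void main(String[] args) {
        System.out.println(appendWithXmlSyntax("<p> this is <br> text</p>"));
    }
}
```

Here the div belongs to a Document, so its output settings apply when the div is serialized; a standalone new Element(...) has no owning document to configure.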