Closed robliao closed 3 years ago
What's the use case?
I'm working on getting the Polymer platform to work with the Google Closure Compiler together with the Polymer Renamer.
Polymer's template element provides a way to repeat sections of HTML.
<template is="dom-repeat" items="{{employees}}">
<div># <span>{{index}}</span></div>
<div>First name: <span>{{item.first}}</span></div>
<div>Last name: <span>{{item.last}}</span></div>
</template>
This may fall within a table element.
<table>
<template is="dom-repeat" items="{{employees}}">
<tr>
<td># <span>{{index}}</span></td>
<td>First name: <span>{{item.first}}</span></td>
<td>Last name: <span>{{item.last}}</span></td>
</tr>
</template>
</table>
If this template falls within a table element, jsoup reparents it to the element containing the table, which breaks the template. We'd like to keep the template element where it is.
Any update to this? Because we need data node semantics for script and style nodes, there's no available workaround we can use.
I don't have an update. I understand the use case, thanks.
Hi, we also need this feature in our project. We are using the Polymer template tag to iterate over the data in an array inside a table element. Is there any chance the option to disable validation could be implemented in November?
If the option to disable validation is not feasible soon, can the template element be allowed inside the table?
+1
Alternatively, you might be able to use the XmlTreeBuilder if you could skip the contents of opaque HTML nodes like script tags.
+1 My (admittedly borderline-invalid) table markup is also getting mangled by this parser.
Can't you use the XML parser if you don't want HTML? Per @martijneken's point.
@jhy : Critical to @martijneken's point is the ability to skip opaque HTML tags like script. This was also pointed out in my original post:
I should also note that I would like to keep the semantics of HTML Parsing (e.g. data nodes like script element contents are not parsed). This requirement prevents me from using the XmlTreeBuilder.
Without this, XmlTreeBuilder will simply parse the contents of the elements, which is especially undesirable if HTML tags exist as strings between script tags.
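To make the difference concrete, here is a minimal, standard-library-only sketch of the "opaque script contents" behaviour described above. It is a toy scanner, not jsoup's tokeniser: the class `RawTextSketch`, the method `startTags`, and the `scriptIsRawText` flag are all hypothetical names invented for illustration, and the regex deliberately ignores edge cases such as `>` inside attribute values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RawTextSketch {
    // Matches a start tag such as <div> or <script type="x">; end tags are not matched.
    private static final Pattern START_TAG =
            Pattern.compile("<([a-zA-Z][a-zA-Z0-9-]*)[^>]*>");

    /**
     * Lists the start tags a scanner would report.
     * With scriptIsRawText = true, everything between <script> and </script>
     * is skipped as opaque data (HTML-like). With false, the script body is
     * scanned like any other markup (roughly what a plain XML-style parse does).
     */
    public static List<String> startTags(String input, boolean scriptIsRawText) {
        List<String> tags = new ArrayList<>();
        Matcher m = START_TAG.matcher(input);
        String lower = input.toLowerCase();
        int pos = 0;
        while (m.find(pos)) {
            String name = m.group(1).toLowerCase();
            tags.add(name);
            pos = m.end();
            if (scriptIsRawText && name.equals("script")) {
                // Jump straight to the closing tag; nothing inside is tokenised.
                int close = lower.indexOf("</script", pos);
                pos = (close < 0) ? input.length() : close;
            }
        }
        return tags;
    }

    public static void main(String[] args) {
        String html = "<div><script>var s = \"<b>hi</b>\";</script></div>";
        System.out.println(startTags(html, true));   // script body treated as data
        System.out.println(startTags(html, false));  // script body scanned as markup
    }
}
```

In the second call the `<b>` inside the JavaScript string is reported as an element, which is the kind of result the requirement above is trying to avoid.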
Gotcha, thanks. Sorry, should have read the full report again.
Has anyone in this thread worked around this issue? I'll proceed down the road to subclassing the default HTML tree builder, but if there's a simpler approach, even if hacky, I'm very happy to take the lazy way out!
@gar1t Did you or anyone else come up with any solution?
Will close this -- the html tokeniser state and the html tree builder are pretty tightly coupled, due to the nature of the HTML5 spec, so IMV it's not a feasible change -- and I haven't seen PRs for it either.
I think the right solution for the presented use case is to add support for template elements, per the spec.
I'm way late to the party here, but I recently wrote a test to validate some HTML by ensuring no elements were reparented by jsoup. In the process I came up with a solution to this:
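// Declared in jsoup's own package so the non-public members of HtmlTreeBuilder are reachable.
package org.jsoup.parser;

// Switches foster inserts off again before every token, so nothing is reparented out of a table.
public class NoFosterInsertsHtmlTreeBuilder extends HtmlTreeBuilder {
    @Override
    protected boolean process(Token token) {
        setFosterInserts(false);
        return super.process(token);
    }

    @Override
    boolean process(Token token, HtmlTreeBuilderState state) {
        setFosterInserts(false);
        return super.process(token, state);
    }
}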
HtmlTreeBuilder, together with HtmlTreeBuilderState, validates the tree as it is being parsed. For example, it restricts which elements may appear inside a table element, and reparents any others to a foster parent when they are encountered.
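As a rough illustration of that reparenting, the sketch below models foster parenting on a tiny hand-rolled tree using only the Java standard library. It is not jsoup's implementation: `FosterParentingSketch`, `Node`, `insertInTable`, and the `ALLOWED_IN_TABLE` list are hypothetical names and simplifications. In particular, `template` is treated as disallowed inside a table to mirror the behaviour reported in this issue, and the implicit `tbody` that a real HTML parser inserts is left out.

```java
import java.util.ArrayList;
import java.util.List;

public class FosterParentingSketch {
    /** Minimal tree node: a tag name plus ordered children. */
    static final class Node {
        final String tag;
        Node parent;
        final List<Node> children = new ArrayList<>();

        Node(String tag) { this.tag = tag; }

        Node append(Node child) {
            child.parent = this;
            children.add(child);
            return child;
        }

        void insertBefore(Node sibling, Node child) {
            child.parent = this;
            children.add(children.indexOf(sibling), child);
        }

        @Override public String toString() {
            StringBuilder sb = new StringBuilder("<" + tag + ">");
            for (Node c : children) sb.append(c);
            return sb.append("</" + tag + ">").toString();
        }
    }

    // Assumption for this toy model only; not the complete list from the HTML spec.
    static final List<String> ALLOWED_IN_TABLE =
            List.of("caption", "colgroup", "tbody", "thead", "tfoot", "tr");

    /** Inserts a new element for a start tag seen while the table is the current node. */
    static Node insertInTable(Node table, String tag, boolean fosterInserts) {
        Node child = new Node(tag);
        if (fosterInserts && !ALLOWED_IN_TABLE.contains(tag) && table.parent != null) {
            // Foster parenting: place the node in the table's parent, just before the table.
            table.parent.insertBefore(table, child);
        } else {
            table.append(child);
        }
        return child;
    }

    /** Builds body > table, then feeds it a template and a tr start tag. */
    public static String build(boolean fosterInserts) {
        Node body = new Node("body");
        Node table = body.append(new Node("table"));
        insertInTable(table, "template", fosterInserts);
        insertInTable(table, "tr", fosterInserts);
        return body.toString();
    }

    public static void main(String[] args) {
        System.out.println(build(true));   // template ends up before the table
        System.out.println(build(false));  // template stays inside the table
    }
}
```

With the flag on, the template lands next to the table rather than inside it; with the flag off, the tree keeps the structure the author wrote, which is the option this issue asks for.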
Is it within the bounds of jsoup to provide an option to parse the HTML as is without this reparenting feature?
I should also note that I would like to keep the semantics of HTML Parsing (e.g. data nodes like script element contents are not parsed). This requirement prevents me from using the XmlTreeBuilder.