CSSBox is an (X)HTML/CSS rendering engine written in pure Java. Its primary purpose is to provide complete information about the rendered page in a form suitable for further processing. However, it can also display the rendered document.
We are using the CSSBox DOM parser to parse the HTML source. Here is the implementation:
try (DocumentSource docSource = new StreamDocumentSource(
        JAFIOUtils.toInputStream(htmlSource), null, "text/html;charset=UTF-8")) {
    LOGGER.error("Before parse " + htmlSource);
    // Parse the input document
    DOMSource parser = new DefaultDOMSource(docSource);
    Document doc = parser.parse();
    // Log the text content of the first child of the parsed document
    LOGGER.error("After parse " + doc.getFirstChild().getTextContent());
}
For example, suppose the input source (htmlSource) is <style></style>Test User &lt;test.user@test.com&gt;
After parsing, the output is Test User <test.user@test.com>.
Here the email address in the text content is enclosed in &lt; and &gt;, and the parser decodes these to < and >. Our requirement is that the parser should not decode &lt; and &gt; to < and >.
How can we retain the text exactly as it is, without any decoding or encoding? @radkovo, could you please suggest a solution for this issue?
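One possible workaround, as a sketch only: a DOM text node stores decoded characters, so the escaped form could be restored when the text is written back out rather than at parse time. The class `EntityEscaper` and the method `escapeText` below are hypothetical names used for illustration. They are not part of CSSBox, and the example assumes that only `&`, `<` and `>` need to be escaped.

```java
public final class EntityEscaper {

    private EntityEscaper() {
        // utility class, not meant to be instantiated
    }

    /**
     * Re-escapes the characters that are significant in HTML text content.
     * Hypothetical helper: it is applied to the already decoded string
     * returned by getTextContent(), not inside the parser itself.
     */
    public static String escapeText(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':
                    out.append("&amp;");
                    break;
                case '<':
                    out.append("&lt;");
                    break;
                case '>':
                    out.append("&gt;");
                    break;
                default:
                    out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The decoded text as reported in the "After parse" log line
        String decoded = "Test User <test.user@test.com>";
        // prints: Test User &lt;test.user@test.com&gt;
        System.out.println(escapeText(decoded));
    }
}
```

This does not reproduce the source byte for byte: a literal `&` in the original input would also come back as `&amp;`, and other character references (for example `&nbsp;`) are not restored. It would be preferable if the parser could keep the original text, which is why we are asking.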