Closed tetsuo-repo closed 9 years ago
The stylesToDomInherited() method operates on the DOM. Your problem seems to be related to the later serialization of the DOM back to HTML code. What do you use for serialization? You should probably use an HTML-aware serialization instead of XML serialization.
My code currently looks like this:
    ByteArrayOutputStream os = new ByteArrayOutputStream();
    try {
        DocumentSource docSource = new StreamDocumentSource(html, null, "text/html");
        DOMSource parser = new DefaultDOMSource(docSource);
        Document doc = parser.parse();
        DOMAnalyzer da = new DOMAnalyzer(doc, null);
        /** Read additional report style definitions from file */
        da.addStyleSheet(null,
                IOUtils.toString(ReportingService.class.getResourceAsStream("/report-styles.css"), "UTF-8"),
                Origin.AUTHOR);
        da.getStyleSheets();
        da.stylesToDomInherited();
        Output out = new NormalOutput(doc);
        out.dumpTo(os);
        docSource.close();
        String interim = new String(os.toByteArray()).replaceAll("<br style.*?</br>", "<br />");
        return new ByteArrayInputStream(interim.getBytes());
    } catch (IOException e) {
        LOG.error(String.format("Error in '%s'. Message is '%s'", "fixReportImages", e.getMessage()), e);
    } catch (SAXException e) {
        LOG.error(String.format("Error in '%s'. Message is '%s'", "fixReportImages", e.getMessage()), e);
    } finally {
        try {
            os.close();
        } catch (IOException e) {
            LOG.error(String.format("Error in '%s'. Message is '%s'", "fixReportImages", e.getMessage()), e);
        }
    }
Is the "NormalOutput" from org.fit.cssbox.css what you mean?
Yes, the NormalOutput class is just a simple demo that is not intended for production use. If you are using the default NekoHTML parser for decoding the source documents, you could probably use its HTML serializer, or some general-purpose DOM serializer if one is available.
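As one possible reading of "general-purpose DOM serializer", here is a minimal sketch that uses the JDK's built-in JAXP transformer with the "html" output method, which writes void elements such as <br> without a closing tag. The class name HtmlDomSerializer and the sample document are hypothetical and only serve the illustration; this is not the NekoHTML serializer mentioned above, and it assumes the DOM elements carry no namespace, since namespaced elements may fall back to XML-style output.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper, not part of jStyleParser or CSSBox.
class HtmlDomSerializer {

    /** Serializes a W3C DOM document using the JAXP "html" output method. */
    static String toHtml(Document doc) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            // "html" makes the serializer aware of void elements such as <br>.
            transformer.setOutputProperty(OutputKeys.METHOD, "html");
            transformer.setOutputProperty(OutputKeys.INDENT, "no");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("HTML serialization failed", e);
        }
    }

    /** Builds a tiny namespace-free document with a styled <br>, for demonstration only. */
    static Document sampleDocument() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element html = doc.createElement("html");
            Element body = doc.createElement("body");
            Element p = doc.createElement("p");
            Element br = doc.createElement("br");
            br.setAttribute("style", "color: red;");
            p.appendChild(doc.createTextNode("first line"));
            p.appendChild(br);
            p.appendChild(doc.createTextNode("second line"));
            body.appendChild(p);
            html.appendChild(body);
            doc.appendChild(html);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("Could not create a DOM document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toHtml(sampleDocument()));
    }
}
```

In the code posted above, toHtml(doc) would take the place of the NormalOutput dump after stylesToDomInherited(). Note that javax.xml.transform.dom.DOMSource shares its simple name with the CSSBox DOMSource type used there, so one of the two would have to be referenced by its fully qualified name.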
It seems that you use the CSSBox API rather than pure jStyleParser, so this discussion is probably more relevant to the CSSBox project.
Perfect. Thank you for your time & your hints.
<br> tags in the HTML are modified the same way you would expect for any other HTML tag.
is modified to
Browsers always treat <br> as a line break, and they also interpret a closing </br> as one. The resulting document thus contains two line breaks.
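Until the serialization itself is HTML-aware, the doubled break can be removed after the fact, in the spirit of the replaceAll call in the posted code. The following is only a sketch of such a stopgap: the class name BrNormalizer is hypothetical, and the pattern assumes that attribute values never contain a ">" character and that nothing but whitespace sits between the opening and closing tag.

```java
import java.util.regex.Pattern;

// Hypothetical post-processing helper, not part of jStyleParser or CSSBox.
class BrNormalizer {

    // Matches "<br ...>" immediately followed by "</br>", in any letter case.
    private static final Pattern BR_PAIR =
            Pattern.compile("<br\\b[^>]*>\\s*</br\\s*>", Pattern.CASE_INSENSITIVE);

    /** Collapses every <br ...></br> pair into a single self-closing break. */
    static String collapse(String html) {
        return BR_PAIR.matcher(html).replaceAll("<br />");
    }

    public static void main(String[] args) {
        String input = "<p>a<br style=\"color: red;\"></br>b</p>";
        System.out.println(collapse(input)); // prints <p>a<br />b</p>
    }
}
```

Unlike the original expression, this pattern also covers <br> elements that received no style attribute and leaves standalone <br> tags untouched. Like the original, it discards any attributes on the break.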