radkovo / jStyleParser

jStyleParser is a CSS parser written in Java. It has its own application interface, designed to allow efficient CSS processing in Java and mapping of the values to Java data types. It parses CSS 2.1 style sheets into structures that can be efficiently assigned to DOM elements. It is intended to be the primary CSS parser for the CSSBox library. Its error handling conforms to the CSS specification's rules for user agents.
http://cssbox.sourceforge.net/jstyleparser/
GNU Lesser General Public License v3.0

<br> Tag is not handled properly by stylesToDomInherited() method #25

Closed: tetsuo-repo closed this issue 9 years ago

tetsuo-repo commented 9 years ago

In HTML containing <br> tags, each <br> is modified the same way as any other HTML tag:

<br>

is modified to

<br style=""></br>

Browsers treat both <br> and </br> as a line break, so the resulting document contains two line breaks where the source had one.

radkovo commented 9 years ago

The stylesToDomInherited() method operates on the DOM. Your problem seems to be related to the later serialization of the DOM back to HTML code. What do you use for serialization? You should probably use an HTML-aware serializer instead of an XML serializer.
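A minimal sketch of the difference between the two kinds of serialization, assuming the JDK's built-in JAXP Transformer as the serializer (nothing here is part of jStyleParser or CSSBox, and javax.xml.transform.dom.DOMSource is unrelated to the CSSBox DOMSource class). The same small DOM is written once with the "xml" output method and once with "html". With "html", the serializer is expected to write br as a bare start tag with no end tag. The class name HtmlSerializationSketch and the sample document are hypothetical. One caveat: the JDK serializer applies its HTML rules to elements without a namespace, so a DOM whose elements sit in the XHTML namespace may still come out in XML style.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class HtmlSerializationSketch {

    /** Serializes a DOM node with the given JAXP output method, e.g. "xml" or "html". */
    public static String serialize(Node node, String method) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.METHOD, method);
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        t.setOutputProperty(OutputKeys.INDENT, "no"); // keep the output on one line
        StringWriter sw = new StringWriter();
        t.transform(new DOMSource(node), new StreamResult(sw));
        return sw.toString();
    }

    /** Hypothetical sample: <p>one<br style="">two</p>, similar to a DOM after stylesToDomInherited(). */
    public static Document sampleDocument() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element p = doc.createElement("p");
        p.appendChild(doc.createTextNode("one"));
        Element br = doc.createElement("br");
        br.setAttribute("style", "");
        p.appendChild(br);
        p.appendChild(doc.createTextNode("two"));
        doc.appendChild(p);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        Document doc = sampleDocument();
        System.out.println("xml:  " + serialize(doc, "xml"));  // XML syntax for the empty br element
        System.out.println("html: " + serialize(doc, "html")); // br written without an end tag
    }
}
```

In a CSSBox setup, the Document returned by parser.parse() would be passed to serialize(doc, "html") in place of NormalOutput.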

tetsuo-repo commented 9 years ago

My code currently looks like this:

        ByteArrayOutputStream os = new ByteArrayOutputStream();
        try {
            DocumentSource docSource = new StreamDocumentSource(html, null, "text/html");

            DOMSource parser = new DefaultDOMSource(docSource);
            Document doc = parser.parse();
            DOMAnalyzer da = new DOMAnalyzer(doc, null);

            // Read additional report style definitions from file
            da.addStyleSheet(null, IOUtils.toString(ReportingService.class.getResourceAsStream("/report-styles.css"), "UTF-8"),
                    Origin.AUTHOR);
            da.getStyleSheets();
            da.stylesToDomInherited();
            Output out = new NormalOutput(doc);
            out.dumpTo(os);
            docSource.close();
            // Workaround: collapse each serialized "<br style=...></br>" pair back into a single <br />
            String interim = new String(os.toByteArray()).replaceAll("<br style.*?</br>", "<br />");
            return new ByteArrayInputStream(interim.getBytes());
        } catch (IOException | SAXException e) {
            LOG.error(String.format("Error in '%s'. Message is '%s'", "fixReportImages", e.getMessage()), e);
        } finally {
            try {
                os.close();
            } catch (IOException e) {
                LOG.error(String.format("Error in '%s'. Message is '%s'", "fixReportImages", e.getMessage()), e);
            }
        }

Is the "NormalOutput" from org.fit.cssbox.css what you mean?
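If switching serializers is not an option, the string-level workaround in the code above could keep the inline style that stylesToDomInherited() computed instead of replacing the whole tag with <br />. A sketch, assuming the serialized output always writes the stray </br> right after its start tag (as in the <br style=""></br> example above); BrEndTagFix is a hypothetical helper name, and the regex does not handle attribute values that contain a ">" character.

```java
import java.util.regex.Pattern;

public class BrEndTagFix {

    // A <br ...> start tag (group 1) immediately followed, ignoring whitespace, by a stray </br>.
    private static final Pattern BR_PAIR =
            Pattern.compile("(<br\\b[^>]*>)\\s*</br>", Pattern.CASE_INSENSITIVE);

    /** Drops the stray </br> end tags but keeps each <br> start tag and its attributes. */
    public static String dropBrEndTags(String html) {
        return BR_PAIR.matcher(html).replaceAll("$1");
    }

    public static void main(String[] args) {
        // prints <p>one<br style="color: red">two</p>
        System.out.println(dropBrEndTags("<p>one<br style=\"color: red\"></br>two</p>"));
    }
}
```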

radkovo commented 9 years ago

Yes, the NormalOutput class is just a simple demo not intended for production use. If you are using the default NekoHTML parser for decoding the source documents, you could probably use its HTML serializer or some general-purpose DOM serializer, if available.
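The core rule an HTML-aware serializer follows can be sketched without any library: walk the DOM and never write an end tag for HTML void elements such as br. The class below is a hypothetical minimal sketch; VoidAwareHtmlWriter is not part of NekoHTML, CSSBox, or jStyleParser. It writes only elements and text, skips comments, doctypes and processing instructions, lower-cases tag names (HTML parsers may produce upper-case names), and leaves the contents of script and style unescaped, as HTML requires for those elements.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

public class VoidAwareHtmlWriter {

    // HTML void elements: written as a start tag only, never with an end tag.
    private static final Set<String> VOID_ELEMENTS = new HashSet<>(Arrays.asList(
            "area", "base", "br", "col", "embed", "hr", "img", "input",
            "link", "meta", "source", "track", "wbr"));

    // Raw text elements: their text content must not be entity-escaped.
    private static final Set<String> RAW_TEXT_ELEMENTS = new HashSet<>(Arrays.asList("script", "style"));

    /** Serializes a DOM node (document or element) to an HTML string. */
    public static String write(Node node) {
        StringBuilder sb = new StringBuilder();
        append(node, sb);
        return sb.toString();
    }

    private static void append(Node node, StringBuilder sb) {
        switch (node.getNodeType()) {
            case Node.DOCUMENT_NODE:
                appendChildren(node, sb);
                break;
            case Node.ELEMENT_NODE: {
                String name = htmlName(node);
                sb.append('<').append(name);
                NamedNodeMap attrs = node.getAttributes();
                for (int i = 0; i < attrs.getLength(); i++) {
                    Node attr = attrs.item(i);
                    sb.append(' ').append(attr.getNodeName())
                      .append("=\"").append(escape(attr.getNodeValue())).append('"');
                }
                sb.append('>');
                if (!VOID_ELEMENTS.contains(name)) { // void elements get no children and no end tag
                    appendChildren(node, sb);
                    sb.append("</").append(name).append('>');
                }
                break;
            }
            case Node.TEXT_NODE:
            case Node.CDATA_SECTION_NODE: {
                Node parent = node.getParentNode();
                boolean raw = parent != null && parent.getNodeType() == Node.ELEMENT_NODE
                        && RAW_TEXT_ELEMENTS.contains(htmlName(parent));
                sb.append(raw ? node.getNodeValue() : escape(node.getNodeValue()));
                break;
            }
            default:
                // Comments, doctypes and processing instructions are skipped in this sketch.
                break;
        }
    }

    private static void appendChildren(Node node, StringBuilder sb) {
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            append(child, sb);
        }
    }

    /** Lower-cased local name, falling back to the node name for non-namespace DOMs. */
    private static String htmlName(Node element) {
        String name = element.getLocalName() != null ? element.getLocalName() : element.getNodeName();
        return name.toLowerCase(Locale.ROOT);
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    /** Hypothetical sample: <div><style>p > a { color: red }</style><p>one<br style="">two</p></div>. */
    public static Document demoDocument() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element div = doc.createElement("div");
        Element style = doc.createElement("style");
        style.appendChild(doc.createTextNode("p > a { color: red }"));
        Element p = doc.createElement("p");
        p.appendChild(doc.createTextNode("one"));
        Element br = doc.createElement("br");
        br.setAttribute("style", "");
        p.appendChild(br);
        p.appendChild(doc.createTextNode("two"));
        div.appendChild(style);
        div.appendChild(p);
        doc.appendChild(div);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        // prints <div><style>p > a { color: red }</style><p>one<br style="">two</p></div>
        System.out.println(write(demoDocument()));
    }
}
```

A ready-made serializer is preferable for production; the sketch only shows why <br style=""> comes out without the extra </br>.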

It seems that you use the CSSBox API rather than pure jStyleParser, so this discussion is probably more relevant to the CSSBox project.

tetsuo-repo commented 9 years ago

Perfect. Thank you for your time & your hints.