javaee / jaxb-v2


Different marshallization behaviour with OutputStream and Writer #985

Open glassfishrobot opened 10 years ago

glassfishrobot commented 10 years ago

We discovered a difference in the behaviour of the marshaller when using a Writer and when using an OutputStream.

In our case, we needed to marshall (formatted):

texttextbaz

This worked when using an OutputStream, but new lines were inserted when using a Writer.

On investigation, I noticed that com.sun.xml.bind.v2.runtime.MarshallerImpl has two distinct methods called createWriter(..): one taking a Writer, and one taking an OutputStream. The version taking an OutputStream does something special when the encoding is UTF-8 (why??). The difference is in the usage of IndentingUTF8XmlOutput, which correctly implements the desired functionality: whenever there is text content before an element, no new line + indentation is inserted.

However, OutputStream + UTF-8 is the only case in which the *UTF8XmlOutput classes are used. The indentation behaviour might be just one example of a difference between the two paths, so I would suggest making the behaviour consistent (i.e. the same for both the OutputStream and Writer implementations, and preferably for all encodings).

And while at it: in that class there is one check if (encoding.equals("UTF-8")) {..} and one if (encoding.startsWith("UTF")) {..}. This is pretty bad, as the first check doesn't match lower-case utf-8 or the name without a dash, and the two checks yield different results for some spellings: for UTF8, one if-clause matches and the other doesn't. If different behaviour based on encoding is really needed (I would say it shouldn't be), then please normalize/canonicalize the string first, for example via Charset.forName(..).
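To illustrate the normalization suggested above, here is a minimal sketch of an encoding check based on Charset.forName(..). The class EncodingCheck and the method isUtf8 are hypothetical names made up for this example. They are not part of MarshallerImpl, and this is not a proposed patch, only one way the comparison could be made independent of case and aliases.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingCheck {

    // Hypothetical helper (not part of JAXB): returns true when the given
    // encoding name resolves to UTF-8, regardless of case or alias.
    public static boolean isUtf8(String encoding) {
        try {
            // Charset.forName resolves names case-insensitively and
            // understands registered aliases such as "UTF8".
            return StandardCharsets.UTF_8.equals(Charset.forName(encoding));
        } catch (IllegalArgumentException e) {
            // Covers IllegalCharsetNameException, UnsupportedCharsetException
            // and a null name; treat all of them as "not UTF-8".
            return false;
        }
    }

    public static void main(String[] args) {
        String[] names = {"UTF-8", "utf-8", "UTF8", "UTF-16", "ISO-8859-1"};
        for (String name : names) {
            System.out.println(name + " -> " + isUtf8(name));
        }
    }
}
```

With a check like this, "UTF-8", "utf-8" and "UTF8" are all treated the same way, while "UTF-16" is not mistaken for UTF-8 as it would be with a startsWith("UTF") test.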

Affected Versions

[2.2.7]

glassfishrobot commented 10 years ago

Reported by glam

glassfishrobot commented 10 years ago

yaroska said: I know about the difference. But why 'new lines' is the case?

glassfishrobot commented 10 years ago

glam said: Well, in our case we needed not to have new lines. That behaviour is implemented by IndentingUTF8XmlOutput and not implemented elsewhere.

For example, we have a unit test that fails because of that: we process the document multiple times, sometimes via Writers and sometimes via OutputStreams, and the outputs don't match.

glassfishrobot commented 10 years ago

Was assigned to yaroska

glassfishrobot commented 7 years ago

This issue was imported from java.net JIRA JAXB-985