javaee / jaxb-v2

JAXB generates invalid XML (includes characters illegal in XML 1.0) #960

Open glassfishrobot opened 11 years ago

glassfishrobot commented 11 years ago

As per the XML spec [1], the following characters are legal in XML 1.0:

#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]

However, JAXB allows other, illegal characters in input strings (e.g. bell character 0x0007, or vertical tab 0x000B), and marshals them into output XML without any errors or warnings.
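For illustration, the Char production quoted above can be turned into a small check. This is a minimal sketch, not part of JAXB: the class name `XmlChars` and both method names are hypothetical, and it only assumes the XML 1.0 ranges listed above.

```java
/** Hypothetical helper (not a JAXB class) mirroring the XML 1.0 Char production. */
public final class XmlChars {
    private XmlChars() {}

    /** Returns true if the code point is allowed by the XML 1.0 Char production. */
    public static boolean isLegalXml10(int codePoint) {
        return codePoint == 0x9
            || codePoint == 0xA
            || codePoint == 0xD
            || (codePoint >= 0x20 && codePoint <= 0xD7FF)
            || (codePoint >= 0xE000 && codePoint <= 0xFFFD)
            || (codePoint >= 0x10000 && codePoint <= 0x10FFFF);
    }

    /**
     * Returns the char index of the first illegal code point, or -1 if the
     * whole text is legal. Unpaired surrogates are reported as illegal because
     * codePointAt returns the lone surrogate value, which is outside the ranges.
     */
    public static int firstIllegalIndex(CharSequence text) {
        int i = 0;
        while (i < text.length()) {
            int cp = Character.codePointAt(text, i);
            if (!isLegalXml10(cp)) {
                return i;
            }
            i += Character.charCount(cp);
        }
        return -1;
    }
}
```

A caller could run `firstIllegalIndex` on each string before marshalling and raise an error or log a warning when it returns a non-negative index.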

I know the solution is not to escape them, since they are illegal regardless of whether they are escaped or not (see #226), but the fact that JAXB generates invalid (and unparseable) XML without any sort of error or warning seems wrong to me.

There are a number of workarounds out in the wild [2, 3] that rely on replacing the illegal characters with legal characters (e.g. space 0x0020, or replacement character 0xFFFD). Another option would be to eat the illegal characters and just not write them to the output.
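Both workaround styles (replace, or drop) can be sketched in a few lines. This is only an illustration under stated assumptions: `XmlSanitizer`, `replaceIllegal`, and `stripIllegal` are hypothetical names, not JAXB API, and the linked workarounds may be implemented differently.

```java
/** Hypothetical sanitizer (not a JAXB class) for strings headed into XML 1.0 output. */
public final class XmlSanitizer {
    private XmlSanitizer() {}

    // Same ranges as the XML 1.0 Char production quoted in this issue.
    private static boolean isLegalXml10(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Replaces every illegal code point with the given replacement string. */
    public static String replaceIllegal(String text, String replacement) {
        StringBuilder out = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            if (isLegalXml10(cp)) {
                out.appendCodePoint(cp);
            } else {
                out.append(replacement);
            }
            // Advance by one or two chars so surrogate pairs stay intact.
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    /** Drops illegal code points entirely ("eats" them). */
    public static String stripIllegal(String text) {
        return replaceIllegal(text, "");
    }
}
```

For example, `XmlSanitizer.replaceIllegal(value, "\uFFFD")` substitutes the replacement character, while `XmlSanitizer.stripIllegal(value)` removes the offending characters. One place such a call might live is a custom `XmlAdapter<String, String>`, though that wiring is an assumption and is not shown here.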

Regardless of the approach, I think it would be a good idea to at least provide an out-of-the-box way for users to ensure the correctness of JAXB-generated XML. Some options:

[1] http://www.w3.org/TR/REC-xml/#NT-Char
[2] http://blog.lesc.se/2009/03/escape-illegal-characters-with-jaxb-xml.html
[3] http://camel.apache.org/jaxb.html#JAXB-IgnoringtheNonXMLCharacter

Affected Versions

[2.2.6]

glassfishrobot commented 11 years ago

Reported by gredler

glassfishrobot commented 11 years ago

snajper said: Yardo, correct me if I'm wrong, but we use JAXP for validating what we read/write. So if this report is valid, I think the issue should be filed against JAXP instead?

glassfishrobot commented 7 years ago

trejkaz said: The offending class is com.sun.xml.internal.bind.marshaller.NioEscapeHandler. The package name makes it sound like JAXB has implemented it directly. Perhaps this is the problem, and the marshalling should have been done using an existing library known to produce valid output, instead of reinventing the wheel poorly?

glassfishrobot commented 11 years ago

Was assigned to yaroska

glassfishrobot commented 7 years ago

This issue was imported from java.net JIRA JAXB-960