Closed moxious closed 10 years ago
how did you construct your Other value?
Luc
On 04/24/2014 01:44 PM, moxious wrote:
I've been seeing a lot of exceptions of this form.
It would appear there's an invalid XML character somewhere, possibly
, <, or &. This exception though makes it difficult to track down exactly where it is, or what can be done about it. Because I'm using the model classes and ProvFactory, it also seems this shouldn't be possible; in the event that I put a value in that isn't valid XML (e.g. an "Other" whose value is something like "x > y") then shouldn't the process of serializing this as XML (as opposed to other formats) handle the encoding specifics of XML for me, and know to replace > with >?
So it would be helpful to wrap such exceptions with contextual information about what exactly failed to marshal, and perhaps do more checking on valid values when the model is being created (prior to marshalling) to avoid this coming up.
Exception in thread "main" javax.xml.bind.MarshalException
- with linked exception: [com.sun.istack.SAXException2: INVALID_CHARACTER_ERR: An invalid or illegal XML character is specified. org.w3c.dom.DOMException: INVALID_CHARACTER_ERR: An invalid or illegal XML character is specified. ] at com.sun.xml.bind.v2.runtime.MarshallerImpl.write(MarshallerImpl.java:323) at com.sun.xml.bind.v2.runtime.MarshallerImpl.marshal(MarshallerImpl.java:249) at javax.xml.bind.helpers.AbstractMarshallerImpl.marshal(AbstractMarshallerImpl.java:116) at org.openprovenance.prov.xml.ProvSerialiser.serialiseDocument(ProvSerialiser.java:121) at org.mitre.provenance.plusobject.prov.PROVConverter.asXMLString(PROVConverter.java:429) at org.mitre.provenance.test.TestPROV.checkPROV(TestPROV.java:39) at org.mitre.provenance.test.TestPROV.dumpPROV(TestPROV.java:33) at org.mitre.provenance.test.TestPROV.main(TestPROV.java:45)
— Reply to this email directly or view it on GitHub https://github.com/lucmoreau/ProvToolbox/issues/89.Web Bug from https://github.com/notifications/beacon/142275__eyJzY29wZSI6Ik5ld3NpZXM6QmVhY29uIiwiZXhwaXJlcyI6MTcxMzk2MjY5OCwiZGF0YSI6eyJpZCI6MzA3MDY1ODV9fQ==--c6eb507d19d4e907279678d044065e7cca4e7c66.gif
Professor Luc Moreau Electronics and Computer Science tel: +44 23 8059 4487 University of Southampton fax: +44 23 8059 2865 Southampton SO17 1BJ email: l.moreau@ecs.soton.ac.uk United Kingdom http://www.ecs.soton.ac.uk/~lavm
I have many of them done in several different ways; but often I'm using a method like this for testing:
/** * Make a single object property into an "Other" statement. * @param name property name * @param value property value * @param prefix ns prefix * @return */ protected Other makeObjectProperty(String name, Object value, String prefix) { QualifiedName nameType = this.name.XSD_STRING; if(value instanceof Date) { nameType = this.name.XSD_DATETIME; // Needs to be valid XML date format. SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss"); value = sdf.format((Date)value); } else if(value instanceof Integer) { nameType = this.name.XSD_INTEGER; } else if(value instanceof Double) { nameType = this.name.XSD_DOUBLE; } else { value = ""+value; // Coerce to string } return factory.newOther(BASE_NAMESPACE, name, prefix, value, nameType); } // End makeObjectProperty
This makeObjectProperty method is getting called repetitively as the graph nodes I have contain metadata items in a HashMap<String,Object> - keys end up as names, values as values to this method. But to be clear: I am not even sure that this method of created Other objects is what's causing the problem - the exception doesn't really provide any insight. So part of the issue is first narrowing down where this is even happening.
Have you got an example of value you pass to makeObjectProperty and that makes jaxb serialization raise an exception?
On 04/24/2014 02:12 PM, moxious wrote:
I have many of them done in several different ways; but often I'm using a method like this for testing:
/** * Make a single object property into an "Other" statement. * @param name property name * @param value property value * @param prefix ns prefix * @return */ protected Other makeObjectProperty(String name, Object value, String prefix) { QualifiedName nameType = this.name.XSD_STRING; if(value instanceof Date) { nameType = this.name.XSD_DATETIME; // Needs to be valid XML date format. SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss"); value = sdf.format((Date)value); } else if(value instanceof Integer) { nameType = this.name.XSD_INTEGER; } else if(value instanceof Double) { nameType = this.name.XSD_DOUBLE; } else { value = ""+value; // Coerce to string } return factory.newOther(BASE_NAMESPACE, name, prefix, value, nameType); } // End makeObjectProperty
— Reply to this email directly or view it on GitHub https://github.com/lucmoreau/ProvToolbox/issues/89#issuecomment-41277759.Web Bug from https://github.com/notifications/beacon/142275__eyJzY29wZSI6Ik5ld3NpZXM6QmVhY29uIiwiZXhwaXJlcyI6MTcxMzk2NDMzNiwiZGF0YSI6eyJpZCI6MzA3MDY1ODV9fQ==--b0b26b08244a37910263ad45c07408cf1efda8eb.gif
Professor Luc Moreau Electronics and Computer Science tel: +44 23 8059 4487 University of Southampton fax: +44 23 8059 2865 Southampton SO17 1BJ email: l.moreau@ecs.soton.ac.uk United Kingdom http://www.ecs.soton.ac.uk/~lavm
I am not sure that it is makeObjectProperty() that is doing this. That is part of the problem: the exception provides no hints at what is invalid. This method (makeObjectProperty()) is just my guess, but I have inspected it output, and attempted to replace invalid characters, to no effect.
Yes, I have examples but they are not simple source files I can provide. They are larger workflows within a repository we manage.
As the stack traces shows, the problem occurs in JAXB serialization. I am unclear of what can be done in ProvToolbox: suggestions welcome!
There are many ways of failing serialization: QNames/QualifiedNames may not be syntactically correct. The Object value in TypedValue is expected to be syntactically valid for xml serialization. To satisfying a validating serializer, the value should be compative with the type ...
http://openprovenance.org/java/site/0_5_0/apidocs/org/openprovenance/prov/model/TypedValue.html
Luc
On 04/24/2014 02:37 PM, moxious wrote:
This makeObjectProperty method is getting called repetitively as the graph nodes I have contain metadata items in a HashMap - keys end up as names, values as values to this method. But to be clear: I am not even sure that this method of created Other objects is what's causing the problem - the exception doesn't really provide any insight. So part of the issue is first narrowing down where this is even happening.
— Reply to this email directly or view it on GitHub https://github.com/lucmoreau/ProvToolbox/issues/89#issuecomment-41280524.Web Bug from https://github.com/notifications/beacon/142275__eyJzY29wZSI6Ik5ld3NpZXM6QmVhY29uIiwiZXhwaXJlcyI6MTcxMzk2NTg3MywiZGF0YSI6eyJpZCI6MzA3MDY1ODV9fQ==--ab7a96e939415f835839330e78fcdd138bd3e0e9.gif
Professor Luc Moreau Electronics and Computer Science tel: +44 23 8059 4487 University of Southampton fax: +44 23 8059 2865 Southampton SO17 1BJ email: l.moreau@ecs.soton.ac.uk United Kingdom http://www.ecs.soton.ac.uk/~lavm
I'm not familiar with how the serialization proceeds, I can't offer recommendations there.
However, when you first create the model instances (such as QualifiedNames) there should be some checking at that point. If a developer creates a QualifiedName object with invalid characters in them, it should either fix those characters (if possible) or throw an exception because the QualifiedName instance itself is invalid. If exceptions were thrown when invalid data was first provided, this would greatly simplify; it pushes the error from serialization time to model creation time. Because it's at model creation time, the developer would know precisely what they did wrong.
public void test() { System.out.println("Why is it OK for me to do this?"); QualifiedName qn = new org.openprovenance.prov.xml.QualifiedName("!!##", "**!@#!@", "(!^$(!^#)$(!#^)"); System.out.println(qn); }
This code actually works -- which I think is surprising. If I then use this qn
in some other model class, it won't be until serialization time that I find there's a problem. And with the exception stack above, I won't be able to find the problem.
So I think my recommendation would be to do input validity checking in the model classes, to prevent it being possible to create such errors. In the case of this QualifiedName above, there's nothing to be done, maybe an exception is appropriate. In other cases, (like an element whose text node contains <, >, &, some simple processing and substitution is appropriate (for the XML serialization)
Hi,
Yes, some more work is required on QualifiedNames, to support the various serializations.
On 24/04/14 19:53, moxious wrote:
I'm not familiar with how the serialization proceeds, I can't offer recommendations there.
However, when you first create the model instances (such as QualifiedNames) there should be some checking at that point. If a developer creates a QualifiedName object with invalid characters in them, it should either fix those characters (if possible) or throw an exception because the QualifiedName instance itself is invalid. If exceptions were thrown when invalid data was first provided, this would greatly simplify; it pushes the error from serialization time to model creation time. Because it's at model creation time, the developer would know precisely what they did wrong.
public void test() { System.out.println("Why is it OK for me to do this?"); QualifiedName qn = new org.openprovenance.prov.xml.QualifiedName("!!##", "**!@#!@", "(!^$(!^#)$(!#^)"); System.out.println(qn); }
This code actually works -- which I think is surprising. If I then use this |qn| in some other model class, it won't be until serialization time that I find there's a problem. And with the exception stack above, I won't be able to find the problem.
So I think my recommendation would be to do input validity checking in the model classes, to prevent it being possible to create such errors. In the case of this QualifiedName above, there's nothing to be done, maybe an exception is appropriate. In other cases, (like an element whose text node contains <, >, &, some simple processing and substitution is appropriate (for the XML serialization)
— Reply to this email directly or view it on GitHub https://github.com/lucmoreau/ProvToolbox/issues/89#issuecomment-41317948.Web Bug from https://github.com/notifications/beacon/142275__eyJzY29wZSI6Ik5ld3NpZXM6QmVhY29uIiwiZXhwaXJlcyI6MTcxMzk4NDgwNCwiZGF0YSI6eyJpZCI6MzA3MDY1ODV9fQ==--1eb80f4ee7bd9847a01c38c91d9a56f60b39770d.gif
I've tracked down the source of the problem over a number of hours with a debugger. It turns out the issue is related to creating QualifiedName objects where the local part contained a space in it. That space was then causing the invalid character problem.
In the XML ProvFactory, it creates QualifiedNames via a pass-through method that just calls the QualifiedName constructor. That constructor does no input checking at all. There is precedent in other implementations of QName for constructors to throw InvalidArgumentException
. The spec for QName parts includes the constraint that namespace URI is an anyURI
and the local part must be an NCName
. Checking those spec constraints for XML and throwing an InvalidArgumentException
if the constructor is given bad data (which can never be validly serialized) would be an improvement.
Often I have complex code driving generation of documents that have dozens or hundreds of items in it, and it is very labor intensive to track down the source of the problem when the implementation is permitting behavior that isn't to spec.
Hi,
Actually, the QName javadoc documentation explicitly states:
/The Namespace URI is //not//validated as a URI reference. The local part is //not//validated as a NCName as specified in Namespaces in XML./
I would be reluctant to add such tests as part of the constructor. However, I agree that it would be useful to have a function that checks whether a QualifiedName has got a valid syntax.
Luc
On 28/04/2014 14:25, moxious wrote:
I've tracked down the source of the problem over a number of hours with a debugger. It turns out the issue is related to creating QualifiedName objects where the local part contained a space in it. That space was then causing the invalid character problem.
In the XML ProvFactory, it creates QualifiedNames via a pass-through method that just calls the QualifiedName constructor. That constructor does no input checking at all. There is precedent in other implementations of QName http://docs.oracle.com/javase/7/docs/api/javax/xml/namespace/QName.html for constructors to throw |InvalidArgumentException|. The spec for QName parts http://www.w3.org/TR/xmlschema-2/#QName includes the constraint that namespace URI is an |anyURI| and the local part must be an |NCName|. Checking those spec constraints for XML and throwing an |InvalidArgumentException| if the constructor is given bad data (which can never be validly serialized) would be an improvement.
Often I have complex code driving generation of documents that have dozens or hundreds of items in it, and it is very labor intensive to track down the source of the problem when the implementation is permitting behavior that isn't to spec.
— Reply to this email directly or view it on GitHub https://github.com/lucmoreau/ProvToolbox/issues/89#issuecomment-41557275.Web Bug from https://github.com/notifications/beacon/142275__eyJzY29wZSI6Ik5ld3NpZXM6QmVhY29uIiwiZXhwaXJlcyI6MTcxNDMxMDcyNCwiZGF0YSI6eyJpZCI6MzA3MDY1ODV9fQ==--8e9e7088cdda1bb6a1f6f3261083396817f034e5.gif
Professor Luc Moreau Electronics and Computer Science tel: +44 23 8059 4487 University of Southampton fax: +44 23 8059 2865 Southampton SO17 1BJ email: l.moreau@ecs.soton.ac.uk United Kingdom http://www.ecs.soton.ac.uk/~lavm
Yes, the QName implementation doesn't do this checking either, which is strange, considering that other related standard java classes like URL & URI throw exceptions out of their constructors. Doubly strange in that QName already throws an Exception on null input. I don't know why this is but the docs also say blank prefixes are permitted to maintain backwards compatibility, so that may be part of the issue.
But I'm curious, why the reluctance? When would it be acceptable, in the XML serialization library, for a QualifiedName to contain a character that can't be serialized to XML?
I've written my own implementation, subclassing QualifiedName. It can be found here: https://github.com/plus-provenance/plus/blob/master/src/main/java/org/mitre/provenance/plusobject/prov/VerifiedXMLQualifiedName.java
I've been seeing a lot of exceptions of this form.
It would appear there's an invalid XML character somewhere, possibly >, <, or &. This exception though makes it difficult to track down exactly where it is, or what can be done about it. Because I'm using the model classes and ProvFactory, it also seems this shouldn't be possible; in the event that I put a value in that isn't valid XML (e.g. an "Other" whose value is something like "x > y") then shouldn't the process of serializing this as XML (as opposed to other formats) handle the encoding specifics of XML for me, and know to replace > with >?
So it would be helpful to wrap such exceptions with contextual information about what exactly failed to marshal, and perhaps do more checking on valid values when the model is being created (prior to marshalling) to avoid this coming up.