Open dbeyer opened 5 years ago
Would be nice, but the standard XML library for Python does not support this AFAIK. There are other libraries that support this, and we use them in tests to ensure that our output files conform to the DTD, but these libraries are not always available, and I wanted to avoid forcing another dependency on users for this nice-to-have feature.
How about: if library that supports DTD checking is available, then check the input XML file for conformance?
This would mean that a file that is accepted for some users would be rejected for other users. That doesn't seem like good behavior.
We could implement a warning, but this would mean that we show a warning to users although everything works as expected.
Furthermore, implementation is not trivial. We would need to package each and every version of the DTDs with BenchExec.
This would mean that a file that is accepted for some users would be rejected for other users.
Well, strictly speaking it is not 'accepted', but BenchExec just does not tell, instead it runs with undefined behavior.
That doesn't seem like good behavior.
Accepting input that is not allowed by definition is called undefined behavior and the most appropriate action is to tell the user or stop in a defined manner. Not sure what 'good behavior' means in this situation. So a warning in case the input is not checked against the contract (and BenchExec proceeds with possibly failing) seems appropriate.
We would need to package each and every version of the DTDs with BenchExec.
Why is that? The XML refers to a publicly available DTD. For offline usage, the same warning might appear.
I labeled it with enhancement
, but some people might argue this is a bug.
I labeled it with low priority
because if it is difficult to implement right now,
it can be left for later.
Accepting input that is not allowed by definition is called undefined behavior
It is not undefined like in C. We take care that input files are upwards compatible (otherwise we would change the major version) and thus take the liberty to parse each file as if it declares the latest version number. There is no ambiguity and no "anything can happen", no potential for errors.
We would need to package each and every version of the DTDs with BenchExec. Why is that? The XML refers to a publicly available DTD.
BenchExec should be usable offline. Making network connections would introduce delays, especially in case the server is not reachable.
I saw a very clumsy error message today on a benchmark definition with non-matching tags.
Does the Python library for XML support checking against DTDs in the meanwhile?
Does not seem so, I cannot find any source that would indicate that it can, and all tutorials for validating DTDs in Python point to non-standard libraries.
This benchmark definition: https://github.com/sosy-lab/sv-comp/blob/svcomp19/benchmark-defs/cpa-seq-validate-violation-witnesses.xml claims to follow the benchmark-definition format 1.4, but uses the tag
<requiredfiles>
which came only with version 1.9. Shouldn't BenchExec check the XML files that it gets as input against the DTD it claims to follow before accepting them as input?