Open GoogleCodeExporter opened 9 years ago
I have created a simpler test. In the attached tar, there are 2 very similar
schemas, each consisting of 4 files: A, B, C, D. The only difference is that in
simpleTest1 file C.rng does not <include> D.rng, and in simpleTest2 C.rng does
<include> D.rng.
In simpleTest1 the output looks as it should.
In simpleTest2 the output includes two copies of C, with A having a <ref
name="C"/> and B having a <ref name="C_2"/>. So, something about the <include>
of D in C causes a duplication of C in the simplified output for B.
Sorry, the ABCs are a little confusing, but at least the test case is smaller.
Original comment by pcrotw...@gmail.com
on 18 Aug 2010 at 8:00
In general the semantics of externalRef in RELAX NG are XML-level inclusion.
It's a bit like an entity ref in XML, and unlike a normal ref in RELAX NG.
Different occurrences of externalRef may result in semantically distinct
patterns (because (a) referenced schemas may contain "free" refs and
externalRefs may occur in distinct grammars and (b) the ns attribute). So at
the moment each occurrence of an externalRef results in a separate parse of the
referenced URI. When you have externalRefs to a single URI and that schema in
turn has multiple externalRefs to a single URI, this results in a large internal
representation (like entity refs in XML). Most of the time in your example is
taken up with XML parsing.
It would be possible to optimize this by
- noticing when the schema referenced by an externalRef makes no outside refs or parentRefs
- caching externalRefs to the same URI made within the same grammar and with
the same ns
This would involve adding suitable methods to
com.thaiopensource.relaxng.parse.Scope.
In the meantime I would suggest wrapping each externalRef in a define, and then
ref that define.
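As a rough sketch of that workaround (the file name shared.rng, the define name "shared", and the element names are made up for illustration and are not taken from the attached test case), the grammar would contain exactly one externalRef per URI, and every place that needs the pattern would use a normal ref instead:

```xml
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start>
    <element name="root">
      <!-- Each use is an ordinary ref, so the external schema
           is not pulled in again at every point of use. -->
      <element name="first">
        <ref name="shared"/>
      </element>
      <element name="second">
        <ref name="shared"/>
      </element>
    </element>
  </start>

  <!-- The only externalRef to shared.rng (hypothetical file) in this grammar. -->
  <define name="shared">
    <externalRef href="shared.rng"/>
  </define>
</grammar>
```

With this layout there is one externalRef occurrence per referenced URI in the grammar, so by the description above it should be parsed once here rather than once per use.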
Original comment by jjc.jclark.com
on 24 Aug 2010 at 4:17
Original issue reported on code.google.com by
pcrotw...@gmail.com
on 18 Aug 2010 at 11:40