relaxng / jing-trang

Schema validation and conversion based on RELAX NG
http://www.thaiopensource.com/relaxng/

Element declarations appear twice in the DTD output #181

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
While fixing issue 180
(https://code.google.com/p/jing-trang/issues/detail?id=180), I encountered a
problem in the conversion to DTD. The following schema

<?xml version="1.0" encoding="UTF-8"?>
<grammar ns="" xmlns="http://relaxng.org/ns/structure/1.0"
  datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <start>
    <element name="test">
      <interleave>
        <zeroOrMore>
          <element name="a"><text/></element>
        </zeroOrMore>
        <ref name="test"></ref>
      </interleave>
    </element>
  </start>
  <define name="test">
    <zeroOrMore>
      <choice>
        <text/>
        <element name="x">
          <text/>
        </element>
      </choice>
    </zeroOrMore>
  </define>
</grammar>

will result in this invalid DTD:

<?xml encoding="UTF-8"?>

<!ELEMENT test (#PCDATA|x|a)*>
<!ATTLIST test
  xmlns CDATA #FIXED ''>

<!ELEMENT x (#PCDATA)>
<!ATTLIST x
  xmlns CDATA #FIXED ''>

<!ELEMENT a (#PCDATA)>
<!ATTLIST a
  xmlns CDATA #FIXED ''>

<!ENTITY % test "(#PCDATA|x)*">

<!ELEMENT x (#PCDATA)>
<!ATTLIST x
  xmlns CDATA #FIXED ''>

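Here the x element is declared twice, which makes the DTD invalid: XML 1.0 does
not allow an element type to be declared more than once (validity constraint
"Unique Element Type Declaration").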
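The duplication can be checked mechanically. The sketch below is a hypothetical helper (not part of jing-trang) that scans DTD text for `<!ELEMENT` declarations and reports every element type declared more than once. It is a regular-expression approximation that assumes declarations are not hidden inside comments, conditional sections or parameter-entity references.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of jing-trang.
class DuplicateElementDeclFinder {
    // Captures the element type name of an <!ELEMENT ...> declaration.
    // Simplification: ignores comments, conditional sections and PE references.
    private static final Pattern ELEMENT_DECL = Pattern.compile("<!ELEMENT\\s+([^\\s>]+)");

    /** Returns the element type names declared more than once, in order of first declaration. */
    static List<String> findDuplicates(String dtd) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        Matcher m = ELEMENT_DECL.matcher(dtd);
        while (m.find()) {
            counts.merge(m.group(1), 1, Integer::sum);
        }
        List<String> duplicates = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > 1) {
                duplicates.add(e.getKey());
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        // Declarations taken from the generated DTD in this report (ATTLISTs omitted).
        String dtd = String.join("\n",
            "<!ELEMENT test (#PCDATA|x|a)*>",
            "<!ELEMENT x (#PCDATA)>",
            "<!ELEMENT a (#PCDATA)>",
            "<!ENTITY % test \"(#PCDATA|x)*\">",
            "<!ELEMENT x (#PCDATA)>");
        System.out.println(findDuplicates(dtd)); // prints [x]
    }
}
```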

The problem seems to be that the ref makes the converter visit the referenced
pattern, and the element patterns found there are added to elementQueue twice
and are therefore output twice. There is some code that tries to avoid this, but
it works only when the referenced pattern contains a single element; see the
ExpandedContentModelOutput class:

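  class ExpandedContentModelOutput extends ContentModelOutput {
    // Outputs only the element's name; unlike the inherited method,
    // it does not add p to elementQueue.
    public VoidValue visitElement(ElementPattern p) {
      p.getNameClass().accept(this);
      return VoidValue.VOID;
    }
  }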

This overrides visitElement so that the element pattern is not added to
elementQueue. Here is the method it overrides:

  public VoidValue visitElement(ElementPattern p) {
    p.getNameClass().accept(this);
    // Queue p so that its <!ELEMENT> declaration is output later.
    elementQueue.add(p);
    return VoidValue.VOID;
  }

As mentioned, this works only if the referenced pattern contains a single
element. With mixed content or more than one element, those element patterns
are queued, and therefore declared, twice.
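The mechanism can be reproduced with a heavily simplified model. In the sketch below, the pattern types (`Pat`, `Text`, `Element`, `Choice`, `ZeroOrMore`) and `ElementQueueSketch` are hypothetical stand-ins, not trang's classes. The body of `<define name="test">` is visited twice, once while expanding the content model of `test` and once while outputting the `test` parameter entity, so the naive queue holds `x` twice. One possible guard, assumed here and not taken from the project, is to remember which pattern objects were already queued, using an identity-based set so that only the very same pattern object is skipped.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified pattern model; NOT trang's actual classes.
interface Pat {}
record Text() implements Pat {}
record Element(String name, Pat content) implements Pat {}
record Choice(List<Pat> alternatives) implements Pat {}
record ZeroOrMore(Pat body) implements Pat {}

class ElementQueueSketch {
    private final Deque<Element> elementQueue = new ArrayDeque<>();
    // Identity-based: each pattern object is queued at most once, however
    // many times a ref leads the output code back to it.
    private final Set<Element> queued = Collections.newSetFromMap(new IdentityHashMap<>());
    private final boolean deduplicate;

    ElementQueueSketch(boolean deduplicate) { this.deduplicate = deduplicate; }

    /** Walks a content model and queues each element pattern whose declaration must be output. */
    void visit(Pat p) {
        if (p instanceof Element e) {
            // The element's own content is declared later, so it is not walked here.
            if (!deduplicate || queued.add(e)) {
                elementQueue.add(e);
            }
        } else if (p instanceof Choice c) {
            c.alternatives().forEach(this::visit);
        } else if (p instanceof ZeroOrMore z) {
            visit(z.body());
        }
        // Text only contributes #PCDATA and queues nothing.
    }

    List<String> queuedNames() {
        List<String> names = new ArrayList<>();
        for (Element e : elementQueue) names.add(e.name());
        return names;
    }

    public static void main(String[] args) {
        // Body of <define name="test">: zeroOrMore(choice(text, element x))
        Pat defineTest = new ZeroOrMore(new Choice(List.of(new Text(), new Element("x", new Text()))));
        for (boolean dedup : new boolean[] {false, true}) {
            ElementQueueSketch out = new ElementQueueSketch(dedup);
            out.visit(defineTest); // expanded inline in the content model of "test"
            out.visit(defineTest); // visited again for the "test" parameter entity
            System.out.println((dedup ? "deduplicated: " : "naive: ") + out.queuedNames());
        }
        // prints "naive: [x, x]" then "deduplicated: [x]"
    }
}
```

The identity set is only one design choice: it removes duplicates caused by reaching the same pattern object twice, but would not merge two distinct element patterns that happen to share a name.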

Original issue reported on code.google.com by georgebina76 on 31 Mar 2014 at 12:46