ncbi / DtdAnalyzer

Add content-model summary with overall quantifiers #6

Open Klortho opened 12 years ago

Problem

Within the content model, we want to know, for each possible child element, "whether they are required, and how many are allowed (0 or 1, exactly 1, 1 or more, 0 or more)." For element-type content models, this is not completely trivial.

Algorithm

Let the content model be ( a, (b|c), a+, (d|a) )

"a" appears three times in this model. Let's call the first leaf node of "a" {a1}, the second {a2}, and the third {a3}. Then, in the first part of the algorithm, we find that the implicit quantifiers are:

{a1}: min=1, max=1
{a2}: min=1, max=∞
{a3}: min=0, max=1

Now sum these up, to get: {a}: min=2, max=∞

In other words, in this content model, <a> must occur at least twice, and can occur an unbounded number of times.
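
As a rough illustration of the computation above, here is a minimal Python sketch. It is not DtdAnalyzer's implementation: the tuple-based tree encoding and the names `summarize`, `apply_quantifier`, and `MODEL` are hypothetical, and the rules are one reasonable reading of the algorithm (a sequence adds the ranges of its parts, a choice takes the smallest minimum and largest maximum across its branches, and an occurrence indicator widens the range). It only covers element-type content models, not mixed content, `EMPTY`, or `ANY`.

```python
import math

# Hypothetical tree encoding (an assumption, not DtdAnalyzer's data model):
#   ("child", name, q)      reference to a child element
#   ("seq", [nodes], q)     sequence:  x, y, z
#   ("choice", [nodes], q)  choice:    x | y | z
# where q is one of "", "?", "*", "+".

def apply_quantifier(counts, q):
    """Widen every (min, max) range according to an occurrence indicator."""
    result = {}
    for name, (lo, hi) in counts.items():
        if q in ("?", "*"):
            lo = 0            # the whole group may be absent
        if q in ("*", "+"):
            hi = math.inf     # the whole group may repeat
        result[name] = (lo, hi)
    return result

def summarize(node):
    """Return {element name: (min, max)} for one content-model node."""
    kind, body, q = node
    if kind == "child":
        counts = {body: (1, 1)}
    elif kind == "seq":
        # Every part of a sequence occurs, so the ranges add up.
        counts = {}
        for part in body:
            for name, (lo, hi) in summarize(part).items():
                old_lo, old_hi = counts.get(name, (0, 0))
                counts[name] = (old_lo + lo, old_hi + hi)
    elif kind == "choice":
        # Only one branch occurs; a name missing from a branch counts as (0, 0).
        branches = [summarize(part) for part in body]
        names = dict.fromkeys(name for b in branches for name in b)
        counts = {}
        for name in names:
            ranges = [b.get(name, (0, 0)) for b in branches]
            counts[name] = (min(lo for lo, _ in ranges),
                            max(hi for _, hi in ranges))
    else:
        raise ValueError(f"unknown node kind: {kind!r}")
    return apply_quantifier(counts, q)

# The example model: ( a, (b|c), a+, (d|a) )
MODEL = ("seq", [
    ("child", "a", ""),
    ("choice", [("child", "b", ""), ("child", "c", "")], ""),
    ("child", "a", "+"),
    ("choice", [("child", "d", ""), ("child", "a", "")], ""),
], "")

summary = summarize(MODEL)
```

For the example model, `summary["a"]` comes out as `(2, math.inf)`, matching the hand calculation. Computing the ranges bottom-up, rather than summing independent per-leaf quantifiers, is a design choice made here so that a name appearing in more than one branch of the same choice is not counted twice.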

Format

To record these results, we don't want to put the quantifiers on the nodes of the existing output, because those nodes can occur more than once and are nested in a hierarchical structure. Instead, I'd suggest adding another section to the output that looks like this:

<content-model spec="element" minified="(a,(b|c),a+,(d|a))"
               spaced="( a, ( b | c ), a+, ( d | a ) )">
  <seq>
    <child>a</child>
    <choice>
      <child>b</child>
      <child>c</child>
    </choice>
    <child q="+">a</child>
    <choice>
      <child>d</child>
      <child>a</child>
    </choice>
  </seq>
  <!-- This section provides a flattened list of all children, with summary quantifiers -->
  <children>
    <child min='2' max='inf'>a</child>
    <child min='0' max='1'>b</child>
    <child min='0' max='1'>c</child>
    <child min='0' max='1'>d</child>
  </children>
</content-model>
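
To make the proposed `<children>` section concrete, here is a small Python sketch that serializes a summary into that shape using the standard library's `xml.etree.ElementTree`. The function name `children_element` and the input format (a dict mapping element name to a `(min, max)` pair, with `math.inf` for unbounded) are assumptions made for illustration; the `inf` spelling is taken from the example above.

```python
import math
import xml.etree.ElementTree as ET

def children_element(summary):
    """Build a <children> element from {name: (min, max)} (hypothetical helper)."""
    children = ET.Element("children")
    for name, (lo, hi) in summary.items():
        child = ET.SubElement(children, "child")
        child.set("min", str(lo))
        # An unbounded maximum is written as "inf", as in the example above.
        child.set("max", "inf" if hi == math.inf else str(hi))
        child.text = name
    return children

# Summary values copied from the example content model.
summary = {"a": (2, math.inf), "b": (0, 1), "c": (0, 1), "d": (0, 1)}
root = children_element(summary)
xml_text = ET.tostring(root, encoding="unicode")
```

The resulting element could then be appended to the existing `<content-model>` element alongside the hierarchical `<seq>` output.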