neogeny / TatSu

竜 TatSu generates Python parsers from grammars in a variation of EBNF
https://tatsu.readthedocs.io/

usage example: grab all capitals in xml #165

Closed nicolasessisbreton closed 3 years ago

nicolasessisbreton commented 4 years ago

What would be the approach to grab all the capitals in the XML below?

<world>

  <country>
    <name>us</name>

    <city>
      <name>washington</name>
      <capital>yes</capital>
    </city>

    <city>
      <name>new york</name>
      <capital>no</capital>
    </city>

  </country>

  ... many more countries ...

</world>

Here are some attempts.

approach #1

approach #2

Are there other ways?

This example is simplified; the condition on the element to grab could be arbitrarily complex (think XPath). XPath is nice but has the same problem as approach #1: it is fragile. Handling the task with a grammar gives a maintainable set of complex extractors (define only the bits needing extraction, handle massive XML, ...).

Victorious3 commented 3 years ago

I'd use an existing XML parser for this. What's wrong with using XPath? It seems to be the right tool for the job. Besides, questions like this belong on Stack Overflow with the [tatsu] tag.

With XPath it should be something like //city[.//capital[.="yes"]]
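
For illustration, here is a minimal sketch of that idea using Python's standard-library `xml.etree.ElementTree`. It only supports a subset of XPath, so the predicate is written as `.//city[capital='yes']` rather than the full expression above. The `capital_names` helper and the trimmed sample document are made up for this example and are not from the issue.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample document from the question (one country only).
XML = """<world>
  <country>
    <name>us</name>
    <city>
      <name>washington</name>
      <capital>yes</capital>
    </city>
    <city>
      <name>new york</name>
      <capital>no</capital>
    </city>
  </country>
</world>"""


def capital_names(xml_text):
    """Return the <name> text of every <city> whose <capital> child is 'yes'."""
    root = ET.fromstring(xml_text)
    # ElementTree's limited XPath: [tag='text'] matches elements that have
    # a child <tag> whose complete text content equals the given text.
    cities = root.findall(".//city[capital='yes']")
    return [city.findtext("name") for city in cities]


print(capital_names(XML))  # ['washington']
```

For more complex conditions, a library with full XPath 1.0 support (for example `lxml`) would probably be needed; the standard-library subset is enough for this simple case.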