mysociety / parlparse

The scraper/parser that produces data for TheyWorkForYou, PublicWhip, etc

Improve parser detection of unhandled content #80

Open · struan opened 7 years ago

struan commented 7 years ago

As it parses, the parser now records the ID of every tag it handles. It then compares those IDs against a list of IDs extracted from the document with XPath. If the two lists differ, it raises an exception.
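A minimal sketch of this kind of check follows, assuming the standard library's `xml.etree.ElementTree` and its limited XPath subset. The class `TrackingParser`, the exception `UnhandledContentError` and the set of handled tag names are hypothetical illustrations, not parlparse's actual code. The idea is that the walk only records IDs for tags it has a handler for, so any ID that XPath finds but the walk never recorded points at unhandled content.

```python
import xml.etree.ElementTree as ET


class UnhandledContentError(Exception):
    """Hypothetical: raised when the parser skipped tags that carry an ID."""


class TrackingParser:
    """Hypothetical parser that records the ID of every tag it handles."""

    # Hypothetical set of tags this sketch knows how to parse.
    HANDLED = {"debate", "speech", "p"}

    def parse(self, xml_text):
        self.seen_ids = set()
        root = ET.fromstring(xml_text)
        self._walk(root)
        self._check(root)
        return self.seen_ids

    def _walk(self, el):
        # Unhandled tags are neither recorded nor descended into,
        # just as a parser with no handler for them would skip them.
        if el.tag not in self.HANDLED:
            return
        if "id" in el.attrib:
            self.seen_ids.add(el.get("id"))
        for child in el:
            self._walk(child)

    def _check(self, root):
        # Independently collect every ID in the document via XPath
        # (ElementTree's subset: descendants that have an id attribute).
        expected = {el.get("id") for el in root.findall(".//*[@id]")}
        if "id" in root.attrib:
            expected.add(root.get("id"))
        # Any difference between the two sets means content was missed.
        diff = expected ^ self.seen_ids
        if diff:
            raise UnhandledContentError(f"unhandled tag IDs: {sorted(diff)}")
```

With this sketch, parsing `<debate id="d1"><speech id="s1"><table id="t1"/></speech></debate>` raises `UnhandledContentError` naming `t1`, because there is no handler for `table`. The set difference is symmetric so that a mismatch in either direction is reported, matching the "any difference" behaviour described above.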

This PR also includes a number of parser improvements, found while making sure that the parser handled everything correctly.

It also adds a script to make re-parsing easier.

Fixes #54 Fixes #66