The parser now records the ID of every tag it processes and then compares those IDs to a list of IDs extracted from the source using XPath. If the two lists differ, it throws an Exception.
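For reviewers, the check described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: it assumes Python with the standard library's `xml.etree.ElementTree`, assumes the tag IDs live in an `id` attribute, and the names `check_all_tags_parsed`, `UnparsedTagsError` and `SAMPLE` are hypothetical.

```python
import xml.etree.ElementTree as ET


class UnparsedTagsError(Exception):
    """Raised when the IDs the parser saw differ from the IDs in the source."""


def check_all_tags_parsed(root, seen_ids, id_attr="id"):
    # Collect every ID in the document with an XPath query.
    # ".//*" only matches descendants, so the root is added separately.
    expected = {el.get(id_attr) for el in root.findall(f".//*[@{id_attr}]")}
    if root.get(id_attr) is not None:
        expected.add(root.get(id_attr))

    seen = set(seen_ids)
    if seen != expected:
        missed = sorted(expected - seen)
        unexpected = sorted(seen - expected)
        raise UnparsedTagsError(
            f"not parsed: {missed}; not in source: {unexpected}"
        )
    return expected


# Hypothetical input, loosely modelled on the tags mentioned in this PR.
SAMPLE = """<debate>
  <Question id="q1">
    <hs_Para id="p1">First paragraph</hs_Para>
    <hs_Para id="p2">Second paragraph</hs_Para>
  </Question>
</debate>"""
```

With this sketch, a parser that only recorded `q1` and `p1` would raise `UnparsedTagsError` naming `p2`, which is the kind of gap the hs_Para fix below addresses.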
There are also a number of parser improvements in here, found in the process of making sure that it parsed things correctly:
Fixes the parser failing to pick up all the text when there is more than one hs_Para element inside a Question tag
Fixes broken table parsing code
Fixes some content inside division tags being missed
Treats clause tags as part of the immediately following Amendment
Makes hs_2cDebatedMotion a major heading
Fixes some content inside new debate tags being missed
It also adds a script to make re-parsing easier.
Fixes #54 Fixes #66