mysociety / parlparse

The scraper/parser that produces data for TheyWorkForYou, PublicWhip, etc

Heading within speech triggers new debate #171

Open ajparsons opened 11 months ago

ajparsons commented 11 months ago

Here is an example of a heading within a speech being promoted to a new section:

https://www.theyworkforyou.com/debates/?id=2023-06-14b.312.1
https://www.theyworkforyou.com/debates/?id=2023-06-14b.312.3

vs

https://hansard.parliament.uk/Commons/2023-06-14/debates/B3E06371-9436-4705-B39B-06D4D7860761/CostOfLivingAndBrexit?highlight=%22to%20be%20called%20the%20cost%20of%20living%20committee%22#

The heading becomes a major heading in the XML, but I'm not sure whether this is the original feed misbehaving or whether we're assuming headers can't appear within speeches.

https://www.theyworkforyou.com/pwdata/scrapedxml/debates/debates2023-06-14b.xml
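To look for other instances, a rough heuristic might be to flag any major heading that sits directly between two speeches by the same person. The sketch below is only an illustration: the element and attribute names (`major-heading`, `speech`, `person_id`), the flat sibling layout, and the inline sample are assumptions about the scraped XML rather than something checked against the real file, and `suspicious_headings` is a hypothetical helper, not part of parlparse.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified sample. The real scraped XML is assumed to
# carry more attributes and elements than shown here.
SAMPLE = """<publicwhip>
  <major-heading id="a.1.0">Example debate</major-heading>
  <speech id="a.1.1" person_id="p1"><p>First part of the speech.</p></speech>
  <major-heading id="a.1.2">Heading inside the speech</major-heading>
  <speech id="a.1.3" person_id="p1"><p>Rest of the same speech.</p></speech>
  <speech id="a.1.4" person_id="p2"><p>A reply.</p></speech>
  <major-heading id="a.2.0">Next debate</major-heading>
  <speech id="a.2.1" person_id="p3"><p>Another speech.</p></speech>
</publicwhip>"""


def suspicious_headings(xml_text):
    """Return ids of major headings sandwiched between two speeches
    that share the same person_id (assumed attribute name)."""
    children = list(ET.fromstring(xml_text))
    flagged = []
    for i, el in enumerate(children):
        if el.tag != "major-heading":
            continue
        # A heading at the very start or end has no speech on both sides.
        if i == 0 or i + 1 >= len(children):
            continue
        prev_el, next_el = children[i - 1], children[i + 1]
        if prev_el.tag != "speech" or next_el.tag != "speech":
            continue
        person = prev_el.get("person_id")
        if person is not None and person == next_el.get("person_id"):
            flagged.append(el.get("id"))
    return flagged


if __name__ == "__main__":
    print(suspicious_headings(SAMPLE))
```

This would also flag a genuine new debate opened by whoever spoke last in the previous one, so any hits would need checking by hand against Hansard.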

ajparsons commented 11 months ago

Related to: https://github.com/mysociety/parlparse/issues/53