mysociety / za-hansard

A parser for South African Hansards, as published at http://www.parliament.gov.za/live/content.php?Category_ID=119

Issues/pombola/1111 parsing issues speaker and speeches #28

Closed osfameron closed 10 years ago

osfameron commented 10 years ago

Fixes #25 and #26, two parsing issues identified by Geoff Kilpin. More information in the respective tickets and commit messages!

NOTE: after fixing #25, we need popit-resolver to do something useful with the name in parentheses! The PR https://github.com/mysociety/popit-resolver/pull/2 adds this behaviour, but we also need to check that the sayit import passes the contents of the parentheses through (i.e. rather than just stripping them).
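
For illustration only, here is a rough sketch of the kind of handling meant by "do something useful with the name in parentheses". It is not code from this PR, from za-hansard, or from popit-resolver: the helper name `split_display_name`, the regular expression, and the sample speaker string are all hypothetical, and the real resolver may treat these names quite differently.

```python
import re

# Hypothetical helper for illustration; not part of za-hansard or popit-resolver.
# Assumes a display name shaped like "<title> (<person name>)" with one
# trailing parenthesised group.
_PAREN_NAME = re.compile(r"^(?P<outer>.*?)\s*\((?P<inner>[^()]*)\)\s*$")


def split_display_name(display_name):
    """Split a speaker display name into (outer text, parenthesised name).

    Returns (stripped name, None) when there is no trailing parenthesised
    group, so callers can fall back to resolving the whole string.
    """
    cleaned = display_name.strip()
    match = _PAREN_NAME.match(cleaned)
    if match is None:
        return cleaned, None
    return match.group("outer"), match.group("inner").strip()


# Made-up sample input, not taken from a real Hansard.
title, person = split_display_name("The MINISTER OF EXAMPLES (Ms A Person)")
```

A resolver could then try the parenthesised name first and fall back to the outer text, which only works if the importer keeps the parentheses in the display name rather than stripping them.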

osfameron commented 10 years ago

Re: my comment about the sayit import: https://github.com/mysociety/sayit/blob/master/speeches/importers/import_base.py#L73 does indeed seem to just pass the display_name through, so we should be OK.

evdb commented 10 years ago

:+1: Looks sane to me. I guess running it on a larger corpus is really the only way to see whether it does what we want.