unitedstates / congressional-record

A parser for the Congressional Record.

Speakers with white spaces in their names #20

Closed nclarkjudd closed 9 years ago

nclarkjudd commented 9 years ago

I believe this change to the regexes for re_newspeaker and re_speaking resolves issue #18 without breaking anything else (at least nothing the test suite catches).

These changes add an optional whitespace character (zero or one) to the end of the repeated character group that identifies speaker names in re_newspeaker. The re_newspeaker and re_speaking expressions also were not the same, which seems to mean that in some instances you could get a speaking tag for a speaker without a matching new speaker tag. Apart from how they treat groups, the two expressions are now the same.
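To illustrate the idea, here is a minimal sketch of the before and after behaviour. The patterns, helper names, and sample lines below are simplified assumptions made up for this comment; they are not the actual re_newspeaker or re_speaking expressions from the parser.

```python
import re

# Hypothetical, simplified patterns for illustration only; the project's
# real re_newspeaker and re_speaking expressions are more involved.
TITLE = r"(?:Mr|Mrs|Ms|Miss)\."

# Before: the repeated character group leaves no room for whitespace.
OLD_NAME = r"(?:[A-Z'-])+"
# After: each repeated character may be followed by zero or one whitespace.
NEW_NAME = r"(?:[A-Z'-]\s?)+"


def build_newspeaker(name_pattern):
    # Capture "Title NAME" at the start of an indented paragraph, ending at ". "
    return re.compile(r"^\s+(?P<name>" + TITLE + r" " + name_pattern + r")\. ")


old_newspeaker = build_newspeaker(OLD_NAME)
new_newspeaker = build_newspeaker(NEW_NAME)


def speaker(pattern, line):
    # Return the captured speaker name, or None if the line does not match.
    match = pattern.match(line)
    return match.group("name") if match else None


# Made-up sample lines in the style of the Record.
line_one = "  Mr. BOEHNER. Mr. Speaker, I yield myself such time as I may consume."
line_two = "  Ms. WASSERMAN SCHULTZ. Mr. Speaker, I rise today in support."

print(speaker(old_newspeaker, line_one))  # single-word name, old pattern
print(speaker(old_newspeaker, line_two))  # multi-word name, old pattern
print(speaker(new_newspeaker, line_one))  # single-word name, new pattern
print(speaker(new_newspeaker, line_two))  # multi-word name, new pattern
```

In this sketch the old pattern matches the single-word name but returns None for the multi-word one, while the new pattern matches both. Because each repetition consumes exactly one name character plus at most one whitespace, the group does not introduce nested quantifiers.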

This should mean more consistent handling of names.

I also added a test to the test suite for identifying speakers with whitespace in their names (e.g. Debbie Wasserman Schultz, Sheila Jackson Lee ...).
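A test along these lines might look roughly like the following. This is only a sketch: RE_NEWSPEAKER is a hypothetical stand-in pattern and the input lines are invented, so it does not reproduce the test actually added to the suite.

```python
import re
import unittest

# Hypothetical stand-in for re_newspeaker; the real expression lives in the parser.
RE_NEWSPEAKER = re.compile(
    r"^\s+(?P<name>(?:Mr|Mrs|Ms|Miss)\. (?:[A-Z'-]\s?)+)\. "
)


class TestSpeakersWithWhitespace(unittest.TestCase):
    def test_multiword_surnames(self):
        # Invented sample lines mapped to the speaker name we expect to capture.
        cases = {
            "  Ms. WASSERMAN SCHULTZ. Mr. Speaker, I rise today.": "Ms. WASSERMAN SCHULTZ",
            "  Ms. JACKSON LEE. Mr. Speaker, I rise today.": "Ms. JACKSON LEE",
        }
        for line, expected in cases.items():
            match = RE_NEWSPEAKER.match(line)
            self.assertIsNotNone(match, line)
            self.assertEqual(match.group("name"), expected)
```

Saved as a module, this can be run with `python -m unittest`.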