unitedstates / congressional-record

A parser for the Congressional Record.

speech tagged as title #36

Open nclarkjudd opened 6 years ago

nclarkjudd commented 6 years ago

The parser mistakenly tagged a speech as a title.

To replicate this behavior, parse CREC-1997-01-28-pt1-PgS771-3.

A bug fix and a new test are both required.

The bug fix should address what is (I think) an issue with the regular expression currently used to find titles.

The test should assert that nothing tagged as a title contains long runs of lowercase or normally-cased (sentence-case) text.
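
A rough sketch of what such a casing check might look like, assuming titles in the Record are set almost entirely in capitals. The helper `looks_like_title`, the `max_lower_ratio` threshold, and the sample strings are all hypothetical: none of them come from the parser's code or from the document named above, and a real test would run over the parser's actual output.

```python
def looks_like_title(text, max_lower_ratio=0.2):
    """Return True if `text` has title-like casing (mostly uppercase letters).

    Hypothetical helper for illustration; not part of the parser's API.
    """
    letters = [c for c in text if c.isalpha()]
    if not letters:
        # No alphabetic characters at all: not a usable title.
        return False
    lower_ratio = sum(c.islower() for c in letters) / len(letters)
    return lower_ratio <= max_lower_ratio


def test_title_casing():
    # Made-up sample strings, not taken from the Congressional Record.
    assert looks_like_title("EXAMPLE TITLE OF A BILL")
    # A stray lowercase letter (as in "Mc") should still pass.
    assert looks_like_title("THE McDONALD ACT")
    # Sentence-cased speech text should be rejected.
    assert not looks_like_title("Mr. President, I rise today to speak about this bill.")
```

A ratio threshold is used instead of a strict all-caps check so that a title with an occasional lowercase letter is not flagged; the value 0.2 is a guess and would need tuning against real parsed titles.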