propublica / Capitol-Words

Scraping, parsing and indexing the daily Congressional Record to support phrase search over time, and by legislator and date
BSD 3-Clause "New" or "Revised" License

[crec_parser] Add segment information #111

Closed by rappoport 6 years ago

rappoport commented 6 years ago

Split the parsed content into a list of segments, each attributed to its speaker. Example output:

[{
    'id': 'id-CREC-2000-02-01-pt1-PgS221-3-1',
    'speaker': 'Mr. DOMENICI',
    'text': 'BUDGET SCOREKEEPING REPORT. Mr. DOMENICI.  Mr. President, I hereby submit to the Senate the budget s ...',
    'bioGuideId': 'D000407'
}]

The segment ID might be useful if we want to address individual segments in the frontend.
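
For illustration only, here is a minimal sketch of how such a split could work; it is not the code in this PR. It assumes speakers are introduced by a title plus an all-caps surname followed by a period (as in `Mr. DOMENICI.`), and that a segment ID is the content ID with a 1-based segment counter appended. The names `split_segments`, `SPEAKER_RE`, and the `bioguide_ids` lookup are hypothetical.

```python
import re

# Assumed pattern: a title, an all-caps surname, then a period,
# e.g. "Mr. DOMENICI." The real parser may detect speakers differently.
SPEAKER_RE = re.compile(r"\b((?:Mr|Mrs|Ms)\. [A-Z][A-Z'\-]+)\.")


def split_segments(content_id, text, bioguide_ids):
    """Split text into speaker-attributed segments.

    content_id   -- ID of the enclosing content item (assumed format)
    text         -- raw text of the content item
    bioguide_ids -- hypothetical mapping of speaker name to bioGuideId
    """
    matches = list(SPEAKER_RE.finditer(text))
    segments = []
    for n, match in enumerate(matches, start=1):
        # The first segment keeps any leading title text; later segments
        # start at the speaker's name.
        start = 0 if n == 1 else match.start()
        end = matches[n].start() if n < len(matches) else len(text)
        speaker = match.group(1)
        segments.append({
            'id': 'id-{}-{}'.format(content_id, n),
            'speaker': speaker,
            'text': text[start:end].strip(),
            'bioGuideId': bioguide_ids.get(speaker),
        })
    return segments
```

With this ID scheme, a frontend could address the first segment of a content item by appending `-1` to the content ID, which matches the shape of the example ID above. Speakers missing from the lookup get a `bioGuideId` of `None`.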