propublica / Capitol-Words

Scraping, parsing and indexing the daily Congressional Record to support phrase search over time, and by legislator and date
BSD 3-Clause "New" or "Revised" License

Django-admin enabled version #105

Open drinks opened 7 years ago

drinks commented 7 years ago

Long, long ago, I started a rewrite of this app with a Celery-backed ingest process and Django 1.5 (yep, that's about how long ago it was). I really have no recollection of what state of completion things were in, or what the code quality was, but at a minimum the ingest and parse processes were integrated. I'm sure significant work would still need to be done, but given that the code needs to be overhauled regardless, is there any interest if I can track it down? I think the plan was to move from the custom Solr extensions to Elasticsearch (ES), though its API wasn't able to support everything at that time. With the addition of significant terms aggregations, a lot of what Capitol Words does can probably be ported to on-the-fly calculations out of the search backend. Lemme know!
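
For anyone wondering what "on-the-fly calculations out of the search backend" might look like, here is a rough sketch of an Elasticsearch request body using a `significant_terms` aggregation. It is not taken from the rewrite: the function name and the field names (`speaker_id`, `date`, `text`) are made up for illustration, and the real index mapping would decide which field can be aggregated (newer ES versions also offer `significant_text` for analyzed text fields).

```python
import json


def significant_terms_query(legislator_id, start_date, end_date, size=10):
    """Build a search body asking for terms that are unusually frequent in
    one legislator's speeches within a date range, relative to the index.

    All field names here are hypothetical, not the project's actual schema.
    """
    return {
        # No hits needed; only the aggregation result matters.
        "size": 0,
        "query": {
            "bool": {
                "filter": [
                    {"term": {"speaker_id": legislator_id}},
                    {"range": {"date": {"gte": start_date, "lte": end_date}}},
                ]
            }
        },
        "aggs": {
            "distinctive_words": {
                "significant_terms": {"field": "text", "size": size}
            }
        },
    }


if __name__ == "__main__":
    # Placeholder legislator ID and dates, purely for illustration.
    body = significant_terms_query("L000123", "2011-01-01", "2011-12-31")
    print(json.dumps(body, indent=2))
```

The body would be sent to the index's `_search` endpoint; the foreground set is whatever the query matches, and ES scores terms against the rest of the index, which is roughly the kind of comparison the custom Solr extensions were doing.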