propublica / Capitol-Words

Scraping, parsing and indexing the daily Congressional Record to support phrase search over time, and by legislator and date
BSD 3-Clause "New" or "Revised" License
121 stars, 34 forks

All daily_counts are 0 #122

Closed. mcm852 closed this issue 5 years ago

mcm852 commented 5 years ago

All daily_counts are getting set to 0 when running the parser. Checking http://localhost:8000/cwapi/term_counts_by_day/?term=russia&start_date=2017-03-01&end_date=2017-03-30 shows this.

I ran the processor as instructed in the README:

$ /manage.py run_crec_parser --start_date=2016-01-20 --end_date=2016-01-21

And get this:

Processing files for 2016-01-20 00:00:00. Found 62 new records.
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '. ']

Has anyone else seen this query error?

This is definitely what's causing all daily_counts to be 0, and in turn causing the entire thing not to work. The warning is raised by FuzzyWuzzy, the fuzzy string-matching module being used here.
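For anyone else who hits the same warning, here is a minimal sketch of why a query like '. ' triggers it. It uses only the standard library and approximates FuzzyWuzzy's default processor (non-word characters replaced with spaces, then lowercased and stripped). The names `approx_full_process` and `is_scorable` and the sample queries are hypothetical and are not part of this project or of FuzzyWuzzy.

```python
import re


def approx_full_process(text):
    """Rough stand-in for FuzzyWuzzy's default processor (an approximation)."""
    # Replace every non-word character (punctuation, whitespace) with a space.
    cleaned = re.sub(r"\W", " ", text)
    # Lowercase and trim, so a punctuation-only query ends up empty.
    return cleaned.lower().strip()


def is_scorable(query):
    """Return True if the query still has content after processing."""
    return bool(approx_full_process(query))


# Hypothetical sample queries; '. ' is the one from the warning above.
queries = [". ", "Mr. SMITH of Texas", "russia"]
usable = [q for q in queries if is_scorable(q)]
print(usable)
```

Filtering queries this way before fuzzy matching would avoid the warning, assuming punctuation-only strings carry no useful information for the match.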

UPDATE: So it turns out my query string was wrong and I didn't have the right data scraped. Whoops. Anyway, everything seems to be working as expected. I will continue to look into the warning though and see if there's something that ought to be added to the queries to satisfy the warning message. Closing this issue.
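One plausible reading of the update is that the queried date range (March 2017) did not overlap the dates that were actually parsed (January 2016). A minimal sketch of a sanity check for that, using only the standard library; the helper names `ranges_overlap` and `build_query_url` are hypothetical, and the endpoint and parameter names are copied from the URL earlier in this issue.

```python
from datetime import date
from urllib.parse import urlencode

# Endpoint taken from the URL quoted in this issue.
BASE_URL = "http://localhost:8000/cwapi/term_counts_by_day/"


def ranges_overlap(a_start, a_end, b_start, b_end):
    """Return True if two inclusive date ranges share at least one day."""
    return a_start <= b_end and b_start <= a_end


def build_query_url(term, start, end):
    """Build a term_counts_by_day URL for the given term and date range."""
    params = {
        "term": term,
        "start_date": start.isoformat(),
        "end_date": end.isoformat(),
    }
    return BASE_URL + "?" + urlencode(params)


# Dates passed to run_crec_parser versus dates used in the query above.
parsed = (date(2016, 1, 20), date(2016, 1, 21))
queried = (date(2017, 3, 1), date(2017, 3, 30))

if not ranges_overlap(*parsed, *queried):
    print("Query range does not overlap the parsed dates; try instead:")
    print(build_query_url("russia", *parsed))
```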