onyxfish / votersdaily

A project to parse the content of diverse government schedules into a consistent format.
GNU General Public License v3.0

The CSPAN scrapers' 'insert_count' field is incorrect. #84

Open onyxfish opened 15 years ago

onyxfish commented 15 years ago

Each scraper is reporting thousands of inserts. I think the value actually being stored is the total number of documents in the database when the scraper completes.
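
To illustrate the suspected difference, here is a minimal sketch, not the project's actual code. The dict `db` is a hypothetical stand-in for the database, and `run_scraper` is an invented name; it returns both the number of documents this run actually added and the total document count, which is what the field appears to contain.

```python
# Hypothetical stand-in for the document database: maps document id -> document.
def run_scraper(db, scraped_docs):
    """Save scraped documents and return two candidate 'insert_count' values."""
    inserted = 0
    for doc in scraped_docs:
        if doc["_id"] not in db:   # only documents that are new count as inserts
            db[doc["_id"]] = doc
            inserted += 1
    total_in_db = len(db)          # the suspected (incorrect) value
    return inserted, total_in_db


# A database that already holds 1000 documents from earlier runs.
db = {"old-%d" % i: {"_id": "old-%d" % i} for i in range(1000)}
new_docs = [{"_id": "cspan-1"}, {"_id": "cspan-2"}, {"_id": "old-0"}]

inserted, total_in_db = run_scraper(db, new_docs)
print(inserted, total_in_db)
```

With a database that is already populated, the two values diverge sharply, which would match the "thousands of inserts" symptom.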

I'm not sure if this is affecting other scrapers. I need to test with a clean database to get a clearer picture.

chaunceyt commented 15 years ago

This is an interesting issue. Are you spawning a process for each scraper? If so, I need to rethink how I calculate the count.

onyxfish commented 15 years ago

For posterity, it was discussed in chat that the scheduler is indeed spawning additional processes. This may make including this data infeasible. Awaiting a determination on this.
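
As a rough sketch of why spawned processes matter, assume (this is an assumption, not confirmed from the code) that the count is derived from database totals, for example as a before/after difference. The function below simulates, in a single process, writes from another scraper landing in the middle of a run; `count_by_delta` and the document ids are hypothetical.

```python
def count_by_delta(db, own_ids, other_ids):
    """Return the before/after change in database size for one scraper run.

    other_ids simulates documents written by a different scraper process
    while this run is still in progress.
    """
    before = len(db)
    for doc_id in own_ids:
        db.setdefault(doc_id, {})
    # Simulated writes from another process landing before this run finishes.
    for doc_id in other_ids:
        db.setdefault(doc_id, {})
    return len(db) - before


shared_db = {}
delta = count_by_delta(shared_db, ["a1", "a2"], ["b1", "b2", "b3"])
print(delta)  # this run inserted 2 documents, but the delta also counts the other 3
```

Under that assumption, any count based on database size attributes other scrapers' inserts to the current one, so a reliable figure would have to be tallied inside each scraper process rather than read from the database.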