onyxfish / votersdaily

A project to parse the content of diverse government schedules into a consistent format.
GNU General Public License v3.0

PHP scrapers should implement 'insert_count' in vd_logs. #44

Closed chaunceyt closed 15 years ago

chaunceyt commented 15 years ago

(Note: retitled for clarity)

I'm starting to work on putting data in vd_logs.

I noticed I have access to this: scripts/legislative/house_roll_call_votes/scraper.php --eventdb=cthorn_tmp

string(118) "{"db_name":"cthorn_tmp","doc_count":100,"doc_del_count":0,"update_seq":100,"compact_running":false,"disk_size":213761}"

Should vd_logs.results hold the above result for each scrape? A very low doc_count could indicate an issue with the scraper.

onyxfish commented 15 years ago

I've added an 'insert_count' field to vd_logs. The documentation (wiki) has been updated and the new field implemented in the Python scrapers.

onyxfish commented 15 years ago

Because different language libraries are going to provide slightly different feedback from CouchDB, I'm not in favor of storing that JSON row in the results field. I think adding the 'insert_count' field gives us the same value and can easily be implemented without access to the raw return values from CouchDB.

chaunceyt commented 15 years ago

I'm going to decode the JSON and insert the individual values, not the raw JSON string.
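
A minimal sketch of that decode step, written in Python for illustration (the scraper discussed here is PHP). The input string is the database-info response quoted earlier in this thread; the function name `db_info_to_fields` and the choice of which keys to keep are assumptions, not something the thread specifies.

```python
import json

# Database-info JSON exactly as quoted earlier in this thread.
raw_info = (
    '{"db_name":"cthorn_tmp","doc_count":100,"doc_del_count":0,'
    '"update_seq":100,"compact_running":false,"disk_size":213761}'
)


def db_info_to_fields(raw):
    """Decode the database-info JSON and return selected values as plain fields."""
    info = json.loads(raw)
    # Hypothetical selection of keys; the thread does not say which ones get stored.
    wanted = ("db_name", "doc_count", "doc_del_count", "update_seq", "disk_size")
    return {key: info[key] for key in wanted}


fields = db_info_to_fields(raw_info)
```

Storing `fields` as separate values, rather than the raw string, keeps the log entry independent of how a given client library formats its response.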

onyxfish commented 15 years ago

Oh, ok, that makes sense then. Never can have too much data. :-)

chaunceyt commented 15 years ago

I just noticed that doc_count is available on a GET but not on a PUT. Interesting... if the Python CouchDB API handles this differently, I would like to review how they're handling it.

chaunceyt commented 15 years ago

Fixed. Closing.

onyxfish commented 15 years ago

I actually don't see doc_count either way through the Python library--I'm counting the records before I insert them and using that value (since I do duplicate checking in advance this should be valid).
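
Roughly, the count-before-insert approach could look like the sketch below. It is an illustration under assumptions, not the code in the Python scrapers: `count_new_records`, the `_id`-based duplicate check, and the sample ids are all hypothetical, and no CouchDB call is made.

```python
def count_new_records(scraped, existing_ids):
    """Drop records whose id is already known; return (new_records, insert_count)."""
    seen = set(existing_ids)
    new_records = []
    for record in scraped:
        if record["_id"] in seen:
            continue  # duplicate: already stored, or repeated within this batch
        seen.add(record["_id"])
        new_records.append(record)
    return new_records, len(new_records)


# Hypothetical scrape result: one id already stored, one repeated in the batch.
scraped = [{"_id": "vote-1"}, {"_id": "vote-2"}, {"_id": "vote-2"}, {"_id": "vote-3"}]
new_records, insert_count = count_new_records(scraped, existing_ids={"vote-1"})

# The count is known before any write, so it can go straight into the log entry.
log_entry = {"insert_count": insert_count}
```

Because duplicates are filtered first, the count only matches what ends up in the database if every subsequent insert succeeds.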

I have seen a number of places online where people discuss some differences in how CouchDB treats GET, PUT, and POST requests. Apparently there are (or at least have been) some inconsistencies.