Closed chaunceyt closed 15 years ago
I've added an 'insert_count' field to vd_logs. The documentation (wiki) has been updated and the new field implemented in the Python scrapers.
Because different language libraries are going to provide slightly different feedback from CouchDB, I'm not in favor of storing that JSON row in the results field. I think adding the 'insert_count' field gives us the same value, and it can easily be implemented without access to the raw return values from CouchDB.
I'm going to decode the JSON and insert the individual values rather than the raw JSON data.
Oh, ok, that makes sense then. Never can have too much data. :-)
I just noticed the doc_count is available on a GET and not on a PUT. Interesting... if the Python CouchDB API handles this differently, I would like to review how it handles it.
Fixed; closing.
I actually don't see doc_count either way through the Python library--I'm counting the records before I insert them and using that value (since I do duplicate checking in advance this should be valid).
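For reference, the count-before-insert approach could look roughly like the sketch below. This is not the actual scraper code: the function name count_new_records, the use of "_id" as the duplicate key, and the sample records are all assumptions made for illustration.

```python
def count_new_records(records, existing_ids):
    """Filter out duplicates and return (new_records, insert_count).

    `existing_ids` is assumed to hold the ids already stored in the
    database; `records` is whatever the scraper produced this run.
    """
    seen = set(existing_ids)
    new_records = []
    for record in records:
        record_id = record["_id"]  # assumed duplicate key
        if record_id in seen:
            continue  # already stored, or repeated within this scrape
        seen.add(record_id)
        new_records.append(record)
    return new_records, len(new_records)


# Hypothetical scrape: one record already stored, one repeated.
scraped = [{"_id": "vote-1"}, {"_id": "vote-2"}, {"_id": "vote-2"}, {"_id": "vote-3"}]
new_records, insert_count = count_new_records(scraped, existing_ids={"vote-1"})

# The value that would go into the vd_logs 'insert_count' field.
log_entry = {"insert_count": insert_count}
```

Because duplicates are removed before anything is written, the count should match what actually gets inserted, as long as every insert succeeds.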
I have seen a number of places online where people discuss some differences in how CouchDB treats GET, PUT, and POST requests. Apparently there are (or at least have been) some inconsistencies.
(Note: retitled for clarity)
I'm starting to work on putting data in vd_logs.
I noticed I have access to this: scripts/legislative/house_roll_call_votes/scraper.php --eventdb=cthorn_tmp
string(118) "{"db_name":"cthorn_tmp","doc_count":100,"doc_del_count":0,"update_seq":100,"compact_running":false,"disk_size":213761}"
vd_logs.results = the above result for each scrape? A very low doc_count could indicate an issue with the scraper.
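A rough sketch of the idea in Python, decoding the database info string shown above and flagging a suspiciously low doc_count. The function name summarize_db_info and the min_expected_docs threshold are assumptions for illustration only; the thread does not define what counts as "very low".

```python
import json

# The database info string returned for the cthorn_tmp run above.
raw = ('{"db_name":"cthorn_tmp","doc_count":100,"doc_del_count":0,'
       '"update_seq":100,"compact_running":false,"disk_size":213761}')


def summarize_db_info(raw_json, min_expected_docs=10):
    """Decode the info JSON and keep only the values worth logging.

    `min_expected_docs` is a hypothetical threshold; a sensible value
    would depend on the scraper.
    """
    info = json.loads(raw_json)
    doc_count = info["doc_count"]
    return {
        "db_name": info["db_name"],
        "doc_count": doc_count,
        "suspicious": doc_count < min_expected_docs,
    }


summary = summarize_db_info(raw)
```

Storing the decoded values instead of the raw string keeps the log independent of how a particular client library formats the response.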