sensiblecodeio / scraperwiki-python

ScraperWiki Python library for scraping and saving data
https://scraperwiki.com
BSD 2-Clause "Simplified" License
159 stars 69 forks

Unique key issue when running scraper locally #8

Closed zzolo closed 11 years ago

zzolo commented 12 years ago

Hi. I was trying out the new, awesome Cobalt service and ran into the following issue while testing locally:

Traceback (most recent call last):
  File "scraper.py", line 35, in <module>
    scraperwiki.sqlite.save(unique_keys=['id'], data=item)
  File "/Library/Python/2.7/site-packages/scraperwiki/sqlite.py", line 30, in save
    return dt.insert(data, table_name = table_name)
  File "/Library/Python/2.7/site-packages/dumptruck/dumptruck.py", line 250, in insert
    self.execute(sql, values, commit=False)
  File "/Library/Python/2.7/site-packages/dumptruck/dumptruck.py", line 107, in execute
    self.cursor.execute(sql, *args)
sqlite3.IntegrityError: column id is not unique

I simply ran python scraper.py and only locally it would do this. While running on Cobalt or in the traditional ScraperWiki web interface it ran fine.

For reference, the scraper is: https://box.scraperwiki.com/zzolo/mn-registered-voters

I realize this may be an issue for DumpTruck, but I wanted to start here first.

zzolo commented 12 years ago

Looking into this more, it seems that scraperwiki.sqlite.save is actually wrapping around dumptruck.insert.

https://github.com/scraperwiki/dumptruck/blob/master/dumptruck/dumptruck.py#L214 https://github.com/scraperwiki/scraperwiki_local/blob/master/scraperwiki/sqlite.py#L24

Unfortunately this doesn't UPDATE the existing row when the unique keys are found; it always INSERTs. That is a bit misleading given that the method is called "save".
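The failure mode can be reproduced with the stdlib `sqlite3` module directly (a minimal sketch, not the DumpTruck code itself; note that newer SQLite versions word the error as "UNIQUE constraint failed: ..." rather than "column id is not unique"):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER UNIQUE, name TEXT)")
conn.execute("INSERT INTO items VALUES (?, ?)", (1, "first"))

# A plain INSERT with the same unique key fails, just like
# scraperwiki.sqlite.save does when re-running a scraper locally.
try:
    conn.execute("INSERT INTO items VALUES (?, ?)", (1, "second"))
except sqlite3.IntegrityError as e:
    print(e)
```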

I am still confused about why this happens locally but not on Cobalt or traditional ScraperWiki. I saw that Cobalt uses the same code that I am running.

tlevine commented 12 years ago

scraperwiki_local is currently a bit of a hack in that it mostly wraps dumptruck rather than acting exactly like scraperwiki does on ScraperWiki itself. The scraperwiki database is a bit complicated, so I didn't feel like figuring out all of its quirks when I first wrote it.

Anyway, the problem is indeed that DumpTruck's insert does a plain INSERT while scraperwiki does an INSERT OR REPLACE, and scraperwiki_local should be adjusted to do the same in order to replace scraperwiki exactly. (It's not really obvious from the error message.)
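The difference between the two statements can be sketched with plain `sqlite3` (illustrative only, not the actual scraperwiki code): INSERT OR REPLACE removes the conflicting row and inserts the new one, so saving the same unique key twice succeeds.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER UNIQUE, name TEXT)")
conn.execute("INSERT INTO items VALUES (?, ?)", (1, "first"))

# INSERT OR REPLACE succeeds where a plain INSERT would raise
# IntegrityError: the old row for id=1 is replaced by the new one.
conn.execute("INSERT OR REPLACE INTO items VALUES (?, ?)", (1, "second"))
print(conn.execute("SELECT name FROM items WHERE id = 1").fetchone()[0])
```

This is the behavior scraperwiki_local would need to mimic: an UPSERT-style save rather than an insert-only one.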

For a week or two, DumpTruck contained the INSERT OR REPLACE and other scraperwiki-compliance things, and then we moved some of those things into scraperwiki. That might explain why the result differed depending on where you ran the script. I'll comment further when I figure out what was wrong or when we fix this bug.

pwaller commented 11 years ago

I'm guessing this was fixed by f90bb8b. If not, please re-open this issue.