sensiblecodeio / scraperwiki-python

ScraperWiki Python library for scraping and saving data
https://scraperwiki.com
BSD 2-Clause "Simplified" License

Can't save lxml strings. #65

Closed drj11 closed 9 years ago

drj11 commented 9 years ago

[edited on 2014-09-22] scraperwiki.sql.save() can't save values that are instances of lxml.etree._ElementStringResult (see example below).

Discovered when running the archinterface scraper: it crashed while trying to save some sort of lxml string object.

drj11 commented 9 years ago

I found one!

import lxml.html
q = lxml.html.parse('http://lxml.de/lxmlhtml.html').xpath('//td')[0].text_content()

pwaller commented 9 years ago

So.. what's the failure? That looks an awful lot like the phrase "Ian Bicking" to me.

drj11 commented 9 years ago

I just dropped that comment in before I caught the train, so at least I wouldn't forget it. It's not actually a plain string:

>>> scraperwiki.sql.save([], dict(t=q))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "scraperwiki/sql.py", line 187, in save
    fit_row(connection, row, unique_keys)
  File "scraperwiki/sql.py", line 314, in fit_row
    get_column_type(column_value))
  File "scraperwiki/sql.py", line 353, in get_column_type
    return PYTHON_SQLITE_TYPE_MAP[type(column_value)]
KeyError: <class 'lxml.etree._ElementStringResult'>
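
The traceback points at a dict lookup keyed on the exact type of the value, which fails for any subclass of a mapped type. Here is a minimal sketch of that failure mode without lxml, assuming (as I believe is the case) that lxml's string results subclass the built-in string types. `TYPE_MAP`, `SmartString`, `column_type_exact` and `column_type_mro` are hypothetical stand-ins, not names from scraperwiki-python, and the MRO fallback is just one possible approach, not necessarily what the eventual fix does.

```python
# Hypothetical stand-in for PYTHON_SQLITE_TYPE_MAP; the real map's
# contents are not shown in this thread.
TYPE_MAP = {str: "text", int: "integer", float: "real"}


class SmartString(str):
    """Hypothetical stand-in for an lxml string-result class."""


def column_type_exact(value):
    # Same shape as the failing line in the traceback: exact-type lookup.
    return TYPE_MAP[type(value)]


def column_type_mro(value):
    # Walk the class hierarchy so a subclass falls back to its base type.
    for cls in type(value).__mro__:
        if cls in TYPE_MAP:
            return TYPE_MAP[cls]
    raise KeyError(type(value))


value = SmartString("Ian Bicking")

try:
    column_type_exact(value)
except KeyError as exc:
    print("exact lookup failed:", exc)

print(column_type_mro(value))          # subclass resolved via its base class
print(column_type_exact(str(value)))   # coercing to a plain str also works
```

On the scraper side, wrapping the value in `str(...)` (or `unicode(...)` on Python 2) before saving should sidestep the problem, since that yields an instance of the plain built-in type.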

drj11 commented 9 years ago

This worked before https://github.com/scraperwiki/scraperwiki-python/commit/dcef2af2485ed3b5291d458509fadf7d5fba59a6

drj11 commented 9 years ago

dumptruck extends the typemap if lxml is available: https://github.com/scraperwiki/dumptruck/blob/master/dumptruck/dumptruck.py#L49
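
For illustration, extending a type map only when lxml is importable could look roughly like the sketch below. This is not dumptruck's actual code: `TYPE_MAP` and `extend_type_map` are hypothetical names, the "text" value is an assumed column type, and the two lxml class names are looked up defensively in case a given lxml build does not expose them.

```python
# Hypothetical base map; not the real one from dumptruck or scraperwiki-python.
TYPE_MAP = {str: "text", int: "integer", float: "real"}

try:
    from lxml import etree
except ImportError:
    # lxml is optional: without it there is nothing extra to register.
    etree = None


def extend_type_map(type_map):
    """Register lxml's string-result classes as text columns, if present."""
    if etree is None:
        return type_map
    for name in ("_ElementStringResult", "_ElementUnicodeResult"):
        cls = getattr(etree, name, None)  # tolerate builds lacking a class
        if cls is not None:
            type_map[cls] = "text"
    return type_map


extended = extend_type_map(dict(TYPE_MAP))
print(sorted(cls.__name__ for cls in extended))
```

The try/except around the import keeps lxml an optional dependency, so the library still works for scrapers that never touch lxml.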

drj11 commented 9 years ago

Fixed by https://github.com/scraperwiki/scraperwiki-python/pull/67