scraperwiki / dumptruck

Painlessly move data in and out of a SQLite database.
http://sensiblecode.io
BSD 2-Clause "Simplified" License

Can't pass lxml.html strings to DB - related to scraperwiki_local #11

Closed scraperdragon closed 10 years ago

scraperdragon commented 12 years ago

Scraperwiki classic lets you say:

data['link'] = root.xpath("//a/@href")[0]

and then save data directly to the database.

This doesn't work in dumptruck / scraperwiki_local because the value is not a plain string; it's an lxml.etree._ElementStringResult.

Traceback (most recent call last):
  File "insolv.py", line 52, in <module>
    last = doindex(letter=letter, page = i)
  File "insolv.py", line 44, in doindex
    scraperwiki.sqlite.save(table_name = 'list', data = builder, unique_keys=['link'])
  File "/usr/local/lib/python2.7/dist-packages/scraperwiki/sqlite.py", line 27, in save
    dt.create_table(data, table_name = table_name, error_if_exists = False)
  File "/usr/local/lib/python2.7/dist-packages/dumptruck/dumptruck.py", line 195, in create_table
    );''' % (if_not_exists, quote(table_name), quote(k), get_column_type(startdata[k]))
  File "/usr/local/lib/python2.7/dist-packages/dumptruck/dumptruck.py", line 51, in get_column_type
    return u'pickle text' if isinstance(obj, Pickle) else PYTHON_SQLITE_TYPE_MAP[type(obj)]
KeyError: <class 'lxml.etree._ElementStringResult'>

scraperdragon commented 12 years ago

Workaround: unicode() or str() all the [relevant] things.

data = {i:unicode(data[i]) for i in data} is a little indiscriminate...
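
A more selective version of that workaround might convert only the values that are string subclasses and leave everything else alone. This is a rough sketch in Python 3 (the thread itself is Python 2), and `plain_strings` and `FakeXPathResult` are hypothetical names: the latter is a stand-in for the lxml result type so the example runs without lxml installed.

```python
def plain_strings(row):
    """Return a copy of row with str subclasses downgraded to plain str."""
    cleaned = {}
    for key, value in row.items():
        # Only touch values that are strings but not *exactly* str,
        # so ints, floats, None, etc. keep their original types.
        if isinstance(value, str) and type(value) is not str:
            value = str(value)
        cleaned[key] = value
    return cleaned


class FakeXPathResult(str):
    """Hypothetical stand-in for the string type an lxml XPath call returns."""


row = {"link": FakeXPathResult("http://example.com/"), "page": 3}
cleaned = plain_strings(row)
```

Unlike the dict comprehension above, this leaves `page` as an integer, so the column type inferred for it should not change.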

tlevine commented 12 years ago

I think scraperwiki just converts everything that is going to a text column into a string; this would explain many of the behaviors linked at the bottom of the scraperwiki_local readme.

For dumptruck, I think it makes sense to add an adapter (Or converter? I never remember which one.) from lxml.etree._ElementStringResult to TEXT.
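
For reference, in the standard library's sqlite3 module an adapter converts a Python value on its way into SQLite, and a converter goes the other way. The idea could be sketched roughly like this with `sqlite3.register_adapter`; it is not dumptruck's actual code, and `FakeXPathResult` is a hypothetical stand-in for the lxml type so the snippet runs without lxml.

```python
import sqlite3


class FakeXPathResult(str):
    """Hypothetical stand-in for the string type an lxml XPath call returns."""


# Adapter: Python object -> value SQLite can store. Here, a plain str.
sqlite3.register_adapter(FakeXPathResult, str)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE list (link TEXT)")
conn.execute(
    "INSERT INTO list (link) VALUES (?)",
    (FakeXPathResult("http://example.com/"),),
)
stored = conn.execute("SELECT link FROM list").fetchone()[0]
conn.close()
```

Note that this only covers parameter binding. The KeyError in the traceback comes from dumptruck's own type lookup in get_column_type, so that lookup would presumably also need to know about the type.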

drj11 commented 11 years ago

Looks like this has been done, but it would be nice if it didn't have a direct dependency on lxml.

Should be able to use dumptruck without lxml, but also have the lxml ElementTree strings get converted in a convenient way.

pwaller commented 10 years ago

I'm closing old issues so that we can pluck signal from the noise. Please reopen if you encounter this or believe it is important.