Setting Stemmer fails with sqlite 2.6.0

pteichman / cobe

A Markov chain based text generation library and MegaHAL style chatbot

http://teichman.org/blog/

MIT License


Setting Stemmer fails with sqlite 2.6.0 #19

Closed fitnr closed 9 years ago

fitnr commented 9 years ago

I'm running the cobe API on CentOS 7, Python 2.7.9, sqlite3 version 2.6.0. I'm seeing the error "Error creating stemmer: encode() argument 1 must be string without null bytes, not unicode".

I traced the problem back to the fact that sqlite3 is returning a unicode object. Here's what I see (it's the same in Python 2.7.5, btw):

>>> import sqlite3
>>> conn = sqlite3.connect('example.brain')
>>> conn.cursor().execute("SELECT text FROM info WHERE attribute = ?", ("stemmer", )).fetchone()
(u'english',)

Since PyStemmer requires a string, could a suitable text_factory be added?
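For illustration, here is a minimal sketch of what such a text_factory could look like. It is not cobe's actual code: the helper name read_stemmer_name and the in-memory table layout are assumptions made for the example, with only the SELECT taken from the session above. Using bytes as the factory asks sqlite3 to hand back byte strings for TEXT columns (on Python 2, bytes is an alias for str).

```python
import sqlite3


def read_stemmer_name(conn):
    """Return the stored stemmer name as a byte string, or None if unset.

    Hypothetical helper, not part of cobe.
    """
    # text_factory controls how sqlite3 converts TEXT values.
    # Note that this changes the setting for the whole connection.
    conn.text_factory = bytes
    row = conn.cursor().execute(
        "SELECT text FROM info WHERE attribute = ?", ("stemmer",)
    ).fetchone()
    return row[0] if row else None


# Stand-in for a real brain file: an in-memory database with an
# assumed minimal "info" table (the real schema may differ).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE info (attribute TEXT, text TEXT)")
conn.execute("INSERT INTO info VALUES (?, ?)", ("stemmer", "english"))

name = read_stemmer_name(conn)
print(repr(name))
```

Because text_factory applies to every later query on the same connection, a narrower fix might convert only the one value that is passed to the stemmer.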

pteichman commented 9 years ago

What's your full stack trace? I want to narrow down whether the failing call to encode() is under my control or in sqlite3 or PyStemmer.

My PyStemmer here works with either str or unicode for the language, and there's a call to encode() in the stemmer constructor but its first argument is a constant unicode string. Shouldn't be hard to fix, at any rate. Have you tried the text_factory (or just a call to str()) and had that fix the problem?
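As a rough sketch of the "just a call to str()" idea, the value could be coerced to the interpreter's native str type before it reaches the stemmer. The helper name to_native_str and the ASCII default are assumptions for this example, not anything in cobe or PyStemmer.

```python
text_type = type(u"")  # unicode on Python 2, str on Python 3


def to_native_str(value, encoding="ascii"):
    """Coerce a text or byte string to the native str type.

    Hypothetical helper; values of any other type are returned unchanged.
    """
    if str is bytes and isinstance(value, text_type):
        # Python 2: unicode -> byte string
        return value.encode(encoding)
    if str is not bytes and isinstance(value, bytes):
        # Python 3: bytes -> text string
        return value.decode(encoding)
    return value


language = to_native_str(u"english")
print(type(language), language)
```

Stemmer language names such as "english" are plain ASCII, so the default encoding should be enough here; other values would need an explicit encoding.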

fitnr commented 9 years ago

I just went back into this machine to get the stack trace, and the problem has vanished. The warning was being raised here when I ran cobe.brain.Brain(mybrainpath), but it isn't anymore. I walked through Brain.__init__ in the console, and cobe.tokenizers.CobeStemmer(u'english') now works just fine. I blame cosmic rays. Thanks for the attention to this non-problem.

pteichman commented 9 years ago

Thanks for the report. They're always welcome!