Closed by fitnr 9 years ago
What's your full stack trace? I want to narrow down whether the failing call to encode() is under my control or in sqlite3 or PyStemmer.
My PyStemmer here works with either str or unicode for the language, and there's a call to encode() in the stemmer constructor but its first argument is a constant unicode string. Shouldn't be hard to fix, at any rate. Have you tried the text_factory (or just a call to str()) and had that fix the problem?
I just went back into this machine to get the stack trace and the problem has vanished. The warning was being raised here when I ran cobe.brain.Brain(mybrainpath). But it isn't anymore. I walked through the Brain.__init__ in console, and cobe.tokenizers.CobeStemmer(u'english') now works just fine.
I blame cosmic rays. Thanks for the attention to this non-problem.
Thanks for the report, they're always welcome!
I'm running the cobe API on CentOS 7, Python 2.7.9, sqlite3 version 2.6.0. I'm seeing the error "Error creating stemmer: encode() argument 1 must be string without null bytes, not unicode".
I traced the problem back to the fact that sqlite3 is returning a unicode object. Here's what I see (it's the same in Python 2.7.5, btw):
Since PyStemmer requires a string, could a suitable text_factory be added?
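For reference, here is a rough sketch of the two workarounds mentioned in this thread: setting text_factory on the connection, or calling str() on the value. This is not cobe's actual code. The in-memory database, the info table, and its columns are hypothetical stand-ins for wherever the stemmer language is stored. On Python 2, text_factory = str makes sqlite3 return byte strings instead of unicode; on Python 3, str is already the default, so the assignment changes nothing there.

```python
import sqlite3

# Hypothetical stand-in for the brain database; the table and column
# names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE info (attribute TEXT, text TEXT)")
conn.execute("INSERT INTO info VALUES (?, ?)", ("stemmer", "english"))

# Workaround 1: ask sqlite3 to build TEXT values with str().
# Python 2: rows now contain byte strings rather than unicode objects.
# Python 3: str is already the default text_factory.
conn.text_factory = str

row = conn.execute(
    "SELECT text FROM info WHERE attribute = ?", ("stemmer",)
).fetchone()
language = row[0]

# Workaround 2: convert explicitly before handing the value to the stemmer.
# This is safe for ASCII language names such as "english".
language_explicit = str(language)

print(type(language), language_explicit)
```

Either approach would only matter if the stemmer constructor really rejects unicode, which, per the comments in this thread, could not be reproduced afterwards.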