Non-ascii content doesn't work

GoogleCodeExporter commented 8 years ago

I can hardly believe I'm not doing something wrong, because this kind of
error should be caught by anyone within two hours of using the library, but
whatever...

What steps will reproduce the problem?
1. Save a document containing non-ASCII Unicode text, like '{"a": "ąąąąą"}'.
2. Watch couchdb fail.

What is the expected output? What do you see instead?

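I get a UnicodeDecodeError. That's not surprising, because json.encode
returns a str, which is then encoded again with .encode('utf-8'). In Python 2,
calling .encode() on a str first decodes it implicitly as ASCII, and that
fails for non-ASCII bytes.

Normally this error wouldn't occur, but the library passes ensure_ascii=False
to json.dumps.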

What version of the product are you using? On what operating system?

0.9 from pip on Ubuntu 12.10.

Original issue reported on code.google.com by zielmi...@gmail.com on 5 Jan 2014 at 10:31

GoogleCodeExporter commented 8 years ago

What version of Python are you on? How does "couchdb fail"?

Original comment by djc.ochtman on 6 Jan 2014 at 7:52

GoogleCodeExporter commented 8 years ago

Python 2.7; it fails with a UnicodeDecodeError in couchdb/http.py. With
ensure_ascii=False, json.dumps returns either unicode or str depending on its
input: unicode if the data contains unicode strings, str if it contains only
str. This is really a json.dumps bug, because json.dumps itself fails with a
UnicodeDecodeError when it receives a mix of both string types.

The following patch fixes part of the problem (documents mixing str and
unicode still won't work, because of the json.dumps bug):

diff -r 961ac99baa29 couchdb/http.py
--- a/couchdb/http.py   Sun Aug 18 18:41:46 2013 +0200
+++ b/couchdb/http.py   Mon Jan 06 11:01:20 2014 +0100
@@ -262,7 +262,9 @@

         if (body is not None and not isinstance(body, basestring) and
                 not hasattr(body, 'read')):
-            body = json.encode(body).encode('utf-8')
+            body = json.encode(body)
+            if isinstance(body, unicode):
+                body = body.encode('utf-8')
             headers.setdefault('Content-Type', 'application/json')

         if body is None:
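The patch's idea is to encode the serialized body to UTF-8 only when the serializer hands back text, so the request code works whichever string type comes out. A minimal Python 3 sketch of that idea follows. It assumes `json.dumps(..., ensure_ascii=False)` as a stand-in for the library's `json.encode`, and `encode_json_body` is a hypothetical helper name. In Python 3 the original error can't occur, because `str` is always text there.

```python
import json


def encode_json_body(obj):
    """Hypothetical helper: serialize obj to a UTF-8 JSON request body.

    Mirrors the patch: only encode when the serializer returned text,
    so a result that is already bytes is passed through untouched.
    """
    # Stand-in for the library's json.encode, which (per this thread)
    # calls json.dumps with ensure_ascii=False.
    body = json.dumps(obj, ensure_ascii=False)
    if isinstance(body, str):
        body = body.encode('utf-8')
    return body


body = encode_json_body({"a": "ąąąąą"})
```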

Another, possibly better, solution would be to remove ensure_ascii=False from
the json.dumps call: without this option, json.dumps correctly handles
documents that mix str and unicode.
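The effect of dropping ensure_ascii=False can be shown with the stdlib json module. This is a small Python 3 sketch, so it does not reproduce the Python 2 str/unicode mixing described above. It only shows the two output forms. With the default ensure_ascii=True, every non-ASCII character is escaped as \uXXXX, so the result is plain ASCII and encoding it cannot fail.

```python
import json

doc = {"a": "ąąąąą"}

# Default ensure_ascii=True: non-ASCII characters become \uXXXX escapes.
ascii_body = json.dumps(doc)
# ensure_ascii=False: non-ASCII characters are emitted as-is.
raw_body = json.dumps(doc, ensure_ascii=False)

print(ascii_body)  # {"a": "\u0105\u0105\u0105\u0105\u0105"}
print(raw_body)    # {"a": "ąąąąą"}

# The escaped form is pure ASCII, so this encode cannot raise...
ascii_body.encode('ascii')
# ...and both forms decode back to the same document.
assert json.loads(ascii_body) == json.loads(raw_body) == doc
```

The trade-off is a larger body for text-heavy non-ASCII documents. Any conforming JSON parser reads both forms as the same data.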

Original comment by zielmi...@gmail.com on 6 Jan 2014 at 10:07

GoogleCodeExporter commented 8 years ago

I think the problem here is the difference between stdlib json and
simplejson. Any patch should work with both.
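One way to check that a patch behaves the same under both backends is to run the same encoding step through each installed backend and compare the decoded results. The sketch below assumes Python 3. simplejson is optional and is skipped if it isn't installed, and `available_backends` is a hypothetical helper, not part of couchdb-python.

```python
import importlib
import json

doc = {"a": "ąąąąą"}


def available_backends():
    """Yield the stdlib json module, plus simplejson if it is installed."""
    yield json
    try:
        yield importlib.import_module('simplejson')
    except ImportError:
        pass


results = {}
for backend in available_backends():
    # Both backends accept the ensure_ascii keyword discussed in this thread.
    text = backend.dumps(doc, ensure_ascii=False)
    # Normalise to UTF-8 bytes regardless of the returned string type.
    body = text if isinstance(text, bytes) else text.encode('utf-8')
    results[backend.__name__] = body

for name, body in results.items():
    assert json.loads(body.decode('utf-8')) == doc, name
```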

Original comment by djc.ochtman on 6 Jan 2014 at 10:19

GoogleCodeExporter commented 8 years ago

Given the similarity to #235 and the changes made recently to support Python 3, 
I'm going to assume this has been fixed on the current default branch. Feel 
free to reopen if you can still reproduce this issue.

Original comment by djc.ochtman on 6 Jul 2014 at 11:04

Changed state: WorksForMe

lanto03 / couchdb-python

Non-ascii content doesn't work #232