lanto03 / couchdb-python

Automatically exported from code.google.com/p/couchdb-python

Encoding is not quite right. #81

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. Create a view (in python) that yields a document.
2. Create a document containing a string with characters outside the ASCII range.
3. View will explode when it processes that document.

What is the expected output? What do you see instead?

If I re-implement the same exact view in javascript, it works fine.

What version of the product are you using? On what operating system?

CouchDB 0.9.0. Tried CouchDB-Python 0.6 and then revision 185.

Please provide any additional information below.

Really, it's not the view code, but here it is anyway:

# Python
def by_customer(doc):
    if doc.get('type', None) == 'internetvideo':
        yield doc.get('customer',None),doc

It breaks the second I add a document containing non-ASCII Unicode characters.
However, I can make that view "pass" if I do something like...

# Python
def by_customer(doc):
    if doc.get('type', None) == 'internetvideo':
        doc['description'] = doc['description'].encode('utf-8')
        yield doc.get('customer',None),doc

The corresponding javascript version works:

function( doc ) {
  if( doc.type == 'internetvideo' ) {
    emit(doc.customer, doc);
  }
}

I do not believe it is acceptable to manually encode every string on every
possible object.

Original issue reported on code.google.com by ian.sche...@gmail.com on 21 Jul 2009 at 12:38

GoogleCodeExporter commented 8 years ago
I made a tweak to the view server, and this may (or may not) be correct. It
seems to fix the problem, at least. Hope it's helpful.

Original comment by ian.sche...@gmail.com on 21 Jul 2009 at 12:56

Attachments:

GoogleCodeExporter commented 8 years ago
I must admit, I was quite surprised by this patch as I would not have expected
the JSON to need encoding. However, it seems that couchdb-python tries to keep
JSON as unicode internally by using simplejson.dump's ensure_ascii=False option.

So, the patch is probably correct if you're using simplejson (or the stdlib's
json), although the BOM is probably unnecessary.

Unfortunately, I suspect it's *only* going to work with simplejson right now, as
I believe cjson always encodes to UTF-8. And that's going to affect how things
are handled internally.

Original comment by matt.goo...@gmail.com on 21 Jul 2009 at 11:04
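
To illustrate the point about ensure_ascii, here is a minimal sketch. It assumes Python 3 and the standard library's json module rather than the simplejson and Python 2 setup discussed in this thread, so the types are str and bytes instead of unicode and str. The sample document is made up for illustration.

```python
import json

# Hypothetical sample document with one non-ASCII character ("bår").
doc = {"test": "b\u00e5r"}

# Default behaviour: non-ASCII characters are escaped, so the result is
# plain ASCII text.
escaped = json.dumps(doc)

# With ensure_ascii=False the character is kept as-is in the returned text.
raw = json.dumps(doc, ensure_ascii=False)

# Text has to be encoded explicitly before it goes onto a byte-oriented pipe.
wire = raw.encode("utf-8")

print(escaped)
print(raw)
print(wire)
```

With the default setting there is nothing outside ASCII left to encode, which is presumably why the need for an explicit encoding step only shows up once ensure_ascii=False is in use.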

GoogleCodeExporter commented 8 years ago
So is there going to be some decision here?
This bug is stopping my app from using couchdb-python's viewserver, as I can't
require the users to patch their couchdb-python in order to use my app. 

Original comment by akat.me...@gmail.com on 2 Dec 2009 at 9:05

GoogleCodeExporter commented 8 years ago
Yes I'm hitting this wall too.

Original comment by atkins...@gmail.com on 3 Dec 2009 at 6:20

GoogleCodeExporter commented 8 years ago
Can someone give this a shot?

diff --git a/couchdb/tests/view.py b/couchdb/tests/view.py
--- a/couchdb/tests/view.py
+++ b/couchdb/tests/view.py
@@ -36,6 +36,15 @@
                          'true\n'
                          '[[[null, {"foo": "bar"}]]]\n')

+    def test_i18n(self):
+        input = StringIO('["add_fun", "def fun(doc): yield doc[\\"test\\"], doc"]\n'
+                         '["map_doc", {"test": "b\xc3\xa5r"}]\n')
+        output = StringIO()
+        view.run(input=input, output=output)
+        self.assertEqual(output.getvalue(),
+                         'true\n'
+                         '[[["b\xc3\xa5r", {"test": "b\xc3\xa5r"}]]]\n')
+
     def test_map_doc_with_logging(self):
         fun = 'def fun(doc): log(\'running\'); yield None, doc'
         input = StringIO('["add_fun", "%s"]\n'
diff --git a/couchdb/view.py b/couchdb/view.py
--- a/couchdb/view.py
+++ b/couchdb/view.py
@@ -135,7 +135,10 @@
             else:
                 retval = handlers[cmd[0]](*cmd[1:])
                 log.debug('Returning  %r', retval)
-                output.write(json.encode(retval))
+                result = json.encode(retval)
+                if isinstance(result, unicode):
+                    result = result.encode('utf-8')
+                output.write(result)
                 output.write('\n')
                 output.flush()
     except KeyboardInterrupt:

Original comment by djc.ochtman on 10 Dec 2009 at 2:42
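
The pattern in the view.py change above can be sketched in isolation as follows. This is not the couchdb-python code: it assumes Python 3, uses the stdlib json.dumps in place of couchdb's json.encode, and write_result is a hypothetical helper name chosen for this example.

```python
import io
import json


def write_result(output, retval):
    """Write retval as one line of JSON to a binary stream (hypothetical helper)."""
    result = json.dumps(retval, ensure_ascii=False)
    # Mirror the patch: if the encoder handed back text, encode it to UTF-8
    # before writing, so the stream never has to guess an encoding.
    if isinstance(result, str):
        result = result.encode("utf-8")
    output.write(result)
    output.write(b"\n")
    output.flush()


# Stand-in for the view server's output pipe.
output = io.BytesIO()
write_result(output, [[["b\u00e5r", {"test": "b\u00e5r"}]]])
print(output.getvalue())
```

The bytes written here match the expected output in the test_i18n case from the diff.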

GoogleCodeExporter commented 8 years ago
This should be fixed in rf8e6214713 (by a change similar to the one above).

Original comment by djc.ochtman on 14 Dec 2009 at 12:15