mattimustang / wadofstuff

Automatically exported from code.google.com/p/wadofstuff

UnicodeEncodeError model field returns unicode and not UTF-8 #19

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?

1. Use the wadofstuff serializer with Python 2.7 and Django 1.3 to serialize a
model with a field such as 'name' whose value contains non-ASCII characters
(e.g. name = u'täst'), with ensure_ascii=False.

2. When serializing with ensure_ascii=False, you get "'ascii' codec can't
encode characters in position 9-10: ordinal not in range(128)" in self.end,
while simplejson.dump is running.

3. poof...
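That message is what Python raises whenever a unicode string containing non-ASCII characters is encoded with the 'ascii' codec. In the Python 2.7 setup above the encoding happens implicitly; the minimal Python 3 sketch below makes it explicit with str.encode. The helper name ascii_encode_error is hypothetical and only for illustration:

```python
def ascii_encode_error(text):
    """Encode text with the ASCII codec; return the UnicodeEncodeError, or None."""
    try:
        text.encode('ascii')
    except UnicodeEncodeError as exc:
        return exc
    return None

err = ascii_encode_error(u'täst')
print(err)  # 'ascii' codec can't encode character '\xe4' in position 1: ordinal not in range(128)
```

A pure-ASCII value such as u'test' encodes without error, which is why the bug only shows up once a field contains characters like 'ä'.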

What is the expected output?
I expected the field to be correctly encoded as UTF-8 in the JSON string, i.e.
u'täst' should become 't\xc3\xa4st'.

What do you see instead?
It is a normal Python unicode string, e.g. u'täst' or u't\xe4st'. That is not UTF-8, so the error is correct but wrong ;-)
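The distinction here is between a unicode string (a sequence of code points) and its UTF-8 encoding (a sequence of bytes). A short Python 3 sketch of that difference, where `str` plays the role of Python 2's `unicode` and `bytes` the role of Python 2's `str`:

```python
text = u't\xe4st'            # the same value as u'täst'
utf8 = text.encode('utf-8')  # UTF-8: 'ä' (U+00E4) becomes the two bytes C3 A4
print(repr(utf8))            # b't\xc3\xa4st'
print(len(text), len(utf8))  # 4 5
# Decoding the bytes gives back the original unicode string.
assert utf8.decode('utf-8') == text
```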

What version of the product are you using? On what operating system?
Mac OS X 10.7.2, Python 2.7, Django 1.3

Please provide any additional information below.

When I set ensure_ascii to True, it returns without an error, but the resulting
JSON shows name=u't\xe4st', i.e. the string is not UTF-8 encoded.

Original issue reported on code.google.com by jens.lun...@a-b-i.se on 14 Nov 2011 at 8:34

GoogleCodeExporter commented 9 years ago
Seems related and similar to this:
http://groups.google.com/group/django-users/browse_thread/thread/4f5f99b730ee0aae/db766ece1dcc78fe

Original comment by jens.lun...@a-b-i.se on 14 Nov 2011 at 8:35

GoogleCodeExporter commented 9 years ago
I have tested the default Django serializer, and it generates correct results
with both ensure_ascii=True and ensure_ascii=False.

Original comment by jens.lun...@a-b-i.se on 14 Nov 2011 at 9:44

GoogleCodeExporter commented 9 years ago
Which wadofstuff-django-serializers version are you using?

Can you provide cut-down example code and data that reproduce the bug?

Original comment by mattimus...@gmail.com on 14 Nov 2011 at 11:19

GoogleCodeExporter commented 9 years ago
wadofstuff version 1.1.0

Best regards / Jens

On 15 Nov 2011 at 00:20, wadofstuff@googlecode.com wrote:

Original comment by jens.lun...@a-b-i.se on 15 Nov 2011 at 8:57

GoogleCodeExporter commented 9 years ago
1) Installed wadofstuff in my Django app directory.
2) Added SERIALIZATION_MODULES = { 'json': 'wadofstuff.django.serializers.json' }
to settings.py.
3) Created a JSONResponse class that I use in my views:

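from django.core import serializers
from django.http import HttpResponse

class JSONResponse(HttpResponse):
    status_code = 200

    def __init__(self, queryset, **options):
        # Serialize with whichever 'json' serializer SERIALIZATION_MODULES selects
        json_response = serializers.serialize('json', queryset, ensure_ascii=False, indent=2, **options)
        HttpResponse.__init__(self, json_response, content_type='application/json')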

4) I then call JSONResponse([match]), where match is a model instance with a
field 'name' that can contain non-ASCII Unicode characters. Here I have
name = u'näme'.

5) With the above settings I get the following error:

UnicodeEncodeError at /ipsc/json/match/1/
'ascii' codec can't encode character u'\xe4' in position 2: ordinal not in range(128)
...
See the attached file for a PDF printout of the error.

----

6) When I change to ensure_ascii=True, I get the following result:

[
  {
    "pk": 1,
    "model": "match_ipsc.ipscmatch",
    "fields": {
      "name": "n\u00e4me"
    }
  }
]

----

7) If I remove wadofstuff (drop the setting from settings.py) and run the
default Django serializer in the same scenario, I get the same result as in (6)
with ensure_ascii=True, *BUT* with ensure_ascii=False I get the following
result (which is what I expected from the wadofstuff serializer):

[
  {
    "pk": 1,
    "model": "match_ipsc.ipscmatch",
    "fields": {
      "name": "näme"
    }
  }
]
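Steps 6 and 7 differ only in the ensure_ascii flag. The standard library json module, used here as a stand-in for simplejson (which accepts the same flag), shows the two output forms. This is a Python 3 sketch and involves neither Django nor wadofstuff:

```python
import json

record = {"name": u"näme"}

# ensure_ascii=True escapes every non-ASCII character as \uXXXX
escaped = json.dumps(record, ensure_ascii=True)
# ensure_ascii=False leaves the character unescaped in the output string
literal = json.dumps(record, ensure_ascii=False)

print(escaped)  # {"name": "n\u00e4me"}
print(literal)  # {"name": "näme"}
```

Both strings decode to the same value. The difference is that the ensure_ascii=False output contains non-ASCII characters, so whatever file-like object receives it must be able to store them.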

So it seems the wadofstuff serializer introduces something strange here. The
value of name comes from python.handle_field; I have tried calling smart_str on
the value before saving it to _fields, but the issue is the same.

/ Jens

Original comment by jens.lun...@a-b-i.se on 15 Nov 2011 at 11:21

Attachments:

GoogleCodeExporter commented 9 years ago
Hi, please delete the email I sent; I have uploaded the details and log
directly on the web instead.

/ Jens

Original comment by jens.lun...@a-b-i.se on 15 Nov 2011 at 11:33

GoogleCodeExporter commented 9 years ago
Attached here is the log file.

Original comment by jens.lun...@a-b-i.se on 15 Nov 2011 at 11:35

Attachments:

GoogleCodeExporter commented 9 years ago
Tested with both simplejson 2.2.1 and the simplejson 2.0.7 bundled with
Django; the behavior is the same...

Original comment by jens.lun...@a-b-i.se on 15 Nov 2011 at 12:03

GoogleCodeExporter commented 9 years ago
I think I have found something: for some reason 'fp' is a cStringIO.StringO
when using the wadofstuff serializer, but a StringIO.StringIO when running the
Django serializer. So when the JSON serializer does fp.write(chunk) and chunk
is a unicode string, it works with StringIO, but with cStringIO it raises
"'ascii' codec can't encode character".
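Python 3 has no cStringIO, but a similar split exists between io.StringIO, which accepts text, and io.BytesIO, which accepts only bytes. The sketch below is an analog of the behavior described above, not the Python 2 code path itself: json.dump writes str chunks through fp.write, so a bytes-only buffer fails. Note the differences: Python 2's cStringIO raised UnicodeEncodeError and only for non-ASCII unicode, while Python 3's BytesIO raises TypeError for any str. The helper dump_to is hypothetical:

```python
import io
import json

record = {"name": u"näme"}

def dump_to(buffer, obj):
    """Run json.dump into buffer; return the TypeError raised, or None on success."""
    try:
        json.dump(obj, buffer, ensure_ascii=False)
    except TypeError as exc:
        return exc
    return None

# A text buffer accepts the str chunks that json.dump writes.
text_buf = io.StringIO()
dump_to(text_buf, record)
print(text_buf.getvalue())  # {"name": "näme"}

# A bytes-only buffer rejects them, roughly as cStringIO did in Python 2.
error = dump_to(io.BytesIO(), record)
print("write failed:", error)
```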

Original comment by jens.lun...@a-b-i.se on 15 Nov 2011 at 12:14

GoogleCodeExporter commented 9 years ago
So, after talking with myself for a while, I found that the following corrects
the issue:

in base.py, remove the import of cStringIO and use only StringIO:

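# cStringIO cannot store unicode strings with non-ASCII characters,
# so use the pure-Python StringIO, which accepts them.
# try:
#     from cStringIO import StringIO
# except ImportError:
#     from StringIO import StringIO
from StringIO import StringIO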
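Switching to StringIO works because the pure-Python buffer stores unicode chunks as-is. An alternative, sketched here in Python 3 and not taken from this thread or from wadofstuff, is to keep a byte buffer but wrap it in a UTF-8 writer from codecs.getwriter, so each unicode chunk is encoded explicitly as UTF-8 instead of implicitly with the ASCII codec. The function name dump_utf8 is hypothetical:

```python
import codecs
import io
import json

def dump_utf8(obj):
    """Serialize obj as JSON and return it as UTF-8 encoded bytes."""
    raw = io.BytesIO()
    # The StreamWriter encodes every str chunk to UTF-8 before writing it to raw.
    writer = codecs.getwriter('utf-8')(raw)
    json.dump(obj, writer, ensure_ascii=False)
    return raw.getvalue()

payload = dump_utf8({"name": u"näme"})
print(repr(payload))  # b'{"name": "n\xc3\xa4me"}'
```

The trade-off is that the result is bytes, so the consumer must know the charset, e.g. by sending content_type='application/json; charset=utf-8'.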

Best regards / Jens

Original comment by jens.lun...@a-b-i.se on 15 Nov 2011 at 12:17

GoogleCodeExporter commented 9 years ago
Thank you Jens. I spent a whole night trying to figure out what was wrong with
my JSON dumps and UTF-8 support until I recalled that I had switched to
wadofstuff, and this came up instantly as an answer to my search query. It
solved my (identical) problem immediately.

Aidin

Original comment by ssb...@gmail.com on 2 Aug 2013 at 3:38