justinvh / gitpaste

DEPRECATED - GitPaste is a clone of GitHub's Gist.
184 stars, 49 forks

Encoding in UTF-8 #26

benoitjpnet closed this 11 years ago

benoitjpnet commented 12 years ago

Hi,

Currently, it seems that files are encoded as ASCII, which is a problem when you use, for example, French accents or non-breaking spaces.
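To see why ASCII is not enough, here is a small Python 3 sketch (the `describe` helper is purely illustrative, not part of gitpaste) that prints each character's code point and UTF-8 bytes. Both "é" (U+00E9) and the non-breaking space (U+00A0) lie above code point 127, so neither fits in ASCII, and each takes two bytes in UTF-8.

```python
# Characters mentioned in the report: "é" and the non-breaking space (U+00A0).
SAMPLES = ["é", "\u00a0"]


def describe(ch: str) -> str:
    """Show a character's code point, its UTF-8 bytes, and whether ASCII can hold it."""
    try:
        ch.encode("ascii")
        ascii_ok = True
    except UnicodeEncodeError:
        # ASCII only covers code points 0-127; both samples are above that.
        ascii_ok = False
    return f"U+{ord(ch):04X} utf-8={ch.encode('utf-8')!r} ascii_ok={ascii_ok}"


for ch in SAMPLES:
    print(describe(ch))
```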

Why not use UTF-8 by default?

You can reproduce this on gitpaste.com by posting an "é", for example. As you can see, Django is in debug mode, BTW.

Thanks.

justinvh commented 12 years ago

Ah, this is a simple fix. As for debug mode, I keep the gitpaste server in debug because it's a testbed, if anything.

benoitjpnet commented 12 years ago

I'm not familiar with Python/Django. How do you change the encoding? It's not so simple for everyone ;)

justinvh commented 12 years ago

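Well, UTF-8 handling is a long-standing pain point in Python 2.7. Python's default encoding is 'ascii', so writing a unicode string to a plain file implicitly encodes it as ASCII and fails on characters like "é". When the server writes the file to disk, it needs to account for the fact that the data may contain non-ASCII text and encode it explicitly (e.g. as UTF-8). This is only really solved with bleeding-edge Django and its Python 3.x support.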
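For comparison, a minimal Python 3 sketch of the same situation. The helper names `write_paste`, `read_paste`, and `ascii_write_fails` are hypothetical, not gitpaste's code. Forcing `encoding="ascii"` reproduces the failure described above, while passing `encoding="utf-8"` explicitly avoids depending on the platform default.

```python
import os
import tempfile


def write_paste(path: str, paste: str) -> None:
    """Write paste text to disk, always as UTF-8 (hypothetical helper)."""
    # Passing encoding= explicitly avoids depending on the platform default.
    with open(path, "w", encoding="utf-8") as f:
        f.write(paste)


def read_paste(path: str) -> str:
    """Read paste text back, decoding it as UTF-8."""
    with open(path, "r", encoding="utf-8") as f:
        return f.read()


def ascii_write_fails(path: str, paste: str) -> bool:
    """Reproduce the original bug: push non-ASCII text through an ASCII codec."""
    try:
        with open(path, "w", encoding="ascii") as f:
            f.write(paste)
    except UnicodeEncodeError:
        return True
    return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "paste.txt")
        print(ascii_write_fails(path, "café"))
        write_paste(path, "café")
        print(read_paste(path))
```

The first call reports that the ASCII write failed; the UTF-8 round trip then returns the original text unchanged.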

Specifically, this is the problem:

with open(filename_absolute, "w") as f:
    f.write(paste)

A simple solution would be:

import codecs

with codecs.open(filename_absolute, "w", "utf-8") as f:  # plain UTF-8; "utf-8-sig" would prepend a BOM
    f.write(paste)  # paste must be a unicode object in Python 2

It's just bad practice on my part not to have taken Unicode into account. However, it should be an easy fix, and I will make it.
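One detail worth checking in any fix like this is the codec name. Python's "utf-8-sig" codec writes a byte-order mark (BOM, bytes EF BB BF) before the text, while plain "utf-8" does not, and a BOM in a stored paste shows up as a stray invisible U+FEFF character when the file is later read as plain UTF-8. The Python 3 sketch below uses the built-in `open()`, which accepts the same codec names as `codecs.open`; the `file_bytes` helper is illustrative only.

```python
import os
import tempfile


def file_bytes(text: str, encoding: str) -> bytes:
    """Write text with the given codec and return the raw bytes that hit the disk."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "paste.txt")
        with open(path, "w", encoding=encoding) as f:
            f.write(text)
        with open(path, "rb") as f:
            return f.read()


plain = file_bytes("é", "utf-8")       # just the two UTF-8 bytes for "é"
signed = file_bytes("é", "utf-8-sig")  # BOM bytes first, then the same two bytes

# Decoding BOM-prefixed bytes as plain UTF-8 leaves U+FEFF at the start of the text;
# the "utf-8-sig" codec strips it again on decode.
print(repr(signed.decode("utf-8")))
print(repr(signed.decode("utf-8-sig")))
```

For files that are served raw or committed to git, plain "utf-8" is usually the safer choice.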