ndarville / pony-forum

A modern alternative to ancient forum CMSes like vBulletin and PHPBB in Python on Django. (Alpha stage.) (NB: dotCloud have since removed their free Sandbox tier.)
http://pony-forum.com

Local MySQL encoding problems #17

Closed: ndarville closed this issue 12 years ago

ndarville commented 12 years ago

The following gives an error:

Testing footnotes[^1].

[^1]: Note.

This is the intended output:

<p>Testing footnotes.<sup id="fnref:1"><a href="#fn:1" rel="footnote">1</a></sup></p>

<div class="footnotes">
    <hr />
    <ol>
        <li id="fn:1"><p>Note.<a href="#fnref:1" rev="footnote">&#8617;</a></p></li>
    </ol>
</div>

This is the error:

OperationalError at /post/7/edit/

(1366, "Incorrect string value: '\xE2\x86\xA9</a...' for column 'content_html' at row 1")

Exception Type: OperationalError
Exception Value: (1366, "Incorrect string value: '\xE2\x86\xA9</a...' for column 'content_html' at row 1")
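The bytes in the error message can be traced with a few lines of Python. This is a minimal sketch using only the standard library (Python 3). It shows that `&#8617;` in the intended output is U+21A9, that its UTF-8 encoding is exactly `\xE2\x86\xA9`, and that latin1 has no code point for it. It only mirrors the conversion MySQL attempts for a latin1 column; it is not the server's own code path.

```python
import html

# The footnote back-reference in the intended output is the entity &#8617;.
backref = html.unescape("&#8617;")
print(hex(ord(backref)))  # 0x21a9 (LEFTWARDS ARROW WITH HOOK)

# Its UTF-8 encoding matches the bytes quoted in the MySQL error.
utf8_bytes = backref.encode("utf-8")
print(utf8_bytes)  # b'\xe2\x86\xa9'

# latin1 only covers U+0000..U+00FF, so this character cannot be stored as-is.
try:
    backref.encode("latin-1")
except UnicodeEncodeError as exc:
    print("latin-1 cannot encode it:", exc.reason)
```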

Articles

ndarville commented 12 years ago

This problem is caused by Bleach in sanitized_smartdown. Will investigate a solution.

The problem is either:

  1. An encoding problem.
  2. A problem with the allowed tags and attributes (not bloody likely).
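If it turns out to be the encoding, one possible stopgap (an assumption, not something tried in this thread) is to turn every non-ASCII character in the rendered HTML back into a numeric character reference before saving, so the stored value is plain ASCII and fits a latin1 column. The helper name `to_ascii_html` is hypothetical; the `xmlcharrefreplace` error handler is standard Python.

```python
def to_ascii_html(html_text):
    """Hypothetical helper: replace each non-ASCII character with a decimal
    numeric character reference (e.g. U+21A9 -> &#8617;)."""
    return html_text.encode("ascii", "xmlcharrefreplace").decode("ascii")


rendered = '<a href="#fnref:1" rev="footnote">\u21a9</a>'
print(to_ascii_html(rendered))
# <a href="#fnref:1" rev="footnote">&#8617;</a>
```

Browsers render `&#8617;` and the literal character identically, so the page output would not change; the trade-off is that the stored HTML is no longer the renderer's exact output.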

From the docs:

Bleach always returns a unicode object, whether you give it a bytestring or a unicode object, but Bleach does not attempt to detect incoming character encodings, and will assume UTF-8.

If you are using a different character encoding, you should convert from a bytestring to unicode before passing the text to bleach.
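In Python 3 terms, that advice amounts to decoding bytes explicitly before sanitizing. A minimal sketch, assuming the raw text may arrive as `bytes` in a known encoding; `ensure_text` is a hypothetical helper, and the Bleach call is left as a comment so the snippet runs without Bleach installed.

```python
def ensure_text(value, encoding="utf-8"):
    """Hypothetical helper: decode bytestrings with an explicit encoding so
    Bleach never has to assume UTF-8; str values pass through unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


latin1_bytes = "café".encode("latin-1")  # b'caf\xe9', not valid UTF-8
text = ensure_text(latin1_bytes, encoding="latin-1")
print(text)  # café
# cleaned = bleach.clean(text)  # Bleach now receives a proper str
```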

ndarville commented 12 years ago

mysql> use mydb;
Database changed
mysql> show variables like 'char%';
+--------------------------+---------------------------------------------------------------+
| Variable_name            | Value                                                         |
+--------------------------+---------------------------------------------------------------+
| character_set_client     | latin1                                                        |
| character_set_connection | latin1                                                        |
| character_set_database   | latin1                                                        |
| character_set_filesystem | binary                                                        |
| character_set_results    | latin1                                                        |
| character_set_server     | latin1                                                        |
| character_set_system     | utf8                                                          |
| character_sets_dir       | C:\Program Files (x86)\MySQL\MySQL Server 5.5\share\charsets\ |
+--------------------------+---------------------------------------------------------------+
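The session above shows the database and server defaulting to latin1, so tables created in `mydb` get latin1 text columns unless told otherwise. One common remedy (not what this thread ended up doing) is to convert the database and its tables to utf8. The sketch below only builds the MySQL statements for review; `utf8_conversion_sql` and the table name `forum_post` are hypothetical, and the real table names would come from the project's models. Back up the data before running anything like this.

```python
def utf8_conversion_sql(database, tables, charset="utf8", collation="utf8_general_ci"):
    """Hypothetical helper: build statements that switch a MySQL database and
    the given tables from their current character set to utf8."""
    statements = [
        # Changes the default for tables created later in this database.
        f"ALTER DATABASE `{database}` CHARACTER SET {charset} COLLATE {collation};"
    ]
    for table in tables:
        # CONVERT TO also re-encodes the existing text columns of the table.
        statements.append(
            f"ALTER TABLE `{table}` CONVERT TO CHARACTER SET {charset} COLLATE {collation};"
        )
    return statements


# "mydb" comes from the session above; "forum_post" is a made-up table name.
for statement in utf8_conversion_sql("mydb", ["forum_post"]):
    print(statement)
```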

ndarville commented 12 years ago

Dropping MySQL support for now and focusing on PostgreSQL.