histrio / py-couchdb

Modern pure python CouchDB Client.
https://pycouchdb.readthedocs.org/

Special characters #25

Closed jonas-hagen closed 10 years ago

jonas-hagen commented 10 years ago

Strange thing: some values containing special characters are not read back correctly after being saved to the database.

I tested the following with requests version 1.1.0, 1.2.3 and 2.0.1 (see below).

Here are two tests I've included in the DatabaseTests class (see my fork):

    def test_special_chars1(self):
        text = "Lürem ipsüm."
        self.db.save({"_id": "special1", "text": text})

        doc = self.db.get("special1")
        self.assertEqual(text, doc["text"])

    def test_special_chars2(self):
        text = "Mal sehen ob ich früh aufstehen mag."
        self.db.save({"_id": "special2", "text": text})

        doc = self.db.get("special2")
        self.assertEqual(text, doc["text"])

The result is:

Failure
Expected :'Mal sehen ob ich früh aufstehen mag.'
Actual   :'Mal sehen ob ich frĂźh aufstehen mag.'

For the different requests versions (if it matters):

This is very strange. I'm pretty sure that the problem does not occur while saving the document, but while retrieving it.

Could anyone help? I'm not even sure it is a problem of py-couchdb or requests or...

niwinz commented 10 years ago

Hi!

You are right, this is very strange behavior... I will include your test cases and review it as soon as possible. Thanks!

jonas-hagen commented 10 years ago

If the Content-Type header of a response does not specify a charset (as is the case for application/json responses from CouchDB), requests guesses the encoding when response.text is accessed, and that guess can be wrong for short texts.

A possible fix is to use response.json(), which defaults to UTF-8 and decodes the JSON directly from the response body. (Another possibility would be to set response.encoding = 'utf-8' before accessing response.text.)
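
To illustrate the mechanism, here is a minimal sketch using only the standard library, so it runs without a CouchDB server. It does not call requests at all. The wrong guess is simulated by decoding the UTF-8 body as ISO-8859-2, which is an assumption chosen only because it produces mojibake like the failure above; the encoding that requests actually guessed is not stated in this thread. The variable names are hypothetical.

```python
import json

text = "Mal sehen ob ich früh aufstehen mag."

# The body as it would travel over the wire: JSON encoded as UTF-8 bytes.
raw_body = json.dumps({"_id": "special2", "text": text},
                      ensure_ascii=False).encode("utf-8")

# Simulated wrong guess (assumption: ISO-8859-2 stands in for whatever
# encoding was detected). The two UTF-8 bytes of "ü" become two characters.
wrong = json.loads(raw_body.decode("iso-8859-2"))["text"]

# Decoding explicitly as UTF-8 mirrors setting the encoding before
# reading the text.
right = json.loads(raw_body.decode("utf-8"))["text"]

# Handing the raw bytes to the JSON parser mirrors decoding the JSON
# directly; json.loads accepts bytes on Python 3.6+.
direct = json.loads(raw_body)["text"]

print(repr(wrong))
print(repr(right))
print(repr(direct))
```

Only the first variant garbles the umlaut; both the explicit UTF-8 decode and the direct bytes-to-JSON path return the original string.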