node-js-libs / curlrequest

A cURL wrapper
MIT License

Encoding: UTF-8 vs. ISO-8859-1 #3

Closed: sorenlouv closed this issue 12 years ago

sorenlouv commented 12 years ago

I have experienced problems with POST requests to websites whose encoding is set to ISO-8859-1. All data is encoded with encodeURIComponent, which, as I understand it, encodes as UTF-8:

    //Parse POST data
    if (options.data && typeof options.data === 'object') {
        var data = [];
        for (var key in options.data) {
            data.push(encodeURIComponent(key) + '=' + encodeURIComponent(options.data[key]));
        }
        options.data = data.join('&');
    }

To encode for ISO-8859-1, I use escape() instead of encodeURIComponent().

chriso commented 12 years ago

encodeURIComponent is correct: it always percent-encodes the UTF-8 bytes of a string (JS strings themselves are UTF-16 internally). If you need a custom encoding, just encode the data yourself and put the resulting string in options.data.
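A minimal sketch of what "encode it yourself" could look like for ISO-8859-1. The helper names `encodeLatin1Component` and `buildLatin1Body` are hypothetical and not part of curlrequest; the sketch assumes every character fits in a single ISO-8859-1 byte and throws otherwise.

```javascript
// Hypothetical helper (not part of curlrequest): percent-encode a string as
// ISO-8859-1 bytes, leaving only URL-unreserved characters unescaped.
function encodeLatin1Component(str) {
    var out = '';
    for (var i = 0; i < str.length; i++) {
        var ch = str[i];
        var code = str.charCodeAt(i);
        if (code > 0xFF) {
            // ISO-8859-1 only covers code points 0x00 to 0xFF.
            throw new RangeError('Not representable in ISO-8859-1: ' + ch);
        }
        if (/[A-Za-z0-9\-_.~]/.test(ch)) {
            out += ch;
        } else {
            // Two upper-case hex digits, zero-padded.
            out += '%' + (code < 16 ? '0' : '') + code.toString(16).toUpperCase();
        }
    }
    return out;
}

// Hypothetical helper: build a form body string from a plain object.
function buildLatin1Body(fields) {
    return Object.keys(fields).map(function (key) {
        return encodeLatin1Component(key) + '=' + encodeLatin1Component(String(fields[key]));
    }).join('&');
}

// 'S\u00F8ren' is "Søren", '\u00C5rhus' is "Århus".
var body = buildLatin1Body({ name: 'S\u00F8ren', city: '\u00C5rhus' });
console.log(body); // name=S%F8ren&city=%C5rhus
```

The resulting string would then be assigned to options.data, so that it is sent as-is rather than re-encoded.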

Out of interest, though, what sort of data are you sending? escape, encodeURI and encodeURIComponent differ in which characters they encode as %xx; this article has a good comparison. It looks like escape will encode all non-ASCII characters. This might have worked for you, as there's no difference between UTF-8 and ISO-8859-1 when you're only dealing with ASCII chars.
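For a concrete comparison, here is a small sketch that runs the three built-in functions over a few sample strings. The samples are arbitrary choices for illustration. Note that escape is deprecated, and for characters outside the 0x00 to 0xFF range it emits a non-standard %uXXXX form.

```javascript
// Sample inputs: a space, a reserved character, "ø" (U+00F8) and "€" (U+20AC).
var samples = ['a b', '&', '\u00F8', '\u20AC'];

var rows = samples.map(function (s) {
    return {
        input: s,
        escape: escape(s),
        encodeURI: encodeURI(s),
        encodeURIComponent: encodeURIComponent(s)
    };
});

rows.forEach(function (row) {
    console.log([row.input, row.escape, row.encodeURI, row.encodeURIComponent].join('\t'));
});
// a b   a%20b    a%20b      a%20b
// &     %26      &          %26
// ø     %F8      %C3%B8     %C3%B8
// €     %u20AC   %E2%82%AC  %E2%82%AC
```

So for "ø" the difference is not only which characters get escaped but also which byte sequence is produced: escape yields the single ISO-8859-1 byte, while the other two yield the two UTF-8 bytes.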

sorenlouv commented 12 years ago

I'm posting a form to a webserver I don't have control over. I first tried to pre-encode the data with escape() but this was then double encoded by your encodeURIComponent() and the server won't accept it.

I'm Danish and post strings which include the special characters æ, ø and å. Take the string "Søren", for instance. The server expects it as S%F8ren.

    escape:                      S%F8ren
    encodeURIComponent:          S%C3%B8ren
    escape + encodeURIComponent: S%25F8ren

I solved it by removing the encodeURIComponent() call in the curlrequest module and always encoding the data beforehand, as needed. Have I misunderstood anything?
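The double encoding can be reproduced in isolation. The sketch below wraps the parsing step quoted at the top of the issue in a hypothetical `parseData` function (the wrapper and its name are assumptions for illustration, not the module's actual structure) and feeds it pre-escaped data both as an object and as a string.

```javascript
// Hypothetical wrapper around the parsing step quoted in the issue.
function parseData(options) {
    if (options.data && typeof options.data === 'object') {
        var data = [];
        for (var key in options.data) {
            data.push(encodeURIComponent(key) + '=' + encodeURIComponent(options.data[key]));
        }
        options.data = data.join('&');
    }
    return options.data;
}

var escaped = escape('S\u00F8ren'); // "Søren" becomes 'S%F8ren'

// Passed as an object, the '%' from escape() is itself encoded as '%25'.
var fromObject = parseData({ data: { name: escaped } });
console.log(fromObject); // name=S%25F8ren

// Passed as a string, the typeof check fails and the data is left untouched.
var fromString = parseData({ data: 'name=' + escaped });
console.log(fromString); // name=S%F8ren
```

Under these assumptions, pre-encoded data only survives intact when it is handed over as a string rather than as an object.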

chriso commented 12 years ago

OK. Looking around the web, it looks like you should be POSTing with a content-type of application/x-www-form-urlencoded; charset=UTF-8. Of course, the server is free to ignore this when parsing the body.
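As a rough sketch, a request description with that content-type might be assembled as follows. Only options.data appears in this thread; the `url`, `method` and `headers` field names and the example.com URL are assumptions made for illustration, and nothing is actually sent here.

```javascript
// Build a UTF-8 form body; 'S\u00F8ren' is "Søren".
var formBody = 'name=' + encodeURIComponent('S\u00F8ren');

// Hypothetical options object: field names other than `data` are assumed.
var requestOptions = {
    url: 'http://example.com/form', // placeholder URL
    method: 'POST',
    headers: {
        'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8'
    },
    data: formBody // already a string, so it needs no further encoding
};

console.log(requestOptions.data); // name=S%C3%B8ren
```

Whether the receiving server honours the charset parameter is up to the server, as noted above.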

I'll change it to escape for now and then revisit the issue if anyone else has any objections. I suppose it can't hurt having more characters url-encoded than necessary.