Open tj opened 13 years ago
^ moved from senchalabs/connect
I ran into a similar case. When a form is POSTed as Shift_JIS, decodeURIComponent cannot decode the body,
because decodeURIComponent only handles UTF-8. Other charsets need an appropriate decoding function.
For example, this is a Shift_JIS decoder library: http://lightbox.on.coocan.jp/ecl_new.txt
How about code like this? https://github.com/hokaccha/connect/commit/1f7c870ddbdf40978426d60754b31ca1f27e1df2 https://github.com/hokaccha/node-querystring/commit/8c0d5141cc92d385d9777c09f94412a49fffa0ce
Then:
var express = require('express');
express.bodyParser.qs.decoder = UnescapeSJIS;
...
However, I could not find an ISO-8859-1 decoder.
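For what it's worth, an ISO-8859-1 decoder is small enough to write by hand, because each ISO-8859-1 byte maps directly to the Unicode code point with the same value. The following is only a rough sketch: `decodeLatin1Component` is a hypothetical name, not part of Connect or node-querystring, and plugging it in through a `decoder` hook assumes the patch linked above. Note that browsers usually treat ISO-8859-1 as windows-1252, so bytes 0x80 to 0x9F would need an extra mapping table that this sketch leaves out.

```javascript
// Hypothetical helper (not part of Connect or node-querystring):
// decodes percent-escapes assuming every %XX is a single ISO-8859-1 byte.
function decodeLatin1Component(str) {
  return str.replace(/%([0-9A-Fa-f]{2})/g, function (match, hex) {
    // ISO-8859-1 bytes 0x00-0xFF map 1:1 to U+0000-U+00FF.
    return String.fromCharCode(parseInt(hex, 16));
  });
}

// %E9 is the ISO-8859-1 byte for U+00E9 (e with acute accent).
var decoded = decodeLatin1Component('caf%E9');
```

With the proposed hook, this could presumably be assigned the same way as UnescapeSJIS above.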
I was working on a bookmarklet that, among other things, form-posts the title of whatever page you're on to my server running Express, and I'm seeing Connect's body parser choke on some pages from Amazon.
Here's a super simple test case:
https://gist.github.com/947895
Run that website locally, drag the bookmarklet to your toolbar, and click it on any of the provided Amazon links. You should see an error message like this one:
This happens on Amazon pages where the title has special characters, like é or ü. You can change the title of an Amazon page (e.g. by setting document.title in the console) to just é, for example, and it will cause the bug.
I've done some investigating and can give you some more info, but at a high level, it seems that the browser in this case encodes the form differently than encodeURIComponent() does, which causes decodeURIComponent() (used by Connect's body parser) to choke.
For example, calling encodeURIComponent() on that é yields %C3%A9 everywhere, but what the server receives in the form body from these Amazon pages is %E9. Attempting to decodeURIComponent() on %E9 causes this error.
I tried making a sample page for this, but the form post matched encodeURIComponent(). I'm guessing the behavior on Amazon is related to encoding, but I haven't been able to confirm, maybe because Express sends a Content-Type header that specifies utf-8.
All said, it seems that Connect's body parser shouldn't break on these encodings. Hope this info helps. Thanks!
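To make the mismatch concrete, here is a minimal sketch of the two encodings and of one way a parser could avoid crashing. `safeDecode` is a hypothetical helper for illustration, not Connect's actual code, and returning the raw string on failure is just one possible policy.

```javascript
// encodeURIComponent() always emits UTF-8 bytes, so U+00E9 becomes two escapes.
var utf8Encoded = encodeURIComponent('\u00e9'); // "%C3%A9"

// Hypothetical helper (not Connect's implementation): decode as UTF-8,
// and fall back to the raw string when the escapes are not valid UTF-8.
function safeDecode(str) {
  try {
    return decodeURIComponent(str);
  } catch (err) {
    // decodeURIComponent() throws URIError on sequences such as "%E9".
    if (!(err instanceof URIError)) throw err;
    return str;
  }
}

var fromUtf8 = safeDecode(utf8Encoded); // decodes normally
var fromLatin1 = safeDecode('%E9');     // left undecoded instead of throwing
```

Leaving the value undecoded keeps the request alive, and the application can still decode it itself if it knows the page's charset.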