nodejs / node-v0.x-archive

Moved to https://github.com/nodejs/node

http does not support Unicode headers when writing a Buffer #9139

Closed by magicode 9 years ago

magicode commented 9 years ago
var http = require("http");

// Server sets a non-ASCII header value and sends the body as a Buffer.
var server = http.createServer(function (req, res) {
    res.setHeader("unicode", "★⚝✔❁");
    res.end(new Buffer('hello'));
});

// Once listening, request the page and print the headers the client received.
server.listen(process.env.PORT, process.env.IP, function () {
    http.get({ hostname: process.env.IP, port: process.env.PORT, path: '/' }, function (res) {
        console.log(res.headers);
    });
});

console output

{ unicode: '\u0005�\u0014A',
  date: 'Wed, 04 Feb 2015 23:55:57 GMT',
  connection: 'keep-alive',
  'transfer-encoding': 'chunked' }

var http = require("http");

// Same server, but the body is sent as a string instead of a Buffer.
var server = http.createServer(function (req, res) {
    res.setHeader("unicode", "★⚝✔❁");
    res.end('hello');
});

// Once listening, request the page and print the headers the client received.
server.listen(process.env.PORT, process.env.IP, function () {
    http.get({ hostname: process.env.IP, port: process.env.PORT, path: '/' }, function (res) {
        console.log(res.headers);
    });
});

console output

{ unicode: '★⚝✔❁',
  date: 'Wed, 04 Feb 2015 23:56:14 GMT',
  connection: 'keep-alive',
  'transfer-encoding': 'chunked' }

The problem is here: https://github.com/joyent/node/blob/7c0419730b237dbfa0ec4e6fb33a99ff01825a8f/lib/_http_outgoing.js#L132. It should use utf8 encoding.
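The garbled value in the first output is consistent with the header string being serialized one byte per character, keeping only the low byte of each code unit. Here is a minimal sketch of that effect with no HTTP involved. It assumes a modern Node.js where `Buffer.from` and the `latin1` encoding name are available (the thread predates both and uses `new Buffer` with `'binary'`), and `hexBytes` is a hypothetical helper name:

```javascript
// Sketch: reproduce the truncation without any HTTP traffic.
// Assumption: in the Buffer-body case the header string is written with the
// 'latin1' (a.k.a. 'binary') encoding, which keeps only the low 8 bits of
// each UTF-16 code unit.
var value = "★⚝✔❁";

// Hypothetical helper: hex dump of the bytes produced by a given encoding.
function hexBytes(str, encoding) {
    return Buffer.from(str, encoding).toString("hex");
}

var asLatin1 = hexBytes(value, "latin1"); // one byte per character, high bits dropped
var asUtf8 = hexBytes(value, "utf8");     // three bytes per character for these symbols

console.log(asLatin1); // 059d1441
console.log(asUtf8);   // e29885e29a9de29c94e29d81
```

The Latin-1 bytes 05, 9d, 14, 41 line up with the `'\u0005�\u0014A'` value printed by the client in the first example.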

Sembiance commented 9 years ago

Similar errors occur when requesting HTTP URLs that contain UTF-8 characters; see #9286.

dougwilson commented 9 years ago

This would be a violation of the HTTP specification; header values are not UTF-8, only raw octets (which are suggested to be interpreted as Latin-1). You need to read RFC 7230, section 3.2.4:

Historically, HTTP has allowed field content with text in the ISO-8859-1 charset [ISO-8859-1], supporting other charsets only through use of [RFC2047] encoding. In practice, most HTTP header field values use only a subset of the US-ASCII charset [USASCII]. Newly defined header fields SHOULD limit their field values to US-ASCII octets. A recipient SHOULD treat other octets in field content (obs-text) as opaque data.

You need to do res.setHeader("unicode", new Buffer("★⚝✔❁").toString('binary')); and the other side of the connection needs to agree, somehow, that the header's value is UTF-8 octets and not Latin-1 octets. Or simply encode the value using RFC 2047.
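Both suggestions can be sketched as small helpers. This is only an illustration under stated assumptions: the function names are hypothetical, it uses `Buffer.from` from modern Node.js (on node v0.x, `new Buffer(...)` would take its place), and the receiving side is assumed to expose header values as Latin-1 strings. The RFC 2047 helpers cover only a single base64 ("B") encoded-word in UTF-8 and ignore the length limit on encoded-words.

```javascript
// Sketch of the two workarounds; all helper names are hypothetical.

// Sender: map each UTF-8 byte to one character, so that a Latin-1 write
// puts the UTF-8 bytes on the wire unchanged.
function utf8ToBinaryString(str) {
    return Buffer.from(str, "utf8").toString("latin1");
}

// Receiver: reverse the mapping. Only valid if both sides have agreed that
// this header carries UTF-8 octets rather than Latin-1 text.
function binaryStringToUtf8(str) {
    return Buffer.from(str, "latin1").toString("utf8");
}

// Alternative: RFC 2047 encoded-word with base64, which is pure ASCII.
function toEncodedWord(str) {
    return "=?UTF-8?B?" + Buffer.from(str, "utf8").toString("base64") + "?=";
}

// Decode a single UTF-8 base64 encoded-word; anything else is returned as-is.
function fromEncodedWord(word) {
    var match = /^=\?UTF-8\?B\?([A-Za-z0-9+\/=]*)\?=$/i.exec(word);
    if (!match) {
        return word;
    }
    return Buffer.from(match[1], "base64").toString("utf8");
}

var original = "★⚝✔❁";
var wire = utf8ToBinaryString(original);
var encodedWord = toEncodedWord(original);

console.log(binaryStringToUtf8(wire) === original);
console.log(fromEncodedWord(encodedWord) === original);
```

In the server from the report, the sender side would be `res.setHeader("unicode", utf8ToBinaryString("★⚝✔❁"))`, with the client applying `binaryStringToUtf8` to `res.headers.unicode`.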