Closed LavrovArtem closed 8 years ago
I'm not sure if this is a bug, but it is always better to encode url manually:
var http = require('http');
http.createServer(function(req, res) {
if (req.url === '/')
res.writeHead(302, {location:'?text=' + encodeURI('что-то русское')});
res.end();
}).listen(7777);
As general advise, you're better off sticking to ASCII in HTTP headers. From RFC 7230:
Historically, HTTP has allowed field content with text in the ISO-8859-1 charset, supporting other charsets only through use of [RFC2047] encoding. In practice, most HTTP header field values use only a subset of the US-ASCII charset.
Neither MIME (RFC 2047) nor ISO-8859-1 are widely supported. ASCII is the safest bet.
Something like this happens: new Buffer('что-то русское', 'binary').toString()
I understand that it would be reasonable to encode cyrillic or go without using these symbols, but we are creating a proxy server, whereas even quite well-known sites are sending headers with cyrillic.
whereas even quite well-known sites are sending headers with cyrillic
Do you have an example of a such website? It might be useful to mention a real-life example in the respective test should that be written.
Adding res._send('');
forces headers to be flushed with correct encoding. Ironically, calling res.flushHeaders()
doesn't really flush headers
@rlidwka Take https://market.yandex.ru/ If you type cyrillic letters into the search box (say, "телефон"), the header will be like http://screencast.com/t/cKzbSW0a
Another workaround:
var http = require('http');
http.createServer(function(req, res) {
if (req.url === '/') {
res.statusCode = 302;
res.setHeader('location', '?text=что-то русское');
res.flushHeaders();
}
res.end();
}).listen(7777);
@vkurchatkin Does that mean that response.writeHead
behaves differently from response.setHeader
? How that could be expected?
@LavrovArtem Yandex.Market gives properly encoded url in the Location
header for me.
What tool are you using on the screenshot? Maybe it decodes the Location
header? Have you tried pressing the «[Raw]» button?
Guess this link goes here: https://github.com/joyent/node/issues/25293
@ChALkeR I used Fiddler2 to view requests. Also, when I debug our proxy-server, I see the response coming from the original site with unencoded location header.
@vkurchatkin thank you for workarounds
@ChALkeR totally unexpected, at least for me. see #1695
Good to have those workarounds but this definitely feels like something that should be fixed. Granted, best practice would be to escape prior to setting but writeHead
shouldn't be acting this way
@jasnell the problem with writeHead
is that it doesn't actually write anything to the socket and prevents flushHeaders
from doing so. Can you take a look at #1695 ?
There is a similar problem with parsing:
var http = require('http');
http.createServer(function(req, res) {
console.log(req.url);
console.log(req.headers)
res.end('');
}).listen(3000);
curl -H "X-Test: Тест" http://localhost:3000/?тест
On node 0.10:
/?тест
Тест
On io.js 3.2:
/?ÑеÑÑ
ТеÑÑ
This is a seems like a major regression and I can't think of a proper workaround.
@bnoordhuis Quick note. ISO-8859-1 is what's returned by OneByteString()
. This started back when v8 switched from String::WriteAscii()
to String::WriteOneByte()
. And since it would be extra overhead to convert the headers to US-ASCII, we simply leave them latin-1 encoded. Demonstrative example (check X-Test
header):
var http = require('http');
http.createServer(function(req, res) {
console.log(req.headers);
res.end('bye\n');
}).listen(3000, makeRequest);
var options = {
port: 3000,
headers: { 'X-Test': 'Ä' },
};
function makeRequest() {
http.request(options, function (res) { }).end();
}
Output:
{ 'x-test': 'Ä', host: 'localhost:3000', connection: 'close' }
Closing as per https://github.com/nodejs/node/pull/2629#issuecomment-140075580.
This is working as intended. The spec only allows ascii for security reasons. :S
@Fishrock123 What about https://github.com/nodejs/node/issues/1693#issuecomment-101726168 then?
Hmm, I suppose that is an actual bug, but from my knowledge of internals quite difficult to reasonably (performantly) fix.
The entire http headers API (internals) needs a rewrite/cleanup.
Discrepancies I'm aware of:
The entire thing needs to be looked at.
Just encountered a problem with this. Wanted to apply some headers containing non-ascii characters, to a response that is sitting behind IIS (which is proxying requests). IIS threw a fit over the header values, eventually leading me to use the module mimelib
to Mime word encode the header values before using response.setHeader()
.
Wanted to apply some headers containing non-ascii characters
This is against the HTTP spec, I guess.
Edit: no, I am incorrect. The restrictions are a bit different:
http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.2
http://www.w3.org/Protocols/rfc2616/rfc2616-sec2.html#sec2.2
Just to be clear... RFC 7230 and it's additional docs replace RFC 2616:
https://tools.ietf.org/html/rfc7230#section-3.2 https://tools.ietf.org/html/rfc7230#section-1.2
The specific ABNF for Header field values is:
VCHAR (any visible [USASCII] character)
header-field = field-name ":" OWS field-value OWS
field-name = token
field-value = *( field-content / obs-fold )
field-content = field-vchar [ 1*( SP / HTAB ) field-vchar ]
field-vchar = VCHAR / obs-text
obs-text = %x80-FF
obs-fold = CRLF 1*( SP / HTAB )
Also note the comment in Section 3.2.4:
Historically, HTTP has allowed field content with text in the ISO-8859-1 charset [ISO-8859-1], supporting other charsets only through use of [RFC2047] encoding. In practice, most HTTP header field values use only a subset of the US-ASCII charset [USASCII]. Newly defined header fields SHOULD limit their field values to US-ASCII octets. A recipient SHOULD treat other octets in field content (obs-text) as opaque data.
Also, from Appendix A.2:
Non-US-ASCII content in header fields ... has been obsoleted and made opaque
For backwards compatibility, header fields are generally always supposed to be treated as ISO-8859-1 but anything outside of the VCHAR
(visible US-ASCII) range is obsolete and treated as "opaque". Unfortunately the spec doesn't actually define what "opaque" means in this case. I assume that means the bytes are intended to be uninterpreted (meaning we're not supposed to decode them as UTF-8).
In other words, a header value must contain only bytes in the x20-xFF range, and should ONLY decode bytes x20-x7E while leaving bytes x80-xFF as "raw bytes". This range, of course, does happen to allow for valid UTF8 byte sequences. Interpretation of those, however, is considered "unspecified".
The original testcase should be fixed in recent versions on all supported Node.js branches, it just throws errors now, as it should.
Any reason for keeping this open?
I'm going to close this. Everyone, leave a comment here if you disagree and want this reopened for some reason.
The behaviour is consistent now.
https://github.com/nodejs/node/issues/1693#issuecomment-101726168 — this «workaround» no longer works, because it shouldn't.
Proper usage looks like this: https://github.com/nodejs/node/issues/1693#issuecomment-101659204 (or just use querystring
).
https://github.com/nodejs/node/issues/1693#issuecomment-136427052 is also fixed.
To reproduce this bug, execute the code below and open http://localhost:777/ in the browser. This page redirects to a page from the location header, and browser should open http://localhost:777/?text=что-то%20русское .
In node.js version 0.10.15 and earlier, it works fine, but io.js opens the http://localhost:777/?text=GB>-B>%20@CAA:>5 url.
I think that the issue is linked to the writeHead function that incorrectly encodes headers and sends them to the browser.