tj / node-querystring

querystring parser for node and the browser - supporting nesting (used by Express, Connect, etc)
MIT License
455 stars 66 forks source link

Support for Buffers #45

Closed seishun closed 10 years ago

seishun commented 11 years ago

Sometimes you need to send urlencoded binary data. However, the following: qs.stringify({a: new Buffer([0x00, 0x30, 0x7f, 0x80])}) does not produce meaningful output. One would expect it to return something like: 'a=%000%7F%80'

It should be fairly easy to implement. I would submit a pull request, but there are multiple ways to urlencode a Buffer. The simplest one is probably the following: escape(buf.toString('binary'))

Unfortunately, the 'binary' encoding is deprecated and is planned to be removed according to the docs, so you might not want to rely on it. The other two ways I know are a bit ugly: Array.prototype.map.call(buf, function(byte) { return '%' + byte.toString(16); }).join('') and buf.toString('hex').match(/../g).map(function(byte) { return '%' + byte; }).join('')

Both produce the same output as the 'escape' method, except they percent-encode every byte, not only those with the value higher than 127.

What do you think? Perhaps someone knows a better method?

buschtoens commented 11 years ago

We'd probably have to do performance tests on those two methods, though I'd predict, that the first one should be a lot faster than the second one. It also would allow for a simple way to check for >127 values.

Array.prototype.map.call(buf, function(byte) {
  return byte > 127 ? '%' + byte.toString(16) : String.fromCharCode(byte);
}).join('');

However, running this code shows, that we'd have to do a more sophisticated check.

var arr = [];
for(var i = 0; i < 128; i++) arr.push(String.fromCharCode(i));
console.log(arr);

We should only allow bytes from these ranges to be not encoded with %:

This leaves us with this snippet:

function padLeft(str, pad) {
  return pad.substring(0, pad.length - str.length) + str;
}

function encodeByte(byte) {
  return (byte >= 49 && byte <= 57)
      || (byte >= 65 && byte <= 90)
      || (byte >= 97 && byte <= 122)
      ? String.fromCharCode(byte) : '%' + padLeft(byte.toString(16), "00");
}

function encodeBuffer(buf) {
  return Array.prototype.map.call(buf, encodeByte).join('');
}

This will also be one way only, meaning that we'll only stringify Buffers. Trying to detect binary querystrings can go horribly wrong and would make the API ambiguous.

tj commented 11 years ago

personally I would just .toString() it, I don't think shipping binary in a querystring is all that useful

buschtoens commented 11 years ago

Seems like it isn't this simple...

encodeURIComponent(Buffer([0xEF]));
// => '%EF%BF%BD'
seishun commented 10 years ago

@silvinci A Buffer can't have >127 values. At this point you might as well call String.fromCharCode on each byte, then call escape on the resulting string:

escape(String.fromCharCode.apply(null, buf))
buschtoens commented 10 years ago

@seishun wat. new Buffer([0xFF]) is equal to new Buffer([255]). A Buffer is a collection of octets (bytes). One byte goes from 0 to 255.

We wanna make sure, that binary data is encoded, when neccessary (%EF). This way normal alphanum chars won't take up to much space.

Anyways: Why would you want to send binary data? I can't come up with a real use case for this, however Buffer is a standard type in Node, so we should support it.

seishun commented 10 years ago

Okay, had a brain fart. My snippet works anyway though.

Some APIs expect binary data, for example, logging in with Steam. See https://github.com/seishun/node-steam/blob/master/lib/handlers/user.js#L84.

buschtoens commented 10 years ago

On a second thought, we should leave the optional encoding out, so a Buffer will always be completely encoded with %. This would make it look more consistent and should be easier to parse for the receiver.

Your Array.prototype.map snippet misses propper padding. Otherwise it's fine.

However, as buf.toString('hex') does a native C binding call, this should be the fastest method of retrieving the encoded Buffer. The remaining question is how to insert the %. I'll do a benchmark on different approaches.

buschtoens commented 10 years ago

Okay, we have a winner. 1,000,000 rounds took me under 20 s.

var buf = new Buffer(256);
for(var b = 0; b < buf.length; b++) buf[b] = b;

console.log(buf.toString('hex').replace(/../g, '%$&').toUpperCase()); // %00%01 ... %FE%FF

Array.prototype.map took 65 s.

tj commented 10 years ago

@seishun I would base64 or hex-encode them personally, closing for now I don't think this is necessary

seishun commented 10 years ago

@visionmedia

I would base64 or hex-encode them personally

Are you suggesting me to change the API? I have no control over that. Otherwise your module would be unnecessary because everyone would use JSON.

buschtoens commented 10 years ago

You can always use your own fork or manually encode the Buffer. The code's free to use. :)

tj commented 10 years ago

@seishun true, fair enough, just seems like a pretty extreme edge-case to store binary in a querystring

seishun commented 10 years ago

In case someone finds this issue and needs to implement it:

Do not use escape. It doesn't escape '+', which stands for a space character in x-www-form-urlencoded. Instead, use @silvinci's method above.