webtorrent / node-bencode

bencode de/encoder for nodejs
MIT License

`node-bencode` can produce dictionary entries with duplicate keys. #146

Open issuefiler opened 1 year ago

issuefiler commented 1 year ago

Bug

node-bencode can produce dictionary entries with duplicate keys.


node-bencode assumes that distinct JavaScript string keys always encode to distinct byte-string keys, which is false: the UTF-8 conversion maps different strings containing lone surrogates to the same bytes.

https://github.com/webtorrent/node-bencode/blob/ee70f267c8d34b9a94820ca8c42cd67d1274fc89/lib/encode.js#L53-L55
https://github.com/ThaUnknown/uint8-util/blob/149c44c010b3ad17a7904c4266545bbca1fd4403/_node.js#L13

// node-bencode, lib/encode.js (lines 53-55)
encode.string = function (buffers, data) {
  buffers.push(text2arr(text2arr(data).byteLength + ':' + data))
}

// uint8-util, _node.js (line 13)
export const text2arr = str => new Uint8Array(Buffer.from(str, 'utf8'))

Proof-of-concept

For example, let node-bencode encode {"\uD800": 1, "\uDFFF": 2}. The output contains two dictionary entries with the same key, "3:\xEF\xBF\xBD".

const lone_surrogates = "\uD800\uDFFF";
// Lone (“unmatched”) UTF-16 surrogates. Invalid in UTF-16.

const a = Buffer.from(lone_surrogates[0], "UTF-8");
const b = Buffer.from(lone_surrogates[1], "UTF-8");
// Decoding the JavaScript (UTF-16) strings and encoding them into UTF-8.

console.log(a, a.toString(), b, b.toString());
// Since those JavaScript strings are not well-formed UTF-16,
// each lone surrogate is decoded
// into a `REPLACEMENT CHARACTER` (U+FFFD)
// and subsequently encoded into `<Buffer ef bf bd>`.
// Meaning,

console.log(a.equals(b));
// is true, when (lone_surrogates[0] === lone_surrogates[1]) is false.
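
To see the collision in an encoded dictionary without installing anything, here is a minimal sketch. `text2arr` is a local copy of the uint8-util line quoted above, and `encodeDictSketch` is a hypothetical, simplified stand-in for a dictionary encoder (string keys and integer values only). It is not node-bencode's actual code; it only reuses the quoted `encode.string` expression for the keys.

```javascript
// Local copy of the quoted uint8-util helper (not imported from the package).
const text2arr = (str) => new Uint8Array(Buffer.from(str, "utf8"));

// Hypothetical, simplified dictionary encoder for illustration only:
// handles string keys and integer values, nothing else.
function encodeDictSketch(dict) {
  const parts = [text2arr("d")];
  for (const key of Object.keys(dict).sort()) {
    // Same expression as the quoted encode.string: "<byte length>:<data>".
    parts.push(text2arr(text2arr(key).byteLength + ":" + key));
    // Bencoded integer: "i<value>e".
    parts.push(text2arr("i" + dict[key] + "e"));
  }
  parts.push(text2arr("e"));
  return Buffer.concat(parts);
}

const encoded = encodeDictSketch({ "\uD800": 1, "\uDFFF": 2 });

// Count how often the encoded key "3:\xEF\xBF\xBD" appears in the output.
const keyHex = "333aefbfbd";
const keyCount = encoded.toString("hex").split(keyHex).length - 1;
console.log(encoded.toString("hex"), keyCount);
```

Under these assumptions, both distinct JavaScript keys end up as the same three bytes, so the key appears twice in one dictionary.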

ThaUnknown commented 1 year ago

oooh i was mentioned, yup, i have no clue what i'm looking at

issuefiler commented 1 year ago

Since Buffer.from("\uD800").equals(Buffer.from("\uDFFF")), node-bencode can produce multiple dictionary entries with the same key, which is invalid.
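
One way a caller could guard against this before encoding is to group keys by their UTF-8 bytes and look for groups with more than one member. This is only a sketch: `findCollidingKeys` is a hypothetical helper name, not part of node-bencode or uint8-util, and it assumes the keys are converted with `Buffer.from(key, "utf8")` as in the quoted code.

```javascript
// Hypothetical helper: returns groups of distinct JavaScript keys
// that map to the same UTF-8 byte sequence.
function findCollidingKeys(dict) {
  const groups = new Map();
  for (const key of Object.keys(dict)) {
    // Hex of the UTF-8 bytes serves as a comparable map key.
    const hex = Buffer.from(key, "utf8").toString("hex");
    if (!groups.has(hex)) groups.set(hex, []);
    groups.get(hex).push(key);
  }
  // Keep only byte sequences produced by more than one key.
  return [...groups.values()].filter((keys) => keys.length > 1);
}

const collisions = findCollidingKeys({ "\uD800": 1, "\uDFFF": 2, ok: 3 });
const clean = findCollidingKeys({ a: 1, b: 2 });
console.log(collisions.length, clean.length);
```

A non-empty result means the object cannot be encoded into a valid bencoded dictionary with this key conversion; whether to throw or to reject such keys earlier is a design choice left to the library.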