hildjj / node-cbor

Encode and decode CBOR documents, with both easy mode, streaming mode, and SAX-style evented mode.
MIT License
357 stars 73 forks source link

Encoding/decoding works in Node.js, but is broken in browser #142

Closed rse closed 3 years ago

rse commented 3 years ago

I got an interesting report from a user of Encodr (a very thin wrapper around CBOR and MsgPack): the encoding/decoding to/from CBOR for a particular UTF8 string works in Node.js, but is broken in the browser environment. I figured out that the "cbor" module itself behaves strange:

Test Script:

const CBOR = require("cbor")

const input = "a¥a"
console.log(`input: "${input}" (${typeof input}, ${input.length})`)
for (const char of input)
    console.log(`input char: "${char}" ${char.charCodeAt(0)}`)

const output = CBOR.encode(input)
console.log(`output:`, output, ` (${typeof output}, ${output.length})`)
for (const byte of new Uint8Array(output))
     console.log(`output byte: ${byte}`)

const input2 = CBOR.decode(output)
console.log(`input2: "${input2}" (${typeof input2}, ${input2.length})`)
console.log(`input === input2: ${input === input2}`)

Expected correct behaviour in Node.js:

input: "a¥a" (string, 3)
input char: "a" 97
input char: "¥" 65509
input char: "a" 97
output: <Buffer 65 61 ef bf a5 61>  (object, 6)
output byte: 101
output byte: 97
output byte: 239
output byte: 191
output byte: 165
output byte: 97
input2: "a¥a" (string, 3)
input === input2: true

Broken behaviour in the browser:

input: "a¥a" (string, 3) sample.bundle.js:423:9
input char: "a" 97 sample.bundle.js:431:13
input char: "¥" 65509 sample.bundle.js:431:13
input char: "a" 97 sample.bundle.js:431:13
output: ArrayBuffer { byteLength: 6 } (object, undefined)
output byte: 101 sample.bundle.js:448:13
output byte: 97 sample.bundle.js:448:13
output byte: 244 sample.bundle.js:448:13
output byte: 137 sample.bundle.js:448:13
output byte: 145 sample.bundle.js:448:13
output byte: 161 sample.bundle.js:448:13
input2: "a􉑡" (string, 3) sample.bundle.js:457:9
input === input2: false

Check the "output byte" lines: in the browser environment, the CBOR encoding of the input string is different! And this is the reason why the subsequent decoding results in a different string. Perhaps I'm completely overlooking something here, or the encoding of the "cbor" module in browser environments is broken.

Can you check this? Find attached the entire source code of my sample... sample.zip

hildjj commented 3 years ago

What browser? This seems to work on late-model Firefox, Chrome, and Safari on Mac.

rse commented 3 years ago

I've tested it on latest Firefox 88.0.1 (64-bit) under macOS 11.3.1.

rse commented 3 years ago

I've checked it again with both latest Firefox, Chrome and Safari under macOS 11.3.1 (x64). In all three cases the output was the broken behaviour. On the same OS, Node.js the sample works correctly. Perhaps the problem is related to the obvious difference: the browsers use an ArrayBuffer, Node.js uses a Buffer?

hildjj commented 3 years ago

Screen Shot 2021-05-25 at 10 06 42 AM (Firefox 88.0.1 on macOS 11.3.1 x64)

My steps to get to this point:

The encoding logic is from the buffer package in this case. I've verified this by modifying utf8ToBytes in sample.bundle.js, and seeing the output change.

I'm wondering what could be different on your side.

rse commented 3 years ago

Ops, you're right: if I start from scratch with the sample.zip and perform exactly your steps, it works for me, too. Now I'm puzzled. I've to compare the origin of the sample again...

rse commented 3 years ago

Ok, it's really interesting: the sample.zip is from a sub-directly of my "encodr" module, just stripped down to show-case your plain "cbor" module. The reason why you had to manually install "aliasify bignumber.js node-inspect-extracted" is because I've overlooked that these were in the parent node_modules of "encodr". The interesting point now is: if perform the above exact steps in the sub-directory, the results are broken, but if I move the directory to somewhere else it results in the correct results. So, seems like something is picked up from the node_modules directory which breaks "cbor". I've to check in more depth. Let's see whether we can find the source of the trouble...

rse commented 3 years ago

Ok, if the sample is outside the scope of the "encodr"'s node_modules directory the results are correct:

image

Once the "npm run build" is within the scope of "encodr"'s nodule_modules (in the parent directory) the results are broken:

image

One can see the difference: the Uint8Array becomes just an ArrayBuffer (for whatever reasons) and this is the source of the problem as it looks. So, it looks like a problem with Browserify to pick up something which breaks this...

hildjj commented 3 years ago

I bet you're picking up either an old version of cbor (e.g. 5.x) or an old version of buffer from your containing package. cbor 7 should return a Uint8Array, I think.

rse commented 3 years ago

I think I've found out what happens: in the old days the "encodr" module had to alias "cbor" to "cbor-js" for use in the browser. I guess it is from times where your "cbor" module was not able to run in the browser or something like this. And THIS is what happens: the "cbor-js" is broken, not your "cbor" module!

hildjj commented 3 years ago

heh. You might also try cbor-web, which is the cbor package, pre-packaged for the web (but by me, so it stays in sync).

hildjj commented 3 years ago

(however, if you're already doing the packaging, I'd stick with what you have, since then tree-shaking will work better)

rse commented 3 years ago

Yes, as the "encodr" module needs "msgpack" in addition to "cbor" I've perform a browserify step myself anyway. So, here is sufficient to use the regular "cbor" module. I'll kick out all "aliasify" and "cbor-js" stuff and use just the "cbor". Thanks for supporting me to figure out this nasty bug for one of my users! Problem now fixed.