protocolbuffers / protobuf-javascript

BSD 3-Clause "New" or "Revised" License
351 stars 67 forks source link

handle NUL/corrupt String in jspb.BinaryDecoder.prototype.readString #190

Closed hwygithub closed 2 months ago

hwygithub commented 7 months ago

jspb.BinaryDecoder.prototype.readString handles UTF-8, existing behavior assumes char 128-191 is a sync issue and ignores it, but this function should also never encounter a zero/NUL byte since it's decoding a String. This can occur with sync issues or from corrupt/malformed/malign data being in the Binary block, by aborting and continuing at the expected end cursor point we probably skip some bad text input, by continuing to read we are more likely to read junk and continue to decode and move the cursor past the end of the text - this fix avoids that and improves the chance the Decoder succeeds without parsing corrupt data.

PASSES TEST:
151 specs, 0 failures
Finished in 1.611 seconds
Randomized with seed 68142 (jasmine --random=true --seed=68142)

[22:47:53] Finished 'test_commonjs' after 1.87 s
[22:47:53] Finished 'test' after 24 s
google-cla[bot] commented 7 months ago

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

dibenede commented 7 months ago

Hi, are you encountering an issue that this change fixes? We are aware of a bug in BinaryDecoder around over reading, but this isn't the fix we have internally for it: the null character is allowed in utf8 strings. Instead, we apply error recovery when length says to expect so many bytes, but fewer remain.

We can try to port out our internal fix for the over-read issue.

hwygithub commented 7 months ago

Hi, are you encountering an issue that this change fixes? We are aware of a bug in BinaryDecoder around over reading, but this isn't the fix we have internally for it: the null character is allowed in utf8 strings. Instead, we apply error recovery when length says to expect so many bytes, but fewer remain.

We can try to port out our internal fix for the over-read issue.

i'm just keen to get a fix/workaround in since my colleague aren't keen to modify the prod version manually without it being part of the standard google code. if you're workaround achieves the same outcome (basically sets the cursor/pointer to the proper end of the string so that it doesn't skip past the end) it'll work for me. as you're a contributor and i'm new i'd say go with what you have if you can get it included? anything i can do to aassist?

dibenede commented 7 months ago

I'll see what I can do. I've started preparing the patch, but it have some issues to work through. I suspect we'll want to cut a new release once it's in.

dibenede commented 2 months ago

Closing this out in favor of #191.