ruby / json

JSON implementation for Ruby
https://ruby.github.io/json
Other
677 stars 330 forks source link

C and Pure parser accept invalid UTF-8 strings, the Java parser doesn't. #138

Open headius opened 12 years ago

headius commented 12 years ago

I know, I know...if it's bad content it's bad content. But this represents a difference from MRI.

Here's the case, again a reduced version of one I got from @rkh:

# encoding: utf-8
require 'json'
x = "{\"foo\":\"\xC3\"}"
h = JSON.parse(x)
p h['foo']
p h['foo'].encoding

So basically there's a bad byte in a UTF-8 string, and the MRI version walks right by it and allows it to come through to the resulting parsed json structure.

I have a totally broken patch for this:

diff --git a/java/src/json/ext/ByteListTranscoder.java b/java/src/json/ext/ByteListTranscoder.java
index ed9e54b..a7e42ba 100644
--- a/java/src/json/ext/ByteListTranscoder.java
+++ b/java/src/json/ext/ByteListTranscoder.java
@@ -78,9 +78,10 @@ abstract class ByteListTranscoder {
             return head;
         }
         if (head <= 0xbf) { // 0b10xxxxxx
-            throw invalidUtf8(); // tail byte with no head
+            return head; //throw invalidUtf8(); // tail byte with no head
         }
         if (head <= 0xdf) { // 0b110xxxxx
+            if (pos + 1 > srcEnd) return head;
             ensureMin(1);
             int cp = ((head  & 0x1f) << 6)
                      | nextPart();

Again, I'm not sure this is actually something that needs to be fixed, but because the MRI version of json does not blow up on this content, there's something to be addressed.

flori commented 12 years ago

I think the other variants should also raise an exception. I already implemented this for Ruby 1.9 extension and pure variants , but 1.8 is a bit more difficult.

headius commented 12 years ago

I'm not particularly concerned about 1.8. The report we got was that JRuby errored in 1.9 mode and I believe the user's only concerned with 1.9 mode. Hopefully the upstream data can be fixed too, since obviously the "fix" of actually erroring out will now cause the "bug" for MRI too.

byroot commented 1 week ago

If anything, I think it's the C and Pure parsers that should raise an exception.

The problem is that this behavior has been there forever, so there's a high chance it would break some users, which lead to the annoying conclusion that it should be behind a configuration flag.

headius commented 1 week ago

It would not be possible for us to implement such a config flag because the parser we use is always strict about this. @flori said above they implemented the error for 1.9+ pure and ext, so is there really anything to do here 12 years later?

byroot commented 1 week ago

As far as I can tell, it's still not raising today:

>> JSON::Pure::Parser.new("{\"foo\":\"\xC3\"}").parse["foo"].encoding
=> #<Encoding:UTF-8>
>> JSON::Pure::Parser.new("{\"foo\":\"\xC3\"}").parse["foo"].valid_encoding?
=> false
>> JSON::Ext::Parser.new("{\"foo\":\"\xC3\"}").parse["foo"].encoding
=> #<Encoding:UTF-8>
>> JSON::Ext::Parser.new("{\"foo\":\"\xC3\"}").parse["foo"].valid_encoding?
=> false
headius commented 1 week ago

Perhaps that fix never made it into main then. ☹️