ruby / json

JSON implementation for Ruby
https://ruby.github.io/json
Other
673 stars 326 forks source link

[FIX] JSON.dump for hash with duplicate key #564

Open maniSHarma7575 opened 9 months ago

maniSHarma7575 commented 9 months ago

Motivation / Background

This Pull Request has been created because to fix #563

The output for the hash is inconsistent when using JSON.dump with the duplicate key. Specifically, if we examine the result below, the key 'a' appears twice in the output JSON for the same key 'a'.

For example:

JSON.dump({ 'a' => 1, :a => 2 })

Actual Result:

"{\"a\":1,\"a\":2}"

Expected result:

"{\"a\":2}"

cc: @byroot @hsbt

maniSHarma7575 commented 9 months ago

@hsbt , did you get a chance to check this out?

cc: @byroot

hsbt commented 9 months ago

I'm not sure why expected behavior is correct. Because result of JSON.dump is not changed from old version.

$ ruby -e "gem 'json', '2.6.3'; require 'json'; puts JSON::VERSION, JSON.dump({ 'a' => 1, :a => 2 })"
2.6.3
{"a":1,"a":2}
$ ruby -e "gem 'json', '2.7.1'; require 'json'; puts JSON::VERSION, JSON.dump({ 'a' => 1, :a => 2 })"
2.7.1
{"a":1,"a":2}

Please let me know if my understanding is wrong.

maniSHarma7575 commented 9 months ago

I'm not sure why expected behavior is correct. Because result of JSON.dump is not changed from old version.

$ ruby -e "gem 'json', '2.6.3'; require 'json'; puts JSON::VERSION, JSON.dump({ 'a' => 1, :a => 2 })"
2.6.3
{"a":1,"a":2}
$ ruby -e "gem 'json', '2.7.1'; require 'json'; puts JSON::VERSION, JSON.dump({ 'a' => 1, :a => 2 })"
2.7.1
{"a":1,"a":2}

Please let me know if my understanding is wrong.

@hsbt , if the colon (:) syntax is used to create symbols as keys, the output matches the expected result:

ruby -e "gem 'json', '2.6.3'; require 'json'; puts JSON::VERSION, JSON.dump({ 'a': 1, a: 2 })"
-e:1: warning: key :a is duplicated and overwritten on line 1
2.6.3
{"a":2}

ruby -e "gem 'json', '2.7.1'; require 'json'; puts JSON::VERSION, JSON.dump({ 'a': 1, a: 2 })"
-e:1: warning: key :a is duplicated and overwritten on line 1
2.7.1
{"a":2}

However, when the rocket (=>) syntax is used to associate keys with values in the hash, the JSON output contains duplicate values.

Therefore, the output should be consistent regardless of the syntax used to create the hash.

byroot commented 9 months ago

@maniSHarma7575 it's not a "syntax" difference. One is strings the other is symbols.

However your PR is nowhere near enough, this repo contains 3 alternative implementations, one in Ruby, one in C, one in Java. All 3 would need to be changed.

byroot commented 9 months ago

I'm not sure why expected behavior is correct. Because result of JSON.dump is not changed from old version.

@hsbt he's not trying to fix a regression, but do prevent duplicated keys in the output. Arguably it's a minor bug, but it's true that it's not really expected for a JSON library to allow this.

As you know JSON has multiple standards, most don't say anything about that, but RFC 8259 says keys SHOULD be unique.

And more generally it's unclear how parsers will handle it, some may keep the first value, some the last, etc.

maniSHarma7575 commented 9 months ago

@maniSHarma7575 it's not a "syntax" difference. One is strings the other is symbols.

However your PR is nowhere near enough, this repo contains 3 alternative implementations, one in Ruby, one in C, one in Java. All 3 would need to be changed.

@byroot , it's functioning well for all three alternative implementations, as I've implemented the fix in lib/json/common.rb.

https://github.com/flori/json/blob/78643245d2b60f770946e2e252890655d61737fc/lib/json/common.rb#L614

For all the 3 implementations I have tested it:

ruby -v
jruby 9.4.5.0 (3.1.4) 2023-11-02 1abae2700f Java HotSpot(TM) 64-Bit Server VM 25.391-b13 on 1.8.0_391-b13 +jit [arm64-darwin]
rake build
gem build json-java.gemspec
  Successfully built RubyGem
  Name: json
  Version: 2.7.2
  File: json-2.7.2-java.gem
mkdir -p pkg
mv json-2.7.2-java.gem pkg

Created a script json.rb

require "bundler/inline"

gemfile(true) do
  source "https://rubygems.org"
  gem "json", "2.7.2", path: "/Users/manishsharma/Development/personal/opensource/json"
end

require 'json'

pp JSON.dump({ 'a' => 1, :a => 2 })
ruby json.rb
Resolving dependencies...
"{\"a\":2}"

ruby -v
ruby 3.2.2 (2023-03-30 revision e51014f9c0) [arm64-darwin22]
rake build
Resolving dependencies...
(in /Users/manishsharma/Development/personal/opensource/json)
Writing version information for 2.7.2
gem build json.gemspec
  Successfully built RubyGem
  Name: json
  Version: 2.7.2
  File: json-2.7.2.gem
gem build json_pure.gemspec
  Successfully built RubyGem
  Name: json_pure
  Version: 2.7.2
  File: json_pure-2.7.2.gem
mkdir -p pkg
mv json-2.7.2.gem pkg
mv json_pure-2.7.2.gem pkg
require "bundler/inline"

gemfile(true) do
  source "https://rubygems.org"
  gem "json", "2.7.2", path: "/Users/manishsharma/Development/personal/opensource/json"
end

require 'json'

pp JSON.dump({ 'a' => 1, :a => 2 })
ruby json.rb
Resolving dependencies...
"{\"a\":2}"

require "bundler/inline"

gemfile(true) do
  source "https://rubygems.org"
  gem "json_pure", "2.7.2", path: "/Users/manishsharma/Development/personal/opensource/json"
end

require 'json/pure'

pp JSON.dump({ 'a' => 1, :a => 2 })
ruby json.rb
Resolving dependencies...
"{\"a\":2}"
maniSHarma7575 commented 8 months ago

@hsbt and @byroot Waiting for your response.

maniSHarma7575 commented 7 months ago

Hi @hsbt Could you please review this?

maniSHarma7575 commented 5 months ago

@byroot Waiting for the response on this?

cc: @hsbt

jhawthorn commented 4 months ago

I don't think JSON should perform this validation. It will hurt performance a lot and it's not something we can fully guarantee to users.

This may cover simple cases but not, for example, differently encoded strings:

> JSON.dump({"foo".encode("UTF-16").to_s => 1, "foo".to_s => 2})
=> "{\"foo\":1,\"foo\":2}"

Covering that would have to be quite difficult. I think it's best we leave the responsibility to the user to provide reasonable input.

(As I wrote in the Rails issue) I'd actually prefer { 'a' => 1, :a => 2 } be turned into the questionably valid ("SHOULD" is not quite as strong as "MUST") JSON as "{\"a\":1,\"a\":2}" rather than either "{\"a\":1}" or "{\"a\":2}" as it would allow me to see that there is an issue with my program rather than hiding the error and making an arbitrary choice.