mgholam / fastJSON

Smallest, fastest polymorphic JSON serializer
https://www.codeproject.com/Articles/159450/fastJSON-Smallest-Fastest-Polymorphic-JSON-Seriali
MIT License
479 stars 147 forks source link

json5 support #121

Closed hamarb123 closed 3 years ago

hamarb123 commented 3 years ago

Hey! I love using this json library, I was wondering if it would get json5 support in the future. I know it can already do some form of unquoted key IDs, but I was hoping for full json5 support (as an option of course). Specifically, the additional features that I want/need are as follows: "Object keys may be an ECMAScript 5.1 IdentifierName.", "Objects may have a single trailing comma.", "Arrays may have a single trailing comma.", "Numbers may be IEEE 754 positive infinity, negative infinity, and NaN." Also nice would be: "Numbers may have a leading or trailing decimal point.", "Numbers may begin with an explicit plus sign.", "Single and multi-line comments are allowed.", "Additional white space characters are allowed." I understand if this is too much effort or whatever, I just wanted to let you know that I really appreciate this library, and what features I would appreciate the most!

mgholam commented 3 years ago

Do you have any data on json5 usage?

hamarb123 commented 3 years ago

@mgholam what do you mean by that, like how much it is used compared to normal json?

mgholam commented 3 years ago

Yes.

hamarb123 commented 3 years ago

On npm, the most popular json extension (by far) is json5 https://www.npmtrends.com/hjson-vs-json5-vs-jsonbird-vs-relaxed-json.

mgholam commented 3 years ago

I will see what I can do.

mgholam commented 3 years ago

Everything except the string part is done.

I'm having a hard time understanding the rationale behind the '\' char in the string since the example and the spec don't match

in the sample json5 :

lineBreaks: "Look, Mom! \
No \\n's!",

The following strings represent the same information:

'Lorem ipsum dolor sit amet, \
consectetur adipiscing elit.'

'Lorem ipsum dolor sit amet, consectetur adipiscing elit.'
hamarb123 commented 3 years ago

Hey, it is not very clear, but the example

lineBreaks: "Look, Mom! \
No \\n's!",

is equivalent to:

lineBreaks: "Look, Mom! No \\n's!",

Which represents the C# string: "Look, Mom! No \\n's!" It basically just removes the new line.

mgholam commented 3 years ago

Which begs the question, why break a line like that if it has no meaning?

hamarb123 commented 3 years ago

It is basically just stylistic. It's a feature in ECMAScript (js) and I believe some other programming languages. I don't think I'd ever use it for example, but it's probably best to follow the spec.

mgholam commented 3 years ago

Single and double quotes are interchangeable now, the ending '\' is a pending...

mgholam commented 3 years ago

Checkout v2.4.0

hamarb123 commented 3 years ago

Those tests look good to me! It would be nice if you could add "Object keys may be an ECMAScript 5.1 IdentifierName" as well, but not necessary. Thanks!

hamarb123 commented 3 years ago

Except: In theory, +NaN and -NaN should be allowed too Only the following capitalisations should work: Infinity, NaN On the line s = "{'a':1, 'b':2, 'c': 3}".Replace("'", "\""); (in json5_trailing_comma), shouldn't there be a trailing comma? I don't think Assert.AreEqual(double.NaN, (double)o["a"]); would work as double.NaN == double.NaN is false, you could do Assert.True(double.IsNaN((double)o["a"])) instead, or Assert.AreEqual(BitConverter.DoubleToInt64Bits(double.NaN), BitConverter.DoubleToInt64Bits((double)o["a"]));. Not sure if this is the same for infinities. You should probably also have a test that fails for strings with \1 to \9 as well.

hamarb123 commented 3 years ago

-NaN seems to be equivalent to double.NaN as well

mgholam commented 3 years ago

-NaN seems to be equivalent to double.NaN as well

I can't find a reference to a negative NaN

hamarb123 commented 3 years ago

I looked at the reference implementation https://github.com/json5/json5

mgholam commented 3 years ago

Google dev js console ignores -NaN

hamarb123 commented 3 years ago

image Yes, it is the same as NaN.

hamarb123 commented 3 years ago

This is a json5 checker https://jsonformatter.org/json5-validator

hamarb123 commented 3 years ago

Also, for the line escape, it only escapes the next one '\n' or '\r' or '\r\n'. and not any tabs or spaces. ie.

"abc\
    message"

would be

"abc    message"

and

"abc\

message"

would be illegal.

mgholam commented 3 years ago

I presumed the white spaces proceeding a '\' up to new line is ignored... the code just skips everything until non white space.

mgholam commented 3 years ago

Should be fixed in v2.4.0.1

mgholam commented 3 years ago

+NaN -NaN

case insensitive NaN and Infinity

hamarb123 commented 3 years ago

Okay, I'll give it all a test when it makes its way to nuget!

mgholam commented 3 years ago

published.

hamarb123 commented 3 years ago

Here's the issues that I've found so far: -0x01 should parse as -1 "\v" should parse as "\u000B" "\f" should parse as "\u000C" "\0" should parse as "\0" "\x0f" should parse as "\x0f" "\a" should parse as "\a" \f, \v, \uFEFF, \u00A0, \u2028, \u2029, \u2003 should be considered whitespace as well as \t \r \n and space Other than that it's only missing "Object keys may be an ECMAScript 5.1 IdentifierName." from the main set of json5 tests!

hamarb123 commented 3 years ago

And -0x0123456789abcdefABCDEF should parse as -1.3754889325393114e+24

hamarb123 commented 3 years ago

Also 100000000000000000000000000000000000000000 should parse as 1e+41

mgholam commented 3 years ago

How can a hex number have a sign? by definition it is unsigned.

"Object keys may be an ECMAScript 5.1 IdentifierName" is handled with AllowNonQuotedKeys

hamarb123 commented 3 years ago

In json5 it is allowed to have '+', '-' in front of a hex number (or any number for that matter, which is why -NaN works). You just calculate the normal one and then apply the negation operation.

hamarb123 commented 3 years ago

{\u0061\u0062:1,\u0024\u005F:2,\u005F\u0024:3} should parse as Dictionary<string, object>(3) { { "ab", 1 }, { "$", 2 }, { "$", 3 } }

mgholam commented 3 years ago

{\u0061\u0062:1,\u0024\u005F:2,\u005F\u0024:3} seems like unrealistic edge case, it will make the parser more complex and slower.

mgholam commented 3 years ago

Also 100000000000000000000000000000000000000000 should parse as 1e+41

Technically the above is not prohibited in the json spec, and now all numbers above 32 chars in length are treated as double (however even the v8 js engine auto converts to 1e+41 so it is unlikely to happen)

hamarb123 commented 3 years ago

I've made my own fork with some json5 changes (including being theoretically completely conformant) at https://github.com/hamarb123/fastJSON5, feel free to copy back any changes that you want to the original fastJSON library (under the relevant licenses if applicable). I'll continue to update it when I can be bothered with the changes that happen here, but in theory I shouldn't have to change too much from your commits to my fork. It's currently incomplete but is passing almost every test that I want to have.