richie5um / vscode-sort-json

VSCode Extension to Sort JSON objects
https://marketplace.visualstudio.com/items?itemName=richie5um2.vscode-sort-json
MIT License
107 stars 20 forks

duplicate keys are lost #34

Open Lx opened 4 years ago

Lx commented 4 years ago

The following JSON has two "1" keys, which is not great but is technically allowed (see https://stackoverflow.com/questions/21832701/does-json-syntax-allow-duplicate-keys-in-an-object):

{
    "1": 3,
    "2": 2,
    "1": 1,
    "4": 4
}

Sorting this JSON causes the "1": 1 key/value to be lost:

{
    "1": 1,
    "2": 2,
    "4": 4
}

My preference would be for the duplicates to be retained in their original order:

{
    "1": 3,
    "1": 1,
    "2": 2,
    "4": 4
}
richie5um commented 4 years ago

Interesting. Not seen that before. Definitely an edge case ;-).

I suspect this'll be difficult for me to fix, as the file goes through a parse and stringify round trip.

Running this in Chrome's V8, you lose the first property on the parse.

I tried with the JSON5 parser and it has the same behaviour.
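A quick check in Node (also V8) shows the drop happens at parse time, before any sorting:

```javascript
// Per ECMA-262, when JSON.parse sees duplicate keys, lexically preceding
// values for the same key are overwritten, so "1": 3 is silently dropped.
const obj = JSON.parse('{"1": 3, "2": 2, "1": 1, "4": 4}');
console.log(obj["1"]);          // 1 (the last value wins)
console.log(Object.keys(obj));  // [ '1', '2', '4' ] (only three keys survive)
```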

There is an alternative approach I could take, but it is going to be very error-prone.

[.... 2 hours later ....]

I've tried for the past few hours to come up with 'the alternative', but it really doesn't work.

Let me know if you have any good ideas.


Lx commented 4 years ago

Tough break! It makes sense that a JS parse discards prior key/value pairs with duplicate keys, though.

I realise that not having duplicate keys in the input JSON is the easiest solution, but I don't have control over the generation of the input JSON. In fact, I installed this extension because I knew I had duplicates, and I figured that at least sorting them so that they were together would better enable me to deal with them before doing any further processing on that JSON.

I suppose the ideal solution would be not to parse as a JavaScript object (which destroys duplicates without warning), but instead as an abstract syntax tree (which retains everything). The Babel parser might be a good choice.

With an AST you could easily get all of the keys and values (including duplicates) as an ordered list, sort the keys (retaining value order for duplicate keys), and then output the AST to code again via something like the Babel generator. You could probably totally ignore the subtrees generated for the values, merely passing them to the generator along with the sorted keys.
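In lieu of a full Babel AST, the sort-but-keep-duplicates idea can be sketched with a hand-rolled pair extractor and a stable sort. This is a hypothetical illustration only: it assumes a flat object with no nested structures and no commas or colons inside strings, all of which a real AST-based implementation would handle properly.

```javascript
// Sort a *flat* JSON object's pairs by key while keeping duplicates,
// without ever building a JS object (which would drop them).
function sortFlatJsonKeepDuplicates(text) {
  const body = text.trim().replace(/^\{|\}$/g, "");
  const pairs = body
    .split(",")
    .map((s) => s.trim())
    .filter(Boolean)
    .map((s) => {
      const i = s.indexOf(":");
      return { key: s.slice(0, i).trim(), value: s.slice(i + 1).trim() };
    });
  // Array.prototype.sort is stable per ES2019, so duplicate keys keep
  // their original relative order (and so do their values).
  pairs.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
  return "{\n" + pairs.map((p) => `    ${p.key}: ${p.value}`).join(",\n") + "\n}";
}

console.log(sortFlatJsonKeepDuplicates('{"1": 3, "2": 2, "1": 1, "4": 4}'));
// {
//     "1": 3,
//     "1": 1,
//     "2": 2,
//     "4": 4
// }
```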

At a minimum, it would be great if the extension could at least detect duplicates and warn through some pop-up that they have been removed after the sort; currently that doesn't happen, for understandable reasons. I suspect that an AST would still be the only way around this, too.
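A warning along those lines could be approximated even without an AST, by comparing raw key tokens against what survives parsing. This detector is a naive sketch: the regex assumes a flat object, since nested objects would make string values followed by colons look like keys.

```javascript
// Naive duplicate-key detector: count "key": tokens in the raw text and
// compare with the number of keys that survive JSON.parse. Illustration
// only; it assumes a flat object.
function hasDuplicateKeys(text) {
  const rawKeys = (text.match(/"(?:[^"\\]|\\.)*"\s*:/g) || []).length;
  const parsedKeys = Object.keys(JSON.parse(text)).length;
  return rawKeys > parsedKeys;
}

console.log(hasDuplicateKeys('{"1": 3, "2": 2, "1": 1, "4": 4}'));  // true
console.log(hasDuplicateKeys('{"1": 1, "2": 2}'));                  // false
```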

Shoelace commented 3 years ago

If you go ahead with this, an option to specify a secondary sort would be nice, i.e. key, or key then value.

guyisra commented 3 years ago

Just found out about this issue. Now I realize that data could have been lost due to this. I'm deleting this extension. It should just sort, not remove data!

vanowm commented 1 year ago

Technically no data is lost, since only the last duplicate value is available to the application that consumes the JSON anyway. From ECMA-262, the ECMAScript® Language Specification:

In the case where there are duplicate name Strings within an object, lexically preceding values for the same key shall be overwritten.

Aka, last value wins.

So, when the extension sorts the JSON, it does so without the values that wouldn't have been accessible in the first place, hence no data is lost...

Lx commented 1 year ago

Technically no data is lost…

If I feed 10 key/value pairs into a system and only 8 pairs come out, then I'm sure we would both agree that data has been lost. I therefore believe your intended point might actually be: "technically some data deserves to be lost".

I agree with you that the original JSON data shouldn't be using duplicate keys. However, in the real world we don't always get the luxury of perfect input data. My example data was contrived for the purpose of this bug report; my actual, real-world data consisted of thousands of lines, and I really needed to identify and specially handle the entries with duplicate keys. Plus, I was working in a text editor, so I would expect a general-purpose JSON sorter inside a text editor not to reduce the size of any data set it processes.

Also, duplicate keys are technically permissible per RFC 8259 (The JavaScript Object Notation (JSON) Data Interchange Format), which says:

The names within an object SHOULD be unique.

…and "SHOULD" is not "MUST" per RFC 2119 (Key words for use in RFCs to Indicate Requirement Levels). In other words, while not ideal, JSON with duplicate keys is still valid JSON.

May I ask what the intention of your observation is? Are you advocating for this matter to be closed without action?

vanowm commented 1 year ago

Well, I do believe this is an invalid issue: even though duplicate keys are technically allowed, unless there is a JSON parser that can retrieve data from duplicate keys, I don't see any practical reason to support such an edge case.

Just try VSCode's built-in JSON: Sort Document command: it doesn't remove duplicate keys, but the parsed JSON will give a different result than the original, because the sort may change the order of duplicate keys (I'd rather consider that corruption):

{
    "1": 1,
    "1": 3,
    "2": 2,
    "4": 4
}
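
The reordering matters because, under last-value-wins parsing, which duplicate "wins" changes with the order. A quick check:

```javascript
// Under last-value-wins parsing (ECMA-262), reordering duplicate keys
// changes the value a consumer actually sees.
const original = JSON.parse('{"1": 3, "2": 2, "1": 1, "4": 4}');
const reordered = JSON.parse('{"1": 1, "1": 3, "2": 2, "4": 4}');
console.log(original["1"]);   // 1
console.log(reordered["1"]);  // 3
```
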
Lx commented 1 year ago

The practical reasons in my view are:

The built-in JSON: Sort Document command wasn't available when this issue was opened. Now that it exists though, perhaps this entire extension can be considered obsolete.

richie5um commented 1 year ago

Hey all. Author here…

The original intention of this extension was for checked-in JSON files; we were using them for localisation. In their original form they had ‘random’ order, which made doing PRs very time-consuming. Sorting the output meant PRs became super simple.

I now mostly use it for checked-in JSON (increasingly containing JSON comments) and for quickly making JSON output from APIs more readable during diagnostics.

In both scenarios, having unreachable/unpredictable JSON keys (due to duplicates and order) is not helpful, IMO.

I haven’t yet used the built in approach due to muscle memory.

The source for this extension is available if you want to fork and create one that handles duplicates, and I’ll happily answer questions if the code isn’t clear. That isn’t a passive-aggressive reply :), more just to be clear that I don’t (currently) have the need/time to look into this issue further.

HTH.

:-)

vanowm commented 1 year ago

Now I'm just curious: how do you guys create a JSON file with duplicate keys in the first place? Is it a simple append-to-string approach, then saving the string as a file?