josdejong / lossless-json

Parse JSON without risk of losing numeric information
MIT License

Duplicate keys #244

Open verhovsky opened 1 year ago

verhovsky commented 1 year ago
import { parse } from 'lossless-json'
const text = '{"key":"first","key":"second"}'
parse(text)

produces this error message:

SyntaxError: Duplicate key 'key' encountered at position 16

but I would like to parse repeated keys and iterate over them, instead of silently discarding them (like the standard JSON.parse()) or throwing an error (like lossless-json). I think you would need another DuplicatedKeys object, which would just be a list of [key, value] pairs.
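For comparison, the standard JSON.parse() silently keeps only the last occurrence of a repeated key:

```typescript
// JSON.parse keeps only the last value for a repeated key;
// the first value is silently discarded.
const input = '{"key":"first","key":"second"}';
const obj = JSON.parse(input);

console.log(obj.key);                 // "second"
console.log(Object.keys(obj).length); // 1
```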

Please don't hesitate to close this issue though, I realize this is a pretty dumb thing to want to do.

josdejong commented 1 year ago

I realize this is a pretty dumb thing to want to do.

Please don't say that. I suppose it came as an idea from an actual problem you're trying to solve.

Some thoughts:

  1. In general, I think it is best to throw an error on duplicate keys rather than silently throwing away all but the last key. In such a case, something is wrong and probably should be solved at the origin.
  2. I can imagine it would be useful to customize the behavior when encountering a duplicate key, and in specific cases choose to ignore the duplicate key. Something like an onDuplicateKey callback, which by default throws an error but can be overwritten. If there is enough need and valid use cases for this we can decide whether we want to add such a callback. Any concrete use cases to share here?
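A rough sketch of what such a callback could look like (hypothetical: `onDuplicateKey` is a proposal in this thread, not part of the released API):

```typescript
// Hypothetical handler type for the proposed onDuplicateKey hook.
type OnDuplicateKey = (key: string, position: number) => void;

// Default behavior: throw, matching the current parser.
const throwOnDuplicate: OnDuplicateKey = (key, position) => {
  throw new SyntaxError(`Duplicate key '${key}' encountered at position ${position}`);
};

// An override that only warns, letting parsing continue.
const warnOnDuplicate: OnDuplicateKey = (key, position) => {
  console.warn(`Duplicate key '${key}' at position ${position}; keeping the last value`);
};
```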
verhovsky commented 1 year ago

Fair enough.

I'm working on a dev tool that converts JSON to a Python dictionary/list/etc. You can see it in action by going to https://curlconverter.com/ and typing in a command like

curl example.com --json '{"key": [1, 2, 3], "null key": null, "duplicatekey": "a", "duplicatekey": "b"}'

If there's a duplicate key, preserving it in the generated Python and warning users about it (I have a mechanism for surfacing warnings) would be more helpful than removing it, because you can notice the issue by looking only at the (nicely formatted!) output, which is less work than having to compare the generated Python with the input JSON.

Users submit JSON to my tool that they didn't write, so throwing an error and telling them their input is weird would be a worse user experience than just removing it because it's not their fault and it's not their job to fix it. I want my dev tool to do the most immediately helpful thing.

josdejong commented 1 year ago

Nice tool 😎

Your example makes sense indeed, presenting it as a conversion warning rather than throwing an error.

Ok, let's implement this 👍. I think we could go in a few directions with the API:

  1. Add this as a fourth, optional parameter to parse:

    parse(
      text: string,
      reviver?: Reviver,
      parseNumber?: NumberParser = parseLosslessNumber,
      onDuplicateKey?: (key: string, position: number) => void
    )
  2. Expose this as a global property on parse, like:

    parse.onDuplicateKey = (key: string, position: number) => void
  3. Change the API to have a more generic third parameter that can be extended:

    parse(
      text: string,
      reviver?: Reviver,
      options?: {
        parseNumber?: NumberParser = parseLosslessNumber,
        onDuplicateKey?: (key: string, position: number) => void,
        // ...
      }
    )

Option (1) looks ugly and not future-proof. I like the simplicity of option (2) but dislike overriding things globally (of course you can restore the original parse.onDuplicateKey after changing and using it). Option (3) looks most neat in the long term, but a bit overkill at this moment, and it would be a breaking change (though you can make it backward compatible). I think we should go for option (3) as soon as a third option is introduced, and that for now option (2) is a good approach, given that I expect that it is an edge case wanting to customize onDuplicateKey.
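To illustrate the restore concern with option (2), here is a self-contained sketch using a stand-in `parse` that merely wraps JSON.parse (the `onDuplicateKey` property is hypothetical, not released API):

```typescript
// Hypothetical handler type from option (2).
type OnDuplicateKey = (key: string, position: number) => void;

// Stand-in for the library's parse function, with a global handler property.
const parse: { (text: string): unknown; onDuplicateKey: OnDuplicateKey } =
  Object.assign((text: string): unknown => JSON.parse(text), {
    onDuplicateKey: ((key: string, position: number): void => {
      throw new SyntaxError(`Duplicate key '${key}' at position ${position}`);
    }) as OnDuplicateKey,
  });

// Overriding is global state, so callers must save and restore the handler.
const original = parse.onDuplicateKey;
parse.onDuplicateKey = (key, _position) => {
  console.warn(`ignoring duplicate key '${key}'`);
};
try {
  parse('{"a": 1}');
} finally {
  parse.onDuplicateKey = original; // restore the default behavior
}
```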

Any thoughts?

josdejong commented 1 year ago

Hm, maybe I like option (3) better anyway. Let me give it some more thought.

verhovsky commented 1 year ago

If we don't care about backwards compatibility, I agree something like 3 is what I would like. I would prefer to not have to define a function, so something like this:

parse(
  text: string,
  reviver?: Reviver,
  options?: {
    parseNumber?: NumberParser = parseLosslessNumber,
    duplicateKeys?: "error" | "ignore" | "keep" | ((key: string, position: number) => void) = "error",
    // ...
  }
)

The way I need to use this is similar to LosslessNumber: I iterate over the JSON recursively, and if I ever hit a DuplicateKeys (maybe a better name?) object, I iterate over its key/value pairs. I need to know whether the key I'm currently on is a duplicated key and, ideally, whether it's the last duplicate (the one that takes precedence).
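That iteration need could be met with a small helper over ordered [key, value] entries (all names here are made up for illustration):

```typescript
type Entries = Array<[string, unknown]>;

// For each entry, flag whether its key occurs more than once and
// whether this entry is the last (precedence-taking) occurrence.
function duplicateInfo(entries: Entries) {
  const counts = new Map<string, number>();
  const lastIndex = new Map<string, number>();
  entries.forEach(([key], i) => {
    counts.set(key, (counts.get(key) ?? 0) + 1);
    lastIndex.set(key, i);
  });
  return entries.map(([key, value], i) => ({
    key,
    value,
    isDuplicate: (counts.get(key) ?? 0) > 1,
    isLast: lastIndex.get(key) === i,
  }));
}
```

A converter could then warn on `isDuplicate` entries and either keep them all or emit only the `isLast` one.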

You can see where I want to use it in the code in this PR

https://github.com/curlconverter/curlconverter/pull/470

https://github.com/curlconverter/curlconverter/pull/470/files#diff-c389959332ed68018d95812c4a520d7f69f146430132584f71b9ce8de45691d3

P.S. What's the correct way to check for LosslessNumber? This can't be right. I was disappointed that LosslessNumber instanceof Number is false. Also, I get weird type errors from lossless-json; I had to add an any (d is a string):

      const dataAsLosslessJson = losslessJsonParse(d);
      jsonDataString = "json_data = " + objToPython(dataAsLosslessJson as any) + "\n";

If I remove the as any, I get a type error from objToPython saying Argument of type 'unknown' is not assignable to parameter of type 'string | number | boolean | object | null'. ts(2345). Why does it think dataAsLosslessJson is unknown?
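On the instanceof question: LosslessNumber is its own class, not a subclass of Number, so `instanceof Number` is false by design. One common pattern is a type guard using `instanceof` against the class itself. A stand-in class is used below so the sketch is self-contained; in real code you would import `LosslessNumber` from 'lossless-json' instead:

```typescript
// Stand-in for the real class, for a self-contained example only.
class LosslessNumber {
  constructor(public value: string) {}
}

// Type guard: true only for LosslessNumber instances, and it narrows
// the value's type for TypeScript.
function isLossless(value: unknown): value is LosslessNumber {
  return value instanceof LosslessNumber;
}
```

One caveat with this pattern: `instanceof` fails if two copies of the module are loaded, since each copy has its own class identity.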

josdejong commented 1 year ago

I like your idea of defining duplicateKeys?: "error" | "ignore" | "keep" | function. I think we can let the callback function also return one of these three values instead of void, that would make it more flexible.
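Resolving such an option could then be a one-liner, sketched here with the names from this thread (hypothetical API shape, not released code):

```typescript
type DuplicateAction = "error" | "ignore" | "keep";
type DuplicateKeysOption =
  | DuplicateAction
  | ((key: string, position: number) => DuplicateAction);

// Normalize the option: call it if it is a callback, else use it directly.
function resolveDuplicate(
  option: DuplicateKeysOption,
  key: string,
  position: number
): DuplicateAction {
  return typeof option === "function" ? option(key, position) : option;
}
```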

Let's discuss the typing things separately in #245 to keep this discussion focused.

ptth222 commented 3 months ago

I stumbled onto this from a similar issue in Python. You might consider how they do it in this SO post. The second answer in the post creates a list/array instead of just raising an exception.
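For reference, the approach described there, translated from Python's pairs-hook idea to JavaScript terms, merges repeated values into an array instead of raising:

```typescript
// Given ordered [key, value] pairs, merge repeated keys into arrays.
// Caveat: a key whose original value is already an array becomes
// ambiguous, which is one reason this silent merging is debatable.
function mergeDuplicates(pairs: Array<[string, unknown]>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [key, value] of pairs) {
    if (Object.prototype.hasOwnProperty.call(out, key)) {
      const existing = out[key];
      out[key] = Array.isArray(existing) ? [...existing, value] : [existing, value];
    } else {
      out[key] = value;
    }
  }
  return out;
}
```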

josdejong commented 3 months ago

Thanks for your suggestion. I understand that this solution puts the duplicate values into an array. Silently altering the data structure doesn't sound like a good idea to me.