jpsim / Yams

A Sweet and Swifty YAML parser.
https://jpsim.com/Yams
MIT License
1.11k stars 141 forks source link

Fail on duplicate keys in a mapping #416

Open tejassharma96 opened 5 months ago

tejassharma96 commented 5 months ago

This fixes #415 but it is a slightly "naive" implementation. It's possible that the "correct" place to throw an error from would be the C code rather than the swift code, but I do not have the knowledge to make those changes 😓

Please let me know what you think!

jpsim commented 3 months ago

Thanks for the PR!

This is a welcome change, and doing this at the Swift layer is fine. Based on some GitHub discussions, it does look like the C API can detect duplicate keys. There's also this issue with relevant discussion: https://github.com/yaml/pyyaml/issues/41

However, I think it's important that if we start to error in these cases, that 1) it be recoverable (e.g. we don't fail the whole decoding) and 2) the error is reported at the correct location (mark), right now 1:1 is being hardcoded as the location.

If you rebase or merge main you should have CI in a healthier state to run again.

tejassharma96 commented 3 months ago

hmmm ok, I'll need a bit of support on both of those due to my lack of familiarity here:

  1. Recoverable - what would you suggest here? I personally would actually expect the entire decoding to fail if we have invalid YAML. I'm curious what you're thinking regarding how we can both make the error clear and visible to users but also not fail the whole decoding.
  2. Correct location - yeah, I was wondering what to do here, couple questions:
    • Say there are 3 instances of a specific key, at which one of those keys should we error? Can we even decide that? I suppose we could use key.mark and compare line/column numbers to choose either first/last in the data?
    • What if there are multiple instances of duplicated keys? If keys a, b and c are duplicated, currently my code would basically just error on the first one, but the error would say but found multiple instances of: ["a", "b", "c"]. I think there's a balance to be struck here on being specific about the point of error but also giving the consumer actionable feedback on all the issues. I suppose this could potentially be an opportunity to create a new case on YamlError, something like .duplicatedKeys(keys: [Node]) that could display all the relevant information from the nodes (ie by fetching .mark for all duplicated nodes and creating some custom error message that gives immediate feedback about all duplicated keys)?
tejassharma96 commented 3 months ago

Updated the error to notify of all duplicated keys and their locations by creating a new custom error type that can give you actionable feedback for duplicated keys, no matter how many times a key is duplicated/how many keys are duplicates.

I haven't updated it to be recoverable bc personally I don't believe it should recover - imo it should just throw immediately (but happy to make updates to make it recoverable if you give me guidance on how you think that should be done)

tejassharma96 commented 2 months ago

hey @jpsim, apologies for the tag but I'd love another review here when you have a minute 😅