mrosenbladt / yaml-cpp

Automatically exported from code.google.com/p/yaml-cpp
MIT License
0 stars 0 forks source link

Mappings with nonunique keys are allowed #60

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
According to the YAML 1.2 spec, keys in mappings must be unique (read the 
paragraph on mappings):
http://www.yaml.org/spec/1.2/spec.html#id2764044

However, yaml-cpp will parse a mapping with nonunque keys just fine. The 
value returned for the duplicate key would be whatever value was last 
parsed.

For example, let's say you have file test.yaml:

---
a: 1
b: 2
c: 3
a: 4
...

And you have the following C++ source:
#include <fstream>
#include <iostream>
#include "yaml.h"
using namespace std;
int main() {
    ifstream    InFile("test.yaml");
    YAML::Parser    parser(InFile);
    YAML::Node      node;
    YAML::Iterator  it;
    string      key, val;

    parser.GetNextDocument(node);
    for (it = node.begin(); it != node.end(); it++) {
        it.first() >> key;
        it.second() >> val;
        cout << key << ": " << val << endl;
    };
    return 0;
};

Compiling and running the source, you get the following output:
a: 4
b: 2
c: 3

However, if everything was done the yaml-cpp way, maybe an exception 
should be thrown?

I am compiling this with gcc 4.3.2 on Linux 32. I'm using the svn sources 
and the last time I've updated was revision 326.

Original issue reported on code.google.com by y...@yyao.ca on 1 Dec 2009 at 2:41

GoogleCodeExporter commented 9 years ago
Yeah, I've been thinking about this for a while, but I'm not sure what the right
answer is. I've never liked the restriction about multiple keys; nor have I 
liked
YAML's definition of equality. There's actually a little discussion of node 
equality
on the YAML mailing list now, which is pretty interesting, and I'm waiting until
that's cleared up to make a final decision on how yaml-cpp should handle it.

My preference would be to defer the decision about equality to the user of the
library. This is consistent with the way yaml-cpp handles maps: it'll treat it 
as
just a list of key/value pairs, and if you want keys that look the same, that'd 
be
fine - it'll output all the key/value pairs as you gave them.

On the other hand, if you want to find values by key, it'll simply use the 
equality
of whatever key you're using, which can be defined by the user if necessary. If 
you
have multiple keys that are equal to the one you give, it'll just take the first
(then it becomes your problem, not yaml-cpp's).

But as I said, I'm currently waiting to see how the YAML creators decide, and 
then
I'll decide what to do with yaml-cpp. But it's good to have this issue open -
yaml-cpp actually currently leaks memory when you feed it a map with multiple 
keys :)

Original comment by jbe...@gmail.com on 1 Dec 2009 at 3:47

GoogleCodeExporter commented 9 years ago
Thats rather funny. I was about to start valgrinding my code. Now, I'll just be 
a 
bit more careful with it.

Are there any other memory leak issues that you are aware of? I guess we should 
report them as we come across them?

Original comment by y...@yyao.ca on 2 Dec 2009 at 12:46

GoogleCodeExporter commented 9 years ago
No, that's the only one I'm aware of. It's actually easy to fix, so I suppose I 
may
as well fix it. I'm a bit strange in this regard: I didn't want to fix it until 
I've
decided how we're going to handle duplicate keys (and equality in general).

Original comment by jbe...@gmail.com on 2 Dec 2009 at 12:56

GoogleCodeExporter commented 9 years ago
OK, that leak is fixed (r328). I also just ran valgrind on the test suite, and 
it
found no leaks.

Original comment by jbe...@gmail.com on 2 Dec 2009 at 1:30

GoogleCodeExporter commented 9 years ago
I couldn't compile this (r328). Shouldn't #include <memory> be somewhere in 
map.h 
or its headers? Its used for auto_ptr.

Thanks.

Original comment by y...@yyao.ca on 2 Dec 2009 at 4:47

GoogleCodeExporter commented 9 years ago
Sorry, you're right. Try r329. I don't know why it let me compile this; I 
suppose
different implementations *may* include certain headers in other ones that 
aren't
required.

Original comment by jbe...@gmail.com on 2 Dec 2009 at 6:00

GoogleCodeExporter commented 9 years ago
I see a different behavior when using the new API. 
Firstly, when iterating yaml-cpp loops over both keys.

Secondly, when I use the [] operator to get the value for a given key, it 
returns the first parsed value and not the last one.

Original comment by saiyamko...@gmail.com on 12 May 2012 at 12:25

GoogleCodeExporter commented 9 years ago
@saiyamkohli, yes, you're right, that's a byproduct of the way nodes are stored 
in the new API.

(Please don't depend on this behavior! If you use identical keys, it's illegal 
according to the spec, so anything may happen here in the future.)

Original comment by jbe...@gmail.com on 12 May 2012 at 12:42

GoogleCodeExporter commented 9 years ago
Jesse, Are you planning to update the new API to resolve this anytime soon? If 
not, I can go over the code  and implement changes to handle this. Please let 
me know what software processes you use for accepting patches.

Original comment by saiyamko...@gmail.com on 14 May 2012 at 5:17

GoogleCodeExporter commented 9 years ago
@saiyamkohli, do you mean handling nonunique keys in general, or modifying the 
new API behavior to match the old one?

If you mean the former, then I'd be happy to accept a patch, but I haven't 
figured out a solution yet to even implement (let alone worrying about 
implementing it). So I'd be very pleased if you came up with a clever solution!

If you mean the latter, I see no reason to match the old API's behavior, since 
it's not documented, and neither behavior is "correct".

And finally, any diff tool should work - ideally, whatever you get as output 
when you type "hg diff" would be easy to patch.

Original comment by jbe...@gmail.com on 14 May 2012 at 5:31

GoogleCodeExporter commented 9 years ago
I was planning on handling non-unique keys in general, I agree with you that it 
should be flexible and it should let the user of the library decide what to do 
with non-unique keys.

I thought of either using policy classes as template parameters or use policy 
functors to handle this. By default yaml-cpp would use the base policy which is 
to throw an exception if it encounters non-unique keys (as per the spec) and 
then I can also implement alternate policies for overriding the previous key or 
keeping them around as the current implementation does. Either way, users will 
be able to specify their own policy for how to parse non-unique keys.

I haven't had a chance to go over the yaml-cpp codebase so I am not sure how 
and where this will fit in but I am going to implement this for my use-case 
anyway and since I am planning to use yaml-cpp to parse yaml, it makes more 
sense to try and get the patch included in official release.

Do you think what I proposed would work? 

Original comment by saiyamko...@gmail.com on 14 May 2012 at 5:59

GoogleCodeExporter commented 9 years ago
It's possible - the main problem, as I see, is that uniqueness depends on your 
definition of equality. For example, according to the spec,

0o13: first
0xB: second

is illegal, since both keys are the integer 11. So there would have to be a 
policy that determines when two nodes are equal.

But this shouldn't deter you! I'm excited to see what you come up with!

Original comment by jbe...@gmail.com on 14 May 2012 at 7:44