jbeder / yaml-cpp

A YAML parser and emitter in C++
MIT License
5.15k stars 1.85k forks source link

Missing YAML::Node equality operator #274

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
YAML::Load("{[0o13]:{a:0, b:1}").equals(YAML::Load("{[0xB]:{b:1, a:0}"))

What is the expected output? What do you see instead?
true. compilation error.

What version of the product are you using? On what operating system?
0.5.1

Please provide any additional information below.
YAML 1.2 recognizes identity and equality as separate concepts as per 
http://www.yaml.org/spec/1.2/spec.html#id2764652

yaml-cpp currently implements identity using Node::is, but is missing an 
equality checking operator for use in deep content-based comparisons.

Original issue reported on code.google.com by iridian....@googlemail.com on 3 Feb 2015 at 7:28

GoogleCodeExporter commented 9 years ago
Err, my apologies, this isn't a defect but enchancement request.

Original comment by iridian....@googlemail.com on 3 Feb 2015 at 7:31

GoogleCodeExporter commented 9 years ago
See Issue 60. Equality depends on schema, which is not one of yaml-cpp's 
better-defined concepts.

Original comment by jbe...@gmail.com on 3 Feb 2015 at 2:08

kunaltyagi commented 8 years ago

Hi I've been using this function (Link) for a while to check equality of 2 Nodes. I haven't extensively tested it, but it works fine on simple YAML files.

Is this a step in the right direction? If yes, can you point me to some YAML files for extensive testing?

jbeder commented 8 years ago

@kunaltyagi there are a few issues with your link.

1) It only checks maps with string keys (maps are allowed to have arbitrary keys) 2) It doesn't compare tags 3) See #60 for discussion of equality in YAML. You're testing exact comparisons, but YAML allows an application to consider two nodes equal based on application-specific tags, with some being automatically applied for all YAML files. For example

are considered equal.

kunaltyagi commented 8 years ago

@jbeder [Edit] Updated the gist to compare values first typecasted as bool, then int then double and finally std::string.

  1. I tested it with integer, boolean and floating point number in YAML::Scalar Nodes. It converts them to string and checks for equality. In the end, the data has to be one of them, right? From reading the code, YAML converts all scalars to std::string to store them, right? This is just like a byte-by-byte comparison (of their translated version).
  2. [Edit] Updated. Simple std::string comparison of the tags implemented.
    • 10, 0xa, 012, 0b1010: [Edit] Only equality with 0b1010 fails. Rest 3 are equivalent. This might be resolvable by modifying the conversion to-and-from string functions.
    • As far as equality of 10, 10.0, "10" go, they should all test different (currently, they test same. Because YAML::Node returns 10.0 as 10 (if input from c++) or 10.0 (if input from file)). I don't know how to resolve this.

As far as I could understand the code, all scalar values are stored in std::string m_scalar (node_data.h:108). How do I get the type of data (bool, int, double, float, string, etc.) from it? I don't think that is possible.

jbeder commented 8 years ago

Tags: http://www.yaml.org/spec/1.2/spec.html#id2764295

kunaltyagi commented 8 years ago

So, I went ahead and added a check for equality of tags also. However, as i went through the specification, I realized that yaml-cpp has the Map Keys also as a node. So, should even the keys be checked for equality? Currently, I'm converting the key to std::string and checking if such key exists in the second node. I also added a check for identity (since identity implies equality)

jbeder commented 8 years ago

Yes, keys should be checked for equality.

kunaltyagi commented 8 years ago

This is difficult for 2 reasons:

  1. No link to Key from Value node
  2. Maps aren't ordered by Key value. So, iterating on 2 maps doesn't ensure that the respective iterators are pointing to the same item.[1] My current approach to convert Key to std::string is 'wrong' as per YAML 1.2, but due to above 2 reasons the only method to compare 2 maps with n keys would require O(n^2) time, not linear. I will post the O(n^2) version also, but it is bad, just saying. [Edit]: Updated Gist

[1]: I created 2 nodes from a file, added different values in a new map, and got the first and seconds iterators of both maps different despite the keys being the same since the nodes are organised by std::lesser<node*> and node std::lesser<node>. Will a function like bool compare(Node, Node) but substituting < for == be enough in the map declaration as the comparator?

kunaltyagi commented 8 years ago

After looking at node_data::get and node::equals functions, I think that the compare function can be made linear in time IFF the == operator is changed from identity to equality.

Also I'm facing a rather curious problem. The following snippet outputs nothing. Does this stem from the fact that memories are merged in operator[] or that == is identity and returned false in the get function?

YAML::Node node1 = YAML::LoadFile('test.yaml');
YAML::Node node2 = YAML::LoadFile('test.yaml');
std::cout << node2[node1.begin()->first];
jbeder commented 8 years ago

@kunaltyagi I'm not sure what you mean by changing operator == from identity to equality (since that seems what you're trying to compute).

Also: nodes from different calls to Load or LoadFile are unrelated.

jleben commented 5 years ago

@kunaltyagi Even if maps aren't ordered by keys, I see a simple O(N) implementation:

  1. for each entry in one map, insert values into a temporary hash with O(1) access by key.
  2. for each entry in the other map, access the first map's value with the same key from the temporary hash, compare them, and remove the entry from the hash.
  3. If the key from the second map is not in the hash, the maps are not equal. If the compared values don't match, the maps are not equal. If the hash is not empty after comparison, the maps re not equal.
kunaltyagi commented 5 years ago

@jleben I have a personal issue against allocating memory (especially since hashmaps require particularly large pool of memory) for operator ==, but for the sake of argument, let's try.

  1. Using std::unordered_map::insert, it'd take O(N) in best or in worst case O(N*N + N)) to insert N items in a set (or to check and insert). Assume LHS keys are now inserted.
  2. For RHS, iterate over every key and remove the ones found. According to std::unordered_map::erase and std::unordered_map::find, it'll take O(N) to O(N*N) to check and remove.
  3. If any value is extra in RHS or if size is non-zero, LHS != RHS, else LHS == RHS

Concerns: