Access to content of aliased nodes

GoogleCodeExporter commented 9 years ago

Section 3.2.2.2 of the YAML 1.2 specification states, "In the
representation graph, a node may appear in more than one collection. When
serializing such data, the first occurrence of the node is identified by an
anchor. Each subsequent occurrence is serialized as an alias node which
refers back to this anchor."

Thus, when walking the representation, I think that the client code should
be able to access the value of the aliased node just as if that node were
present instead of the alias, but the client code should also be able to
detect the fact that the YAML::Node object actually represents the same
YAML node as another YAML::Node object and be able to identify the
corresponding YAML node (in case the client code maps some native object to
the YAML node).

For an example of the situation in question:
1. Create the following YAML document:

---
- &a "data"
- *a
...
2. Parse the document.
3. The second element of the sequence has no value and no way to determine
that it is linked to the first element.

What I suggest is:
- Allow access to the content of the node (The string "data" in this case)
through the alias as if the node were repeated.
- Add an IsAlias() method to the YAML::Node class that will allow the
client code (if it is interested) to determine that it is handling an alias.
- Add an Identity() method to the YAML::Node class that returns a unique
identity value for the content, allowing the client code to fully utilize
the node identity for mapping of native objects.

I have attached a patchfile for for the codebase
http://yaml-cpp.googlecode.com/svn/trunk@99
that implements the above changes.

Original issue reported on code.google.com by rtweeks21 on 4 May 2009 at 4:53

Attachments:

alias-access.patch

GoogleCodeExporter commented 9 years ago

Good idea! I've read through your patch, and I need to think about it a little 
more. I'm not sure if I want to 
provide access to the content of a node - perhaps instead we should give access 
to the actual node that an alias 
references.

Original comment by jbe...@gmail.com on 8 May 2009 at 6:10

Changed state: Accepted

GoogleCodeExporter commented 9 years ago

I considered that, but after reading through the YAML spec (chapter 3) I came 
to the
conclusion that it would be best to do it as in the patch.  The spec says that
anchors and aliases are serialization details, which means that they (ideally) 
should
not appear in the representation level.  There is a difficulty introduced here
because of a conceptual mismatch between the YAML::Node class and the YAML node
concept of the specification: because the YAML::Node object as written is 
responsible
for parsing the alias from the stream, there already exists a YAML::Node when 
the
alias is read where the specification envisions that the same object would 
appear
both where anchored and where aliased.  Of course actually using the same object
leads to a need for mark-and-sweep cleanup, which is a major bother to code.  I
preferred the Alias content object which explicitly does not delete the other 
content
it references (and leaves cleanup to the YAML::Node that instantiated the 
object) and
explicit reporting of the "identity" of the node, which corresponds to the 
Content
object that is actually providing the content results.

I also looked at how other languages handle the situation.  In Ruby, for 
example, the
document I gave in the original posting would be loaded as an Array consisting 
of two
references to the same string object (not just the same data), i.e the 
Object#equal?
(i.e. the identity test) method will return true when used to compare the array 
elements.

The one concession that I made was that the representation lets the client code 
know
(if it cares) when it is looking at a (YAML) node that occurred earlier in the
representation (through the YAML::Node::IsAlias() method).  This is just a
performance hack, since the client code could also track it through the
YAML::Node::Identity() return values.  It might also be a good idea to add a
YAML::Node::IsReferenced() method that indicates whether a node is referenced
multiple times in the document; with this addition, client code would not have 
to map
the identity from every YAML::Node to its native counterpart -- only the ones 
that
will be referenced again in the future.

I've done some further testing on my patch and found a few bugs.  I'll get a new
patch up when I can (probably with the IsReferenced() method added).

Original comment by rtweeks21 on 8 May 2009 at 7:01

GoogleCodeExporter commented 9 years ago

Here is my latest patch (still against 
http://yaml-cpp.googlecode.com/svn/trunk@99).
 It implements YAML::Node::IsReferenced() for the convenience of client code as I
discussed above.  It also includes bug fixes and a test scenario.

Original comment by rtweeks21 on 8 May 2009 at 10:56

Attachments:

alias-access.patch

GoogleCodeExporter commented 9 years ago

Ok, I understand what you mean. I reread what I wrote and I realized I didn't 
say something I was thinking 
about. What I'm envisioning is that the client doesn't need to know that it's 
reading an alias (just as you 
have), but I'm thinking of implementing it by keeping track of the anchor node 
instead of it's content. I really 
see content as an implementation detail, and I don't want clients to have 
access to it.

Original comment by jbe...@gmail.com on 9 May 2009 at 12:09

GoogleCodeExporter commented 9 years ago

I'm sorry I was unclear -- I also was trying to keep the YAML::Content as an
implementation detail.  The only places the YAML::Content of the aliased node is
referenced is in the YAML::Node::Parse() function, which is already dealing with
YAML::Content objects, and the implementation of YAML::Alias which is, itself, a
YAML::Content subclass.  In my patch, the alias mapping is stored as an
std::map<std::string, YAML::Node*> in the YAML::Scanner, so the YAML::Content 
objects
are never revealed outside the YAML::Node class.

When I said, "Allow access to the content of the node," in my original posting, 
I was
referring to the conceptual content of the node as given in the YAML spec, not 
the
YAML::Content object referred to by the YAML::Node class.  I should have been 
more
careful with my wording.

Original comment by rtweeks21 on 9 May 2009 at 2:30

GoogleCodeExporter commented 9 years ago

Sorry, I guess that the address of a YAML::Content object is also returned from 
the
YAML::Node::Identity() function, but implicitly cast to a "const void *" which I
intended to be an opaque value.  The fact that the returned pointer happens to
reference a YAML::Content object is still an implementation detail; I can think 
of a
couple of ways to change the return value to the address of the YAML::Node 
object in
the anchor position, but only at the cost of an additional pointer.

Original comment by rtweeks21 on 9 May 2009 at 2:51

GoogleCodeExporter commented 9 years ago

OK, I add a branch at branches/aliases, and I committed your changes with a few 
modifications. Mainly, I 
changed the Node::Identity() function to return a pointer to the Node identity 
- either 'this' (for a typical node) or 
the reference node (for an alias). It doesn't semantically change your intent 
(I think) - but let me know what you 
think about that.

Other than that, it looks good, but I want to wait a few days to a week to let 
me test a little more before I merge 
into the trunk. If you see anything else you want to change, feel free to send 
a patch (diff from the alias branch). 
Thanks for your hard work!

Original comment by jbe...@gmail.com on 10 May 2009 at 6:22

Changed state: Started

GoogleCodeExporter commented 9 years ago

That looks good, only the .vcproj file was not updated.  I'm sure that CMake 
can take
care of that, but here's a patch anyway just to make it easier for those who 
only
have Visual Studio.

Original comment by rtweeks21 on 12 May 2009 at 2:50

Attachments:

aliases.patch

GoogleCodeExporter commented 9 years ago

Merged into the trunk.

Original comment by jbe...@gmail.com on 22 May 2009 at 11:13

Changed state: Fixed

majintao0131 / yaml-cpp

Access to content of aliased nodes #12