Closed GoogleCodeExporter closed 9 years ago
Due to anchors-aliases the whole document must be constructed at once.
Who shall decide how to split the document ? SnakeYAML or the user ?
Can you please provide an example ?
Original comment by aso...@gmail.com
on 13 Nov 2009 at 5:41
> Due to anchors-aliases the whole document must be constructed at once.
No. The spec guarantees that anchors come before their aliases. The
implementation would need to keep track of the anchors it has seen so far and
keep a reference to the corresponding object.
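A minimal sketch of that bookkeeping, assuming a toy node type rather than anything from SnakeYAML: `AnchorTableSketch`, `Item`, and `resolve` are hypothetical names made up for illustration. It only shows that the table grows with the number of anchors, not with the number of nodes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not SnakeYAML code: Item stands in for a parsed node.
final class AnchorTableSketch {

    /** A toy node: a value (optionally anchored) or an alias to an earlier anchor. */
    static final class Item {
        final String anchor; // non-null if the node is declared as &anchor
        final String alias;  // non-null if the node is *alias
        final Object value;  // payload of a non-alias node

        private Item(String anchor, String alias, Object value) {
            this.anchor = anchor;
            this.alias = alias;
            this.value = value;
        }

        static Item plain(Object value) { return new Item(null, null, value); }
        static Item anchored(String name, Object value) { return new Item(name, null, value); }
        static Item alias(String name) { return new Item(null, name, null); }
    }

    // Only anchored objects are retained; unanchored ones can be collected
    // as soon as the caller is done with them.
    private final Map<String, Object> anchors = new HashMap<>();

    /** Returns the object for the next node, resolving aliases against earlier anchors. */
    Object resolve(Item item) {
        if (item.alias != null) {
            if (!anchors.containsKey(item.alias)) {
                // Anchors precede their aliases, so an unknown name is an error.
                throw new IllegalStateException("undefined alias: " + item.alias);
            }
            return anchors.get(item.alias);
        }
        if (item.anchor != null) {
            anchors.put(item.anchor, item.value);
        }
        return item.value;
    }

    /** Number of objects kept alive by the anchor table. */
    int retainedCount() { return anchors.size(); }
}
```

Feeding it a plain item, an anchored item, and an alias to that anchor returns the same object for the last two and leaves exactly one entry in the table.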
> Who shall decide how to split the document ? SnakeYAML or the user ?
What do you mean by split?
> Can you please provide an example ?
I don't have a clear idea of what this should look like. I just don't like the
idea of loading large documents into memory just to iterate over them.
In my specific situation the root node is a huge sequence (~10^7 elements). The
elements of the sequence are small.
There's one idea that would work for this: if it were possible to use events
and the composer/constructor together, my code could look something like this:
parser.getEvent(); // StreamStartEvent
parser.getEvent(); // DocumentStartEvent
parser.getEvent(); // SequenceStartEvent
while (!parser.checkEvent(SequenceEndEvent)) {
    // proposed behaviour: construct only the next element of the sequence
    Object obj = constructor.getData();
    // do something with obj
}
parser.getEvent(); // SequenceEndEvent
parser.getEvent(); // DocumentEndEvent
parser.getEvent(); // StreamEndEvent
Here constructor.getData() would call the composer, which in turn would read
all the events from the parser that belong to the next node.
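To make the idea concrete, here is a self-contained sketch over a toy event stream. It does not use SnakeYAML's parser, composer, or constructor: the `SequenceElementIterator` class and the "[" / "]" string events are assumptions made up for illustration, and every other string is treated as a scalar. Each call to `next()` consumes only the events that belong to one element of the root sequence.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical sketch: a toy event stream, not SnakeYAML's Parser/Composer/Constructor.
final class SequenceElementIterator implements Iterator<Object> {
    static final String SEQ_START = "[";
    static final String SEQ_END = "]";

    private final Iterator<String> events;
    private String lookahead; // first event of the next element, or SEQ_END

    SequenceElementIterator(Iterator<String> events) {
        this.events = events;
        // Consume the start of the root sequence, then peek at what follows.
        if (!events.hasNext() || !SEQ_START.equals(events.next())) {
            throw new IllegalArgumentException("root node must be a sequence");
        }
        lookahead = events.next();
    }

    @Override
    public boolean hasNext() {
        return !SEQ_END.equals(lookahead);
    }

    @Override
    public Object next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        Object node = build(lookahead); // reads only this element's events
        lookahead = events.next();
        return node;
    }

    // Builds one node: a scalar, or a nested list read up to its matching end.
    private Object build(String first) {
        if (!SEQ_START.equals(first)) {
            return first;
        }
        List<Object> children = new ArrayList<>();
        for (String e = events.next(); !SEQ_END.equals(e); e = events.next()) {
            children.add(build(e));
        }
        return children;
    }
}
```

With the events "[", "a", "[", "b", "c", "]", "d", "]" the iterator yields "a", then the list of "b" and "c", then "d". Only one element is built at a time, so memory use is bounded by the largest element rather than by the whole sequence.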
This would work nicely for me, but it's quite specific to my problem. I was
hoping for a solution that would be of use to a bigger audience.
Original comment by smurn....@gmail.com
on 13 Nov 2009 at 8:05
> The implementation would need to keep track of the anchors it has seen so far
> and keep a reference to the corresponding object.
If you create and keep the objects, then you consume the same resources as with
the complete construction.
> What do you mean by split?
I expected that you wish to cut the YAML document into pieces to create them
one by one.
> In my specific situation the root node is a huge sequence (~10^7 elements).
> The elements of the sequence are small.
Then simply create nodes. It is much simpler than working with events.
Original comment by py4fun@gmail.com
on 16 Nov 2009 at 9:55
> If you create and keep the objects then you consume the same resources
> as with the complete construction.
No, most files have very few anchors defined.
> Then simply create nodes.
I'm not sure I understand you correctly. Do you propose to use multiple
documents in the same file? That's what I'm doing currently. The trouble is
that I have to reference objects across documents, and anchors only work
within a document.
There's another problem I hadn't seen so far. If there's an option to read
documents in a stream-like fashion, there should also be a way to write them as
a stream. But then every node needs an anchor, because we can never know which
node we will see again. If we give every node an anchor, stream-like reading no
longer makes sense, because we need to keep a reference in memory for
everything.
This makes processing of huge documents practically impossible anyway.
Also, I had a look at the code. I don't think that what I need could be
implemented in SnakeYAML without a major refactoring. IMHO it's not worth the
trouble.
I suggest closing the issue.
Original comment by smurn....@gmail.com
on 22 Nov 2009 at 4:13
Original comment by py4fun@gmail.com
on 23 Nov 2009 at 8:47
Original issue reported on code.google.com by
smurn....@gmail.com
on 13 Nov 2009 at 5:05