Closed: aeb-dev closed this issue 2 years ago.
Thanks a lot for the example file; it makes the questions really easy to reproduce!
toXmlNodes() converts the events to DOM nodes. When the converter encounters a start event, it has to read until the corresponding end event so that it can create a complete DOM node containing all of its children.
The text nodes are not empty; each of them contains one or more (possibly significant?) whitespace characters. You can filter them out if you are only interested in the nodes that contain actual text:
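```dart
import 'dart:convert';
import 'dart:io';
import 'package:xml/xml_events.dart';

Future<void> main() async {
  final file = File('map.xml'); // assumed name of the attached example file
  final stream = file
      .openRead()
      .transform(Utf8Decoder())
      .toXmlEvents()
      .normalizeEvents();
  await for (final events in stream) {
    // Print only the text events that contain more than whitespace.
    events
        .whereType<XmlTextEvent>()
        .where((event) => event.text.trim().isNotEmpty)
        .forEach(print);
  }
}
```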
Sorry if the documentation is unclear (I am always happy to get pull requests that improve it). The idea with selectSubtreeEvents is that you get all the events below a node you are interested in, so that you can then build DOM nodes with toXmlNodes() if you want. If you are only interested in start and end events, you can easily filter everything else away.
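A minimal sketch of that filtering, assuming the package:xml event API used elsewhere in this thread; the helper startAndEndNames and the sample document are hypothetical. It selects a <layer> subtree and keeps only its start and end events, so the whitespace and content text inside the subtree are dropped:

```dart
import 'package:xml/xml_events.dart';

/// Hypothetical helper: lists the start and end tags inside every subtree
/// whose root element is called [name], dropping text and other events.
Future<List<String>> startAndEndNames(String xml, String name) {
  return Stream.value(xml)
      .toXmlEvents()
      .selectSubtreeEvents((event) => event.name == name)
      .expand((events) => events) // flatten the chunked List<XmlEvent> stream
      .where((event) =>
          event is XmlStartElementEvent || event is XmlEndElementEvent)
      .map((event) => event is XmlStartElementEvent
          ? 'start:${event.name}'
          : 'end:${(event as XmlEndElementEvent).name}')
      .toList();
}

Future<void> main() async {
  // Hypothetical sample: the <layer> subtree contains text and a nested tag.
  const xml = '<map>\n  <layer>\n    <data>1,2</data>\n  </layer>\n</map>';
  print(await startAndEndNames(xml, 'layer'));
}
```

Note that the selected subtree still includes its own opening and closing <layer> events; the predicate only decides where a subtree starts.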
It does, but since your predicate matches the root element <map ... of your file, no other subtree is selected and you get the DOM of your whole root node. If you select a repeated element deeper within your document, you get multiple subtrees, e.g.:
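```dart
import 'dart:convert';
import 'dart:io';
import 'package:xml/xml_events.dart';

Future<void> main() async {
  final file = File('map.xml'); // assumed name of the attached example file
  final stream = file
      .openRead()
      .transform(Utf8Decoder())
      .toXmlEvents()
      .selectSubtreeEvents((event) => event.name == 'tile')
      .toXmlNodes();
  await for (final nodes in stream) {
    nodes.forEach(print); // one DOM node per selected <tile> subtree
  }
}
```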
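The same idea can be checked without the attached file. This sketch assumes the package:xml stream extensions shown above; selectSubtrees and the sample document are hypothetical. Because <tile> is repeated, each match arrives as its own element node instead of one DOM for the whole root:

```dart
import 'package:xml/xml.dart';
import 'package:xml/xml_events.dart';

/// Hypothetical helper: collects every subtree rooted at an element called
/// [name] as a separate DOM node.
Future<List<XmlNode>> selectSubtrees(String xml, String name) {
  return Stream.value(xml)
      .toXmlEvents()
      .selectSubtreeEvents((event) => event.name == name)
      .toXmlNodes()
      .expand((nodes) => nodes) // flatten the chunked List<XmlNode> stream
      .toList();
}

Future<void> main() async {
  // Hypothetical sample; the real map file has many more attributes.
  const xml = '<map><tileset><tile id="0"/><tile id="1"/></tileset></map>';
  for (final node in await selectSubtrees(xml, 'tile')) {
    print(node);
  }
}
```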
Thanks a lot for the example file; it makes the questions really easy to reproduce!
Happy to hear that. I was afraid of being misunderstood :) Also, thank you very much for the fast and detailed answers.
A1 toXmlNodes() converts the events to DOM nodes. If the converter encounters a start-event, it has to read until the corresponding end-event to be able to create a complete DOM node that contains all its children.
I believe a streaming API should let me consume the XML event by event, from start to end, or vice versa. For example, with the following code:
```dart
final stream = file.openRead().transform(Utf8Decoder()).toXmlEvents().toXmlNodes();
```
Wouldn't this always produce just the root element? The name toXmlNodes makes me feel like I will receive every node. Now I know that if I want to receive everything in a streaming way, I should get them from leaf to root. But note that this approach lets you scan the whole document element by element in a single traversal. Would that make sense to you?
Also, the name could be toXmlElements, since XmlNodeType has a lot of types.
A2 The text nodes are not empty; each of them contains one or more (possibly significant?) whitespace characters. You can filter them out if you are only interested in the nodes that contain actual text:
What causes them? When I look at the file, I do not see anything that should produce them.
A3 Sorry if the documentation is unclear (always happy to get pull requests that improve it). The idea with selectSubtreeEvents is that you get all the events below a node you are interested in to possibly build DOM nodes with toXmlNodes(). If you are only interested in start and end events you can easily filter everything else away.
After playing with it, it clicked. The way I expected it to work could be flawed as well; that is just my opinion.
A4 It does, but since your predicate matches on the root element <map ... of your file, no other sub-tree is selected and you get the DOM of your whole root node. If you would select a repeated element deep within your document, you would get multiple subtrees, i.e.
After your answers, this makes sense within the implemented context, but I think it relates closely to Q1 and to the way I expect streaming to work.
I believe a streaming API should let me consume the XML event by event, from start to end, or vice versa.
Events are a flat sequence of items. In your file this is an XML declaration event, a text event, a start element event, a text event, another start element event, etc.
Nodes are a forest of trees. In your file this is the XML declaration, a text node, an element node (with many other nodes as children), and another text node.
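To make the difference concrete, here is a small sketch that counts the flat events and the top-level nodes of the same input. It assumes parseEvents and the stream extensions from package:xml/xml_events.dart; the sample string is a hypothetical miniature of the example file, and flatEvents/topLevelNodes are made-up helper names:

```dart
import 'package:xml/xml.dart';
import 'package:xml/xml_events.dart';

// Hypothetical miniature of the example file.
const sample = '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<map>\n'
    ' <tile id="0"/>\n'
    '</map>\n';

/// Events: a flat sequence with one entry per declaration, tag, or text run.
List<XmlEvent> flatEvents(String xml) => parseEvents(xml).toList();

/// Nodes: a forest whose top-level entries own their children.
Future<List<XmlNode>> topLevelNodes(String xml) => Stream.value(xml)
    .toXmlEvents()
    .toXmlNodes()
    .expand((nodes) => nodes)
    .toList();

Future<void> main() async {
  print('${flatEvents(sample).length} events');
  for (final node in await topLevelNodes(sample)) {
    print(node.nodeType); // type of each top-level node
  }
}
```

Here the event list has eight entries (declaration, text, start map, text, start tile, text, end map, text), while the node list has a single element whose children hold the nested tile and the whitespace around it.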
Wouldn't this always produce just the root element? The name toXmlNodes makes me feel like I will receive every node. Now I know that if I want to receive everything in a streaming way, I should get them from leaf to root. But note that this approach lets you scan the whole document element by element in a single traversal. Would that make sense to you?
It does produce the 4 root nodes of your XML file: an XML declaration, a text node, an element node, and another text node. If you want to traverse into the descendants of these nodes, you can always do so:
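```dart
import 'dart:convert';
import 'dart:io';
import 'package:xml/xml.dart';
import 'package:xml/xml_events.dart';

Future<void> main() async {
  await File('map.xml').openRead() // assumed name of the example file
      .transform(Utf8Decoder())
      .toXmlEvents()
      .toXmlNodes()
      .expand((nodes) => nodes) // flatten chunks into single root nodes
      .expand((node) => node.descendants) // walk into each root's subtree
      .forEach(print);
}
```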
Also, the name could be toXmlElements, since XmlNodeType has a lot of types.
That wouldn't make sense, because these are not necessarily just elements (see above).
What causes them? When I look at the file, I do not see anything that should produce them.
Newlines and indentation spaces between the tags.
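A quick way to see this, sketched with parseEvents from package:xml/xml_events.dart (textEventCount and both sample strings are hypothetical): the same elements produce no text events when written on one line, and one text event per newline-plus-indentation run when pretty-printed.

```dart
import 'package:xml/xml_events.dart';

/// Hypothetical helper: counts the text events the parser reports for [xml].
int textEventCount(String xml) =>
    parseEvents(xml).whereType<XmlTextEvent>().length;

void main() {
  // Same elements, with and without pretty-printing.
  const compact = '<map><tile id="0"/><tile id="1"/></map>';
  const indented = '<map>\n  <tile id="0"/>\n  <tile id="1"/>\n</map>';
  print(textEventCount(compact)); // nothing between the tags
  print(textEventCount(indented)); // whitespace runs between the tags
}
```

If that whitespace is not significant for your format, the trim-based filter from the first answer drops these events.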
I understand what you mean, but I could not figure out how to do the following in a single traversal:
Using the same example file, imagine that map is a class TmxMap and it has tiles, so TmxMap has a field that represents tiles, and so on. This map file could be very big, so it should not be loaded into memory as a whole.
You can check the models here if you like: https://github.com/aeb-dev/tmx_parser/blob/main/lib/src/tmx_map.dart. The current version loads everything into memory; I want to change it to be event-driven. I think I can make it work if I traverse the file multiple times, but how would I do it with a single traversal?
I see two ways to go about this:
selectSubtreeEvents that splits the stream into two streams), but then I also don't want it to become a generic stream extension library.

Thanks for the explanations and tips, let's see how it goes.
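For reference, one possible single-pass approach, sketched here as an assumption and not as the approach recommended in this thread: flatten the event stream and update a small model as each start tag arrives, so no DOM of the whole file is built. Tile, TmxMapSummary, readMap, and the sample attributes are hypothetical simplifications, not the tmx_parser models; the event API is the package:xml one used above.

```dart
import 'package:xml/xml_events.dart';

/// Hypothetical, heavily simplified stand-ins for the real TmxMap models.
class Tile {
  Tile(this.id);
  final int id;
}

class TmxMapSummary {
  final Map<String, String> attributes = {};
  final List<Tile> tiles = [];
}

/// Reads [input] once, updating the model as each start tag arrives.
Future<TmxMapSummary> readMap(Stream<String> input) async {
  final result = TmxMapSummary();
  await for (final event in input.toXmlEvents().expand((events) => events)) {
    if (event is! XmlStartElementEvent) continue; // skip text, end tags, ...
    final attributes = {
      for (final attribute in event.attributes) attribute.name: attribute.value,
    };
    if (event.name == 'map') {
      result.attributes.addAll(attributes);
    } else if (event.name == 'tile') {
      final id = attributes['id'];
      if (id != null) result.tiles.add(Tile(int.parse(id)));
    }
  }
  return result;
}

Future<void> main() async {
  // Hypothetical sample; a real file would come from File(...).openRead()
  // decoded with utf8.decoder.
  const xml = '<map width="2" height="1">\n'
      '  <tileset><tile id="0"/><tile id="1"/></tileset>\n'
      '</map>';
  final map = await readMap(Stream.value(xml));
  print('${map.attributes} with ${map.tiles.length} tiles');
}
```

The trade-off of this design is that context such as "which tileset does this tile belong to" has to be tracked by hand, for example with a stack of open element names.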
I have the following XML file (change the file extension from zip to xml; GitHub does not allow uploading XML files 😞):
map.zip
Q1:
I have the following code:
The resulting Stream only yields one item with nodeType XmlNodeType.ELEMENT, which is the element map. Shouldn't it also yield the other elements inside map?
Q2:
With the same file and the following code:
The resulting Stream yields a lot of empty XmlTextEvents. Is this normal?
Q3:
This question is more of an opinion. With the same file and the following code:
The resulting Stream contains XmlTextEvent, XmlStartElementEvent, and XmlEndElementEvent. The signature and the name of the function led me to expect nothing other than XmlStartElementEvent, since the predicate receives only XmlStartElementEvent and filters based on that. I can understand XmlEndElementEvent, assuming every Start has an End, but XmlTextEvent feels off.
Q4:
With the same file and the following code:
Since selectSubtreeEvents gives XmlStartElementEvent and XmlEndElementEvent, I was expecting the resulting Stream to yield the XmlNodes that are encapsulated by the Start and End events.
With all these questions, I feel like I am missing some part of the puzzle. I have been on this for a while and still could not grasp it. Feel free to ask for more information if some part is not clear. Sorry if it is too long.