Open tomekowal opened 5 years ago
@tomekowal, Thanks for the issue and interest. I've been a bit busy, but I hope to get to this next week.
I'll make the project "rebar-able" and add some documentation/examples for handling individual nodes and transforming document streams.
@tomekowal, before I write a bunch of docs that will then need to be changed, I'd like to see if this makes sense to you:
It's a deliberately "procedural" example so the flow is easy to follow. Mind you, the API is nowhere near final, but the parser will always behave like an iterator, and there will be a writer as well as a reader. So that won't change. Maybe just the names. :-)
run() ->
    Input = <<"<tag>\n <subtag>asdf</subtag>\n <subtag>qwer</subtag>\n "
              "<subtag>asdf</subtag>\n</tag>">>,
    State = stax:stream(Input, [{whitespace, false}]),
    % fake it for now until there is a serialization API
    OutState = {<<>>, #{}},
    % read and assert the startDocument event, write it out
    {#{type := startDocument} = E1, State1} = stax:next_event(State),
    OutState1 = stax:write_event(E1, OutState),
    % read and assert the startElement event for the "tag" tag, write it out
    {#{type := startElement,
       qname := {<<>>, <<>>, <<"tag">>}} = E2, State2} = stax:next_event(State1),
    OutState2 = stax:write_event(E2, OutState1),
    {State3, OutState3} = reverse_subtag(State2, OutState2),
    {State4, OutState4} = reverse_subtag(State3, OutState3),
    {State5, OutState5} = reverse_subtag(State4, OutState4),
    % read and assert the endElement event for the "tag" tag, write it out
    {#{type := endElement,
       qname := {<<>>, <<>>, <<"tag">>}} = E3, State6} = stax:next_event(State5),
    OutState6 = stax:write_event(E3, OutState5),
    % read and assert the endDocument event, write it out
    {#{type := endDocument} = E4, _State7} = stax:next_event(State6),
    {Output, _} = stax:write_event(E4, OutState6),
    Output.

reverse_subtag(State, OutState) ->
    case stax:next_event(State) of
        % the 'subtag' opening tag
        {#{type := startElement} = E1, State1} ->
            OutState1 = stax:write_event(E1, OutState),
            reverse_subtag(State1, OutState1);
        % the text to change
        {#{type := characters,
           data := Sub} = E1, State1} ->
            OutState1 = stax:write_event(E1#{data := do_flip(Sub)}, OutState),
            reverse_subtag(State1, OutState1);
        % the 'subtag' closing tag, so return
        {#{type := endElement} = E1, State1} ->
            OutState1 = stax:write_event(E1, OutState),
            {State1, OutState1}
    end.

do_flip(Text) ->
    Chs = [T || <<T/utf8>> <= Text],
    Rev = lists:reverse(Chs),
    << <<C/utf8>> || C <- Rev >>.
Seems clear. I just realised that there is no Enum.reduce in Erlang, only foldl and foldr on lists, so the recursive bits need to be written by hand.
Also, I think you can use string:reverse because it correctly groups things into grapheme clusters, but still returns io data (but that is outside of the discussion :))
Yeah... string:reverse, doh! :-)
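For reference, a minimal sketch of what do_flip could look like with that suggestion, assuming OTP 20 or later (where string:reverse/1 is available). The module name flip_sketch is made up for illustration. string:reverse/1 returns a list of grapheme clusters rather than a binary, so the sketch converts the result back to a binary to keep the same return type as the original do_flip.

```erlang
-module(flip_sketch).
-export([do_flip/1]).

%% Reverse UTF-8 text by grapheme cluster rather than by code point,
%% so that a base character and its combining marks stay together.
%% string:reverse/1 returns a list of grapheme clusters, which is
%% valid chardata, so it can be turned back into a UTF-8 binary.
do_flip(Text) when is_binary(Text) ->
    unicode:characters_to_binary(string:reverse(Text)).
```

For plain ASCII input this behaves like the code-point version above; the difference only shows up when the text contains multi-code-point clusters.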
Since I have no experience with Elixir, it would be interesting to see what the same example would look like in it. Also, is the return value of the stax:next_event call, {Event, State}, easy enough to work with, or should it be changed to something else?
Hey, I made an example Elixir application that uses yaccety_sax:
https://github.com/tomekowal/yaccety_sax_test/blob/master/test/yaccety_sax_test_test.exs
All the exciting stuff is in the test file.
The first test is what you pasted above rewritten in Elixir.
The second one is an example of using Elixir streams and Enum.reduce to work with it.
The third one is the reversing example again, but using streams.
As you can see, the {Event, State} format is perfect because stream generators expect exactly that shape: {CurrentElement, StateToBuildNextElement}.
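On the Erlang side, the same {Event, State} shape also makes the hand-written fold mentioned earlier fairly small. The following is only a sketch: fold_events/4 and the module name are hypothetical helpers, not part of the library, and a list of event maps stands in for a real parser state so the example is self-contained. With the provisional API from this thread, one would presumably pass fun stax:next_event/1 as the Next argument instead.

```erlang
-module(event_fold_sketch).
-export([fold_events/4, demo/0]).

%% Fold Fun over every event produced by Next, starting from State.
%% Next must return {Event, NewState}. Iteration stops after the
%% endDocument event has been folded into the accumulator.
%% (Hypothetical helper, not part of the library.)
fold_events(Fun, Acc, Next, State) ->
    case Next(State) of
        {#{type := endDocument} = Event, _Final} ->
            Fun(Event, Acc);
        {Event, State1} ->
            fold_events(Fun, Fun(Event, Acc), Next, State1)
    end.

%% Stand-in for a real parser: the "state" is just a list of events.
fake_next([Event | Rest]) ->
    {Event, Rest}.

demo() ->
    Events = [#{type => startDocument},
              #{type => startElement},
              #{type => characters, data => <<"asdf">>},
              #{type => endElement},
              #{type => endDocument}],
    %% Collect the event types, then restore document order.
    Types = fold_events(fun(#{type := T}, Acc) -> [T | Acc] end,
                        [], fun fake_next/1, Events),
    lists:reverse(Types).
```

Because the iterator function is passed in as an argument, the same fold works with any source that returns {Event, State} pairs, which is the property the stream generators rely on as well.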
Cool! And great that the output format fits so well!
Time permitting, I'll try to finish the rest of the implementation (DTD, default attributes stuff, external references and entities, etc.).
Also documentation. :-)
How was the performance? Low memory footprint? Fast enough?
Unfortunately, I didn't test it on anything more significant than that toy example. We don't have that many big XML files, anyway.
For now, we settled on using :xmerl in our project.
We will watch closely how this repo evolves :)
Hey! There is no documentation and we would like to try it. Our use case is that we want to modify elements based on their contents. For example, reverse the contents of tag/subtag