willemdj / erlsom

XML parser for Erlang
GNU Lesser General Public License v3.0

simple_form does not handle multiple root tags correctly #24

Closed: sedrik closed this issue 10 years ago

sedrik commented 10 years ago

There seems to be a bug in erlsom when the input is erroneous XML with multiple root tags.

erlsom:simple_form("<donald>duck</donald><uncle>scrooge</uncle>").
{ok,{"donald",[],[{"uncle",[],["scrooge"]},"duck"]},[]}

I would have expected

{ok,{"donald",[],["duck"]},"<uncle>scrooge</uncle>"}

sedrik commented 10 years ago

I did some more digging and this seems to be a bug in the SAX parser.

erlsom:simple_form("<donald>duck</donald><uncle>scrooge</uncle>").
Event: startDocument
State: {sState,[],#Fun<erlsom_simple_form.0.72756581>,undefined}
Event: {startElement,[],"donald",[],[]}
State: {sState,[],#Fun<erlsom_simple_form.0.72756581>,undefined}
Event: {characters,"duck"}
State: {sState,[{"donald",[],[]}],
               #Fun<erlsom_simple_form.0.72756581>,undefined}
Event: {endElement,[],"donald",[]}
State: {sState,[{"donald",[],["duck"]}],
               #Fun<erlsom_simple_form.0.72756581>,undefined}

%% I would expect endDocument here

Event: {startElement,[],"uncle",[],[]}
State: {sState,[{"donald",[],["duck"]}],
               #Fun<erlsom_simple_form.0.72756581>,undefined}
Event: {characters,"scrooge"}
State: {sState,[{"uncle",[],[]},{"donald",[],["duck"]}],
               #Fun<erlsom_simple_form.0.72756581>,undefined}
Event: {endElement,[],"uncle",[]}
State: {sState,[{"uncle",[],["scrooge"]},{"donald",[],["duck"]}],
               #Fun<erlsom_simple_form.0.72756581>,undefined}
Event: endDocument
State: {sState,[{"donald",[],[{"uncle",[],["scrooge"]},"duck"]}],
               #Fun<erlsom_simple_form.0.72756581>,undefined}
{ok,{"donald",[],[{"uncle",[],["scrooge"]},"duck"]},[]}

I would expect the endDocument event to be emitted as soon as the end tag of the root element donald is reached, with everything after it left unparsed.
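To make the expectation concrete, here is a minimal sketch that tracks element depth over a list of SAX-style events and stops at the root element's end tag. It is not erlsom's implementation or its fix; the module name root_check and the function first_root_end/1 are hypothetical, and the event shapes are assumed to match the ones printed in the trace above.

```erlang
-module(root_check).
-export([first_root_end/1]).

%% Hypothetical helper, not part of erlsom.
%% Walks a list of SAX-style events and splits it at the point where the
%% first root element is closed.
%% Returns {root_closed, Consumed, Remaining}, or incomplete if the list
%% ends before the root element's end tag is seen.
first_root_end(Events) ->
    walk(Events, 0, []).

walk([], _Depth, _Seen) ->
    incomplete;
walk([{startElement, _, _, _, _} = E | Rest], Depth, Seen) ->
    %% Entering an element: one level deeper.
    walk(Rest, Depth + 1, [E | Seen]);
walk([{endElement, _, _, _} = E | Rest], 1, Seen) ->
    %% Leaving depth 1 means the root element is complete.
    %% This is where endDocument would be expected.
    {root_closed, lists:reverse([E | Seen]), Rest};
walk([{endElement, _, _, _} = E | Rest], Depth, Seen) when Depth > 1 ->
    %% Leaving a nested element: one level up.
    walk(Rest, Depth - 1, [E | Seen]);
walk([E | Rest], Depth, Seen) ->
    %% startDocument, characters and any other event leave the depth as is.
    walk(Rest, Depth, [E | Seen]).
```

Fed the eight events from the trace above (startDocument through endDocument), this returns root_closed with the first four events as consumed, and the remaining list starts with {startElement,[],"uncle",[],[]}, which is the content I would expect to be handed back unparsed.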

willemdj commented 10 years ago

Solved, see issue 25