Closed by dLuna 12 years ago
Hi,
I don't have much time right now, but I'll look into it.
Can you explain what you mean by "CDATA which was not flush with its surrounding tags"? Or provide an example?
Regards, Willem
The following xsd and xml files will work if you put the <![CDATA flush with <whatnot>, but not the way it is in the example below.
Save these examples as example2.xsd and example2.xml and run
erlsom:scan(element(2, file:read_file("example2.xml")), element(2, erlsom:compile_xsd_file("example2.xsd")), [{output_encoding, utf8}]).
and you will get the following crash. Remove [{output_encoding, utf8}] and it works.
It is entirely possible that the bug is in erlsom_sax_utf8.erl instead. There is a comment on line 862 that makes me suspect that is the case, but I don't understand the code base well enough to solve it that way.
** exception throw: {'EXIT',
    {error,
     [{exception,
       {badarg,
        [{erlang,'++',[<<"\n">>,<<"Testing">>],[]},
         {lists,append,2,[{file,"lists.erl"},{line,63}]},
         {erlsom_parse,stateMachine,2,
          [{file,"src/erlsom_parse.erl"},{line,652}]},
         {erlsom_parse,xml2StructCallback,2,
          [{file,"src/erlsom_parse.erl"},{line,299}]},
         {erlsom_sax_utf8,wrapCallback,2,
          [{file,"src/erlsom_sax_utf8.erl"},{line,1364}]},
         {erlsom_sax_utf8,parseContentLT,2,
          [{file,"src/erlsom_sax_utf8.erl"},{line,864}]},
         {erlsom_sax_utf8,parse,2,
          [{file,"src/erlsom_sax_utf8.erl"},{line,196}]},
         {erlsom,scan2,3,
          [{file,"src/erlsom.erl"},{line,211}]}]}},
      {stack,[{'#PCDATA',char,<<"\n">>},'top-type']},
      {received,{characters,<<"Testing">>}}]}}
     in function erlsom:scan2/3 (src/erlsom.erl, line 215)
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="whatnot-type">
    <xs:restriction base="xs:string" />
  </xs:simpleType>
  <xs:complexType name="top-type">
    <xs:all>
      <xs:element name="whatnot" type="whatnot-type"></xs:element>
    </xs:all>
  </xs:complexType>
  <xs:element name="top" type="top-type" />
</xs:schema>
<top><whatnot>
<![CDATA[Testing]]></whatnot></top>
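The top frame of the stack trace above is {erlang,'++',[<<"\n">>,<<"Testing">>],[]}, which suggests that the newline before the CDATA section and the CDATA content both arrive as binaries when output_encoding is utf8, and that they are then joined with the list-only ++ operator. The following is a minimal standalone sketch of that failure, independent of erlsom. The module name concat_demo and its function names are made up for illustration.

```erlang
-module(concat_demo).
-export([run/0, try_append/2]).

%% Attempt A ++ B and report badarg instead of crashing.
%% '++' is defined for lists; a binary as the left operand raises badarg.
try_append(A, B) ->
    try A ++ B
    catch error:badarg -> badarg
    end.

run() ->
    %% Same values as in the trace: whitespace before the CDATA section,
    %% followed by the CDATA content.
    Pending = <<"\n">>,
    New = <<"Testing">>,
    %% With strings (lists), '++' works as expected.
    ListResult = try_append("\n", "Testing"),
    %% With binaries, '++' fails; binary syntax is needed instead.
    BinCrash = try_append(Pending, New),
    BinResult = <<Pending/binary, New/binary>>,
    {ListResult, BinCrash, BinResult}.
```

Calling concat_demo:run() returns {"\nTesting", badarg, <<"\nTesting">>}. This would also be consistent with the report that the crash goes away once [{output_encoding, utf8}] is removed, assuming the default output is lists.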
Thanks, I merged it to master.
When using utf8 output_encoding, the old code would crash for a CDATA which was not flush with its surrounding tags.
There are many more uses of ++ in this module, and I don't know the code well enough to tell reliably whether some of them should also be replaced with a version that works on both binaries and lists.
Feedback and comments very welcome.
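For readers wondering what "a version that works on both binaries and lists" could look like, here is a minimal sketch. It is not the code that was merged; the module name chars_append and the function append/2 are hypothetical, and it assumes both arguments always have the same representation.

```erlang
-module(chars_append).
-export([append/2]).

%% Hypothetical helper: concatenate two pieces of character data that are
%% either both binaries or both lists (strings).
append(A, B) when is_binary(A), is_binary(B) ->
    %% Binary concatenation via bit syntax.
    <<A/binary, B/binary>>;
append(A, B) when is_list(A), is_list(B) ->
    %% Plain list concatenation.
    A ++ B.
```

Mixed arguments (one list, one binary) deliberately fail with function_clause here rather than guessing an encoding. Whether the remaining ++ calls in the module can ever see binaries would still need to be checked case by case.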