willemdj / erlsom

XML parser for Erlang
GNU Lesser General Public License v3.0

XML parsing fails for ihsmarkit blog rss feed #70

Closed: neeraj9 closed this issue 5 years ago

neeraj9 commented 5 years ago

Parsing fails for ihsmarkit blog rss feed.

** exception throw: {error,"Malformed: Illegal character in prolog"}

The steps to reproduce the issue are as follows:

application:ensure_all_started(inets),
application:ensure_all_started(ssl),

Url = "https://ihsmarkit.com/BlogFeed.ashx?i=Technology",
Headers = [], ConnectTimeoutMsec = 2000, TimeoutMsec = 2000,
HttpOptions = [{timeout, TimeoutMsec}, {connect_timeout, ConnectTimeoutMsec}],
{ok, {{_, Code, _}, _Headers, Body}} = httpc:request(get, {Url, Headers}, HttpOptions, [{body_format, binary}]),

{ok, {XmlNode, _XmlAttribute, XmlValue}, _} = erlsom:simple_form(binary_to_list(Body)).

I am using the following revision of erlsom in my project.

commit 1a9ea1a16ed2cf0466c11608367612d8b05ea6ae
Merge: df4526c b4ef336
Author: Willem de Jong <w.a.de.jong@gmail.com>
Date:   Wed Feb 28 13:36:29 2018 +0100

willemdj commented 5 years ago

Your XML has a byte order mark. When you call binary_to_list(Body) you create a list that starts with the three characters of the byte order mark. That is not what erlsom expects, because the byte order mark only makes sense on an encoded binary.
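To make the effect concrete, here is a minimal sketch that looks at the start of a response body. The module name `bom_demo` and function `inspect/1` are hypothetical names chosen for illustration. It assumes the feed is UTF-8 with a BOM, as described above, and uses only `unicode:bom_to_encoding/1` and `binary_to_list/1` from OTP.

```erlang
-module(bom_demo).
-export([inspect/1]).

%% Reports what the unicode module detects at the start of the binary,
%% together with the first three elements that binary_to_list/1 produces.
%% For a UTF-8 BOM those three elements are the raw bytes 239, 187, 191,
%% which end up in front of the "<" of the XML prolog.
inspect(Bin) when is_binary(Bin) ->
    {Encoding, BomLength} = unicode:bom_to_encoding(Bin),
    Prefix = lists:sublist(binary_to_list(Bin), 3),
    {Encoding, BomLength, Prefix}.
```

Calling `bom_demo:inspect(Body)` on the downloaded body should show whether a BOM is present before the list is handed to the parser.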

The easiest solution is to skip the binary_to_list/1 step; it is not necessary.

Good luck, Willem
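A minimal sketch of the suggested change, assuming `Body` is the binary returned by `httpc:request/4` with `{body_format, binary}` as in the report. The module name `feed_parse` and both function names are hypothetical. `parse/1` follows the advice above and passes the binary straight to `erlsom:simple_form/1`. `to_chars/1` is an alternative that is not part of the advice above: it drops the BOM and decodes the rest, for cases where a character list is really needed, and it assumes UTF-8 when no BOM is found.

```erlang
-module(feed_parse).
-export([parse/1, to_chars/1]).

%% Suggested route: give erlsom the encoded binary as-is,
%% without converting it with binary_to_list/1 first.
parse(Body) when is_binary(Body) ->
    erlsom:simple_form(Body).

%% Alternative (an assumption, not from the thread): strip the BOM and
%% decode the remaining bytes into a list of Unicode code points.
to_chars(Body) when is_binary(Body) ->
    case unicode:bom_to_encoding(Body) of
        {_, 0} ->
            %% No BOM found: assume the document is UTF-8.
            unicode:characters_to_list(Body, utf8);
        {Encoding, BomLength} ->
            <<_Bom:BomLength/binary, Rest/binary>> = Body,
            unicode:characters_to_list(Rest, Encoding)
    end.
```

With this in place, the last line of the reproduction would become `feed_parse:parse(Body)` instead of `erlsom:simple_form(binary_to_list(Body))`.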
