keepcosmos / readability

Readability is an Elixir library for extracting and curating articles.
Apache License 2.0

XML version tag seems to break summarize #10

Closed: carbureted closed this issue 8 years ago

carbureted commented 8 years ago
iex(4)> require Readability
Readability

iex(5)> url = "http://www.ncbi.nlm.nih.gov/pubmed/22188593"
"http://www.ncbi.nlm.nih.gov/pubmed/22188593"

iex(6)> Readability.summarize(url)
** (FunctionClauseError) no function clause matching in Readability.Helper.remove_tag/2
    (readability) lib/readability/helper.ex:52: Readability.Helper.remove_tag({"version", "1.0"}, #Function<0.72746186/1 in Readability.ArticleBuilder.build/2>)
    (readability) lib/readability/helper.ex:55: Readability.Helper.remove_tag/2
    (readability) lib/readability/helper.ex:66: Readability.Helper.remove_tag/2
    (readability) lib/readability/helper.ex:55: Readability.Helper.remove_tag/2
    (readability) lib/readability/article_builder.ex:24: Readability.ArticleBuilder.build/2
    (readability) lib/readability.ex:77: Readability.summarize/2

iex(6)> url = "https://medium.com/@kenmazaika/why-im-betting-on-elixir-7c8f847b58"
"https://medium.com/@kenmazaika/why-im-betting-on-elixir-7c8f847b58"

iex(7)> Readability.summarize(url)
%Readability.Summary{article_html: "<div><div><p id=\"3476\"><strong><em>Background: </em></strong><em>I’ve spent the past 6 years building web applications in Ruby and the Rails framework. I’ve flirted with new programming languages as they came out, but Elixir is the first language that has been able to captivate me.</em> etc etc etc}

The main difference I can see is the <?xml version="1.0" encoding="utf-8"?> declaration at the top of the PubMed page, which the Medium page does not have.

I'm pretty new to Elixir, but if you have any pointers for fixing this, I'm happy to help!

Thanks.

keepcosmos commented 8 years ago

@carbureted It may be a Floki issue. I will check!
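
For illustration only, here is a minimal sketch of how such a failure could arise and be avoided. It assumes the HTML parser emits the XML declaration as a `{:pi, "xml", attrs}` tuple, so a traversal that treats every 3-tuple as `{tag, attrs, children}` ends up recursing into `{"version", "1.0"}`, which matches the trace above. The module name `TreeSketch` and its clauses are hypothetical and are not the library's actual code.

```elixir
defmodule TreeSketch do
  # Hypothetical stand-in for a tag-removal helper (not Readability's real code).

  # A list of sibling nodes: process each one and drop those that were removed.
  def remove_tag(nodes, fun) when is_list(nodes) do
    nodes
    |> Enum.map(&remove_tag(&1, fun))
    |> Enum.reject(&is_nil/1)
  end

  # Text nodes pass through untouched.
  def remove_tag(text, _fun) when is_binary(text), do: text

  # Assumed shape of a processing instruction such as <?xml ...?>.
  # Its third element is an attribute list, not child nodes, so drop the
  # node instead of recursing into it. This clause must precede the generic one.
  def remove_tag({:pi, _name, _attrs}, _fun), do: nil

  # Comments (assumed shape) are kept as-is.
  def remove_tag({:comment, _text} = node, _fun), do: node

  # Regular element: remove it if fun returns true, otherwise walk its children.
  def remove_tag({tag, attrs, children} = node, fun) do
    if fun.(node), do: nil, else: {tag, attrs, remove_tag(children, fun)}
  end
end

tree = [
  {:pi, "xml", [{"version", "1.0"}, {"encoding", "utf-8"}]},
  {"html", [], [{"script", [], ["x()"]}, {"p", [], ["Hello"]}]}
]

result = TreeSketch.remove_tag(tree, fn {tag, _, _} -> tag == "script" end)
IO.inspect(result)
# prints [{"html", [], [{"p", [], ["Hello"]}]}]
```

Without the `{:pi, _, _}` clause, the generic element clause would match the declaration and the walk would fail on its attribute pairs with a `FunctionClauseError`.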

carbureted commented 8 years ago

Any chance you could publish the latest version to Hex?

Thanks!

keepcosmos commented 8 years ago

@carbureted Updated! The version is now 0.5.2 👍