luin / readability

📚 Turn any web page into a clean view
2.49k stars 312 forks

article.content only contains a part of the content #100

Open westlinkin opened 7 years ago

westlinkin commented 7 years ago

First of all, great great library! You've done a wonderful job here.

When I use this url, the result is wrong: article.content contains only part of the content. Here is the value:

<div class="field field-paragraph field-paragraph--full field-type-text-long field-type-text-long--full"><p>“It was the saddest movie I've ever filmed, to be honest with you. I've never had a more difficult film to film,” Olmos lamented. “It was too close to the time when she was actually killed, it was only 13 months after when we were filming. Nobody wanted to film it, the parents didn't, we didn't, nobody wanted to. We'd rather she be alive. But we had to."</p></div>

If you click on the link, you'll see article.content only contains the first paragraph.

wong2 commented 7 years ago

Yes, this is one of the biggest limitations of this lib: it doesn't work well on deeply nested HTML structures.
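To illustrate why deep nesting can produce a result like the single div above, here is a toy sketch of Arc90-style candidate scoring. It is an assumption about the general technique, not this library's actual code: each paragraph gives its score to its parent and half of it to its grandparent, and the highest-scoring node wins. The `node`, `walk`, and `pickTopCandidate` helpers and the scoring formula are hypothetical and exist only for this example.

```javascript
// Toy model of Arc90-style scoring. NOT the library's real implementation.
// A node is a plain object: { name, children, text, parent, score }.
function node(name, children = [], text = '') {
  const n = { name, children, text, parent: null, score: 0 };
  for (const c of children) c.parent = n;
  return n;
}

// Depth-first traversal that calls visit() on every node.
function walk(n, visit) {
  visit(n);
  n.children.forEach((c) => walk(c, visit));
}

function pickTopCandidate(root) {
  const candidates = new Set();
  walk(root, (n) => {
    if (n.name !== 'p') return;
    // Hypothetical scoring: 1 base point plus 1 per 100 characters of text.
    const score = 1 + Math.floor(n.text.length / 100);
    const parent = n.parent;
    const grandparent = parent && parent.parent;
    // The score reaches only two levels up; higher ancestors get nothing.
    if (parent) { parent.score += score; candidates.add(parent); }
    if (grandparent) { grandparent.score += score / 2; candidates.add(grandparent); }
  });
  let best = null;
  for (const c of candidates) if (!best || c.score > best.score) best = c;
  return best;
}

const longText = 'x'.repeat(300);
const shortText = 'x'.repeat(50);

// Flat layout: every paragraph is a direct child of the article container.
const flat = node('div#article', [
  node('p', [], longText), node('p', [], longText), node('p', [], longText),
]);

// Nested layout: each paragraph sits inside two wrapper divs of its own.
const wrap = (id, text) =>
  node('div.outer' + id, [node('div.field' + id, [node('p', [], text)])]);
const nested = node('div#article', [
  wrap(1, longText), wrap(2, shortText), wrap(3, shortText),
]);

const flatWinner = pickTopCandidate(flat).name;
const nestedWinner = pickTopCandidate(nested).name;
console.log(flatWinner, nestedWinner);
```

In the flat layout the shared container collects every paragraph's score and wins. In the nested layout the shared container is three levels above each paragraph, so it never receives a score, and the wrapper around the single longest paragraph wins instead. That matches the symptom reported here, where only one paragraph's wrapper div is returned.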

raju1988 commented 5 years ago

Yes, we are facing the same kind of issue. It doesn't return the full content of the HTML; it returns a seemingly random div from the page. I used this link here