Open Feelnoobskill opened 8 years ago
What do you mean by this?
When you use readability, the article returned will be null if nothing was found. Otherwise it will return whatever part of the article it determined to be the main content.
I've been wondering how to flag articles that aren't being extracted properly (e.g. a description has been extracted rather than the article). My current approach is to compare the size of the extracted article HTML with the size of the input HTML. I've found that if the content HTML is less than 3% of the original page, chances are the main article was missed.
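A minimal sketch of that size check, in case it helps. The function name and the `minRatio` parameter are made up for illustration and are not part of node-readability; the 3% default is only the rule of thumb mentioned above and probably needs tuning per site.

```javascript
// Hypothetical helper, not part of node-readability.
// Returns true when the extracted HTML is suspiciously small compared
// with the input HTML, which suggests the main article was missed.
function looksLikeMissedArticle(inputHtml, contentHtml, minRatio = 0.03) {
  // Treat a missing input or missing extraction as a failed extraction.
  if (!inputHtml || !contentHtml) {
    return true;
  }
  // Compare string lengths as a cheap proxy for the amount of content.
  return contentHtml.length / inputHtml.length < minRatio;
}
```

You would call it with the raw page HTML and whatever content HTML the extractor returned, and skip reader mode when it returns true.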
Does that help, @Feelnoobskill? Or maybe you can elaborate on what you picture the solution looking like?
@haroldtreen thanks for the response. Basically, I would like to create a reader mode like the one iOS Safari has.
Some pages are not suitable for opening in reader mode (for example, the Stack Overflow home page). Right now node-readability will extract some random text from such a page, and this is not acceptable in my case. So I was thinking that maybe someone has already faced this problem and can share their experience.
Ah. Interesting. I wasn't aware that iOS did that.
Some ideas:
<meta property="og:type" content="article">
Checking tags like this is probably the closest you can get to a yes/no answer without actually running the extraction algorithm on the page.
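A rough sketch of such a tag check, assuming you already have the page HTML as a string. The function name is hypothetical and not part of node-readability. It scans with regular expressions to stay dependency-free, so it is only a heuristic; a real HTML parser would be more robust.

```javascript
// Hypothetical helper, not part of node-readability.
// Returns true when the page declares og:type "article" in a <meta> tag.
function hasArticleOgType(html) {
  // Collect every <meta ...> tag; match() returns null when there are none.
  const metaTags = html.match(/<meta\b[^>]*>/gi) || [];
  // Test each attribute separately so attribute order does not matter.
  return metaTags.some(
    (tag) =>
      /\bproperty\s*=\s*["']og:type["']/i.test(tag) &&
      /\bcontent\s*=\s*["']article["']/i.test(tag)
  );
}
```

Many pages omit Open Graph tags entirely, so a false result here says "unknown" rather than "not an article".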
This is good info @haroldtreen.
It would be great if the library had an API for this (e.g. isReadable).
I want to check if the page is readable or not. Is that possible?