Closed dsuket closed 8 years ago
If the main content is wrapped in unlikelyCandidates class or id, that was deleted. For example in http://uxmilk.jp/43386, the main content is wrapped in #menu-wrap element. So, check the unlikelyCandidates with p elements. please tell me a better way how get the contents above site.
@dsuket
This has also been a problem for me. But I still think the general idea behind removing unlikely candidates is good (if something just has a class of ad
and nothing else, it's unlikely to have content).
My approach for fixing these problems has been to update the okMaybeItsACandidate
regex. When the node matches this, it will not be removed.
I've done this a few times and those changes have been since pulled in. I've added an integration test to verify this fixes UX-Milk, and that seems to be passing.
So that page should be fixed and this change shouldn't be needed :).
Does that fix your issue @dsuket ? You can test with this commit to verify - https://github.com/luin/readability/pull/77
awesome! thank you @haroldtreen !
Thank you for the pull request! Could you provide more details about this change?