j0k3r / php-readability

A fork of https://bitbucket.org/fivefilters/php-readability
Apache License 2.0

Fix hasSingleTagInsideElement method #88

Closed: jtojnar closed this 7 months ago

jtojnar commented 7 months ago

The method would fail for input such as `<div> <p>foo</p> </div>`.

mozilla/readability uses `children` for the tag lookup, which returns only elements. PHP does not have a `children` property, so b580cf216d9001f82c866bb9a6c8bcad1cc862d8 mistakenly used `childNodes` instead, but that can return nodes of any type, including whitespace-only text nodes.

Let’s filter the children ourselves.
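For illustration, the filtering could look roughly like the sketch below. It is not the code from this PR: the free function `hasSingleTagInsideElement` and the variables `$div` and `$mixed` are hypothetical stand-ins, and `trim()` only approximates the content check in mozilla/readability.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical standalone helper, written for illustration only.
 * Returns true when $element has exactly one element child, that child
 * has the given tag name, and no text child carries real content.
 */
function hasSingleTagInsideElement(DOMNode $element, string $tag): bool
{
    // Keep only element children, emulating the DOM `children` property.
    $children = [];
    foreach ($element->childNodes as $node) {
        if ($node instanceof DOMElement) {
            $children[] = $node;
        }
    }

    // There should be exactly one element child with the given tag.
    if (count($children) !== 1 || strcasecmp($children[0]->tagName, $tag) !== 0) {
        return false;
    }

    // And there should be no text nodes with real content.
    // Assumption: trim() is a close enough test for "has content" here.
    foreach ($element->childNodes as $node) {
        if ($node->nodeType === XML_TEXT_NODE && trim($node->textContent) !== '') {
            return false;
        }
    }

    return true;
}

// Build <div> <p>foo</p> </div> by hand so the whitespace nodes are explicit.
$doc = new DOMDocument();
$div = $doc->createElement('div');
$div->appendChild($doc->createTextNode(' '));
$div->appendChild($doc->createElement('p', 'foo'));
$div->appendChild($doc->createTextNode(' '));

// Build <div>bar<p>foo</p></div>, where the text node has real content.
$mixed = $doc->createElement('div');
$mixed->appendChild($doc->createTextNode('bar'));
$mixed->appendChild($doc->createElement('p', 'foo'));

var_dump($div->childNodes->length);                // int(3): text, element, text
var_dump(hasSingleTagInsideElement($div, 'p'));    // bool(true)
var_dump(hasSingleTagInsideElement($mixed, 'p'));  // bool(false)
```

A check based on `childNodes->length === 1` would reject the first `div`, because it has three child nodes even though only one of them is an element.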

Also add comments from mozilla/readability’s _hasSingleTagInsideElement.

Picked from https://github.com/j0k3r/php-readability/pull/87