Open Simounet opened 3 years ago
I agree it could be legit content, but I guess that in most cases the text before `:` is the website name.
It has been there since the beginning: https://bitbucket.org/fivefilters/php-readability/src/5112edb387b53931ab9324b890fa581c0e951d2d/Readability.php#lines-260
I understand, but do you know many sites using this pattern? I don't. If we follow this rule, we should do the same with `-` and `|`. Sometimes the site name is at the beginning, sometimes at the end. Hard to tell.
I think that we should remove this or at least be able to bypass this condition.
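To make the "bypass" idea concrete, here is a rough sketch of what an opt-out could look like. It is written in Python purely for illustration and is not the php-readability code: the function name `clean_title` and the `strip_colon_prefix` flag are hypothetical, and the rule shown (keep only the text after the last `: `) is a simplification of the behaviour discussed in this thread.

```python
def clean_title(title: str, strip_colon_prefix: bool = True) -> str:
    """Hypothetical helper illustrating an opt-out for colon-based title cleaning.

    This is an assumption-laden sketch, not the library's implementation.
    """
    title = title.strip()

    # Bypass: the caller asked to keep the title untouched,
    # or there is no ": " separator to act on.
    if not strip_colon_prefix or ": " not in title:
        return title

    # Default behaviour in this sketch: treat everything before the
    # last ": " as a site name and keep only what follows it.
    return title.rsplit(": ", 1)[1].strip()
```

With the flag left on, `clean_title("Example Site: A great article")` drops the prefix. With `strip_colon_prefix=False`, a title where the colon is part of the content is returned unchanged.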
Hi there, I don't get why we are cleaning the title content before the `:` character. It could be legit content. https://github.com/j0k3r/php-readability/blob/9a490fac078b0f773c9848af1c6d76336a073a8d/src/Readability.php#L850