j0k3r / php-readability

A fork of https://bitbucket.org/fivefilters/php-readability
Apache License 2.0

Not able to get the full content #70

Closed: girishpanchal30 closed this issue 2 years ago

girishpanchal30 commented 2 years ago

Hi there, I tried to get the full content using the page URLs below, but I am not able to when the article is short.

I checked the library code and found the MIN_ARTICLE_LENGTH condition here: https://github.com/j0k3r/php-readability/blob/master/src/Readability.php#L1311

Page URLs:
1) https://thisis50.com/2021/09/10/uncle-murda-ft-eli-fross-so-what-official-video
2) https://thisis50.com/2021/09/11/celebrating-the-20-years-of-jay-zs-the-blueprint-album

The code requires the extracted content to have a minimum length of 200. Could you help me solve this issue?

Thanks
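
For readers following along, the length guard described above can be pictured with a minimal sketch. It is written in Python purely for illustration and is not the library's PHP code: the helper name `accept_article` is hypothetical, and the threshold of 200 is taken from the description in this comment rather than verified against the source.

```python
from typing import Optional

# Threshold as described in this thread; treat the exact value as an assumption.
MIN_ARTICLE_LENGTH = 200


def accept_article(text: str, min_length: int = MIN_ARTICLE_LENGTH) -> Optional[str]:
    """Return the extracted text only if it is long enough, else None.

    Hypothetical helper: it mimics a minimum-length guard, not the real
    extraction logic of any library.
    """
    stripped = text.strip()
    if len(stripped) < min_length:
        # Too short: a guard like this treats the extraction as failed.
        return None
    return stripped


short_post = "Uncle Murda drops a new video."  # well under 200 characters
long_post = "word " * 60                       # 300 characters before stripping
```

With a guard of this kind, `accept_article(short_post)` yields `None` while `accept_article(long_post)` returns the text, which matches the symptom reported here: short posts come back empty even though the page itself loads fine.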

j0k3r commented 2 years ago

I was wondering if you are using Graby to retrieve content or php-readability directly? If you are using Graby, the best solution might be to create a dedicated site config file for that site to tell which xpath should be extracted and it might work.
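
As a rough idea of what such a file could look like, here is a sketch in the plain-text site config format that Graby reads, with one file per host. The file name and both XPath expressions are placeholders chosen for illustration; they have not been checked against the actual markup of thisis50.com and would need to be adapted after inspecting the page.

```
# thisis50.com.txt (hypothetical file; the XPath values below are placeholders)
title: //h1
body: //div[@class='entry-content']
test_url: https://thisis50.com/2021/09/10/uncle-murda-ft-eli-fross-so-what-official-video
```

The point of the `body` rule is that the content is selected explicitly by XPath instead of being guessed by the readability heuristics, so a short article should no longer depend on the minimum-length check.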

girishpanchal30 commented 2 years ago

@j0k3r I didn't understand the solution. Could you explain it in more detail?

We are using this library to get the full content from RSS feed URLs. The URL can be anything, but it always comes from a feed.

girishpanchal30 commented 2 years ago

> If you are using Graby, the best solution might be to create a dedicated site config file for that site to tell which xpath should be extracted and it might work.

@j0k3r Thank you for your suggestion. It is now working fine with a dedicated site config.