Closed: hstanleycrow closed this 1 year ago
Hi @hstanleycrow,
You may want to take a look at https://github.com/j0k3r/graby, which uses php-readability under the hood but adds what we call "siteconfig" files [1] to change the way the tool extracts the content.
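As a rough sketch of what such a siteconfig file could look like for this site: the XPath expressions below are hypothetical placeholders and have not been checked against the actual page markup, so they would need adjusting after inspecting the HTML. The directive names (`title`, `body`, `strip`, `test_url`) follow the ftr-site-config text format, and the file would typically be named after the host, e.g. `searchmetrics.com.txt`.

```
# searchmetrics.com.txt (illustrative only; selectors are assumptions)
# Take the title from the first heading.
title: //h1
# Force extraction to use the whole article container instead of
# the highest-scoring paragraph block (hypothetical selector).
body: //main//article
# Drop elements that should not be part of the content (hypothetical selector).
strip: //div[contains(@class, 'sidebar')]
test_url: https://www.searchmetrics.com/glossary/ranking-factor/
```

The idea is that an explicit `body` rule bypasses the scoring heuristic for this host, so text above the highest-scoring line is no longer dropped.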
Thank you very much for your help.
Footnotes
1. https://github.com/fivefilters/ftr-site-config
Hi, I have been using this code for many days to extract content from websites (a lot of them), but today I found one that causes a problem. The URL is https://www.searchmetrics.com/glossary/ranking-factor/. Readability extracts the text starting from the middle and ignores all the text above it. The extraction begins at this text: "In the graph, four example correlations and the respective curves are shown." This specific line gets the highest score, so I am not sure what I can do. Any ideas? Thank you very much.