khalednatour closed this issue 5 years ago
Do you have an example of how you implemented it?
Sure.
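$url = "https://www.bbc.com/news/uk-politics-47352446";
$article_content = GetArticleContent($url);
// Escape both values so quotes in the article text cannot break the query.
$safe_url = mysql_real_escape_string($url);
$safe_content = mysql_real_escape_string($article_content);
mysql_query("INSERT INTO articles
    (url, content) VALUES
    ('$safe_url', '$safe_content')");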
function GetArticleContent($url=null){
    require_once '../lib/readability/Readability.php';
    $html = file_get_contents($url);

    // Note: PHP Readability expects UTF-8 encoded content.
    // If your content is not UTF-8 encoded, convert it
    // first before passing it to PHP Readability.
    // Both iconv() and mb_convert_encoding() can do this.

    // If we've got Tidy, let's clean up input.
    // This step is highly recommended - PHP's default HTML parser
    // often does a terrible job and results in strange output.
    if (function_exists('tidy_parse_string')) {
        $tidy = tidy_parse_string($html, array(), 'UTF8');
        $tidy->cleanRepair();
        $html = $tidy->value;
    }

    // give it to Readability
    $readability = new Readability($html, $url);
    // print debug output?
    // useful to compare against Arc90's original JS version -
    // simply click the bookmarklet with FireBug's console window open
    $readability->debug = false;
    // convert links to footnotes?
    $readability->convertLinksToFootnotes = true;
    // process it
    $result = $readability->init();

    // does it look like we found what we wanted?
    if ($result) {
        $content = $readability->getContent()->innerHTML;
        // if we've got Tidy, let's clean it up for output
        if (function_exists('tidy_parse_string')) {
            $tidy = tidy_parse_string($content, array('indent'=>true, 'show-body-only' => true), 'UTF8');
            $tidy->cleanRepair();
            $content = $tidy->value;
        }
        return trim(strip_tags($content));
    } else {
        return "";
    }
}
Instead of
$content = $readability->getContent()->innerHTML;
try:
$content = $readability->getContent()->ownerDocument->saveXML($readability->getContent());
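To illustrate what the saveXML() suggestion does, here is a minimal sketch that uses only PHP's standard DOM extension. The sample markup and the variable names ($doc, $node, $outer, $inner) are made up for the example, and $node merely stands in for whatever $readability->getContent() returns.

```php
<?php
// Stand-in for $readability->getContent(): any DOMElement works for this sketch.
$doc = new DOMDocument();
$doc->loadXML('<div id="content"><p>Hello <b>world</b></p></div>');
$node = $doc->documentElement;

// Serialize the node itself (outer markup) through its owner document.
$outer = $node->ownerDocument->saveXML($node);

// Rebuild the inner markup by serializing each child node in turn.
$inner = '';
foreach ($node->childNodes as $child) {
    $inner .= $node->ownerDocument->saveXML($child);
}
```

Note that saveXML($node) includes the wrapping element, so if only the inner markup is wanted, the child-by-child loop is the closer match to innerHTML.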
Thank you! Both work fine when I request the page normally through the browser, but my issue occurs when running it through a cron job.
I've no idea then. What's your error? How do you define your cronjob?
I didn't find any error :D
Question: does your code depend on anything outside the local server?
Nope.
This issue is fairly old and there hasn't been much activity on it. Closing, but please re-open if it still occurs.
Do you have any idea why getting the content does not work in a cron job?
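One possible explanation, which the thread never confirms: cron usually starts scripts from a different working directory than the web server does, so a relative path such as '../lib/readability/Readability.php' may no longer resolve, and the failure can go unnoticed if errors are not reported. The sketch below shows one way to check this. resolveFromScript() is a hypothetical helper written for this example, not part of PHP Readability.

```php
<?php
// Hypothetical helper: build a path from the script's own directory so the
// result does not depend on the working directory cron happens to use.
function resolveFromScript(string $relative, string $base = __DIR__): string
{
    return rtrim($base, '/\\') . DIRECTORY_SEPARATOR . ltrim($relative, '/\\');
}

// Report everything, so a failure under cron is not silent.
error_reporting(E_ALL);
ini_set('display_errors', '1');
ini_set('log_errors', '1');

$libPath = resolveFromScript('../lib/readability/Readability.php');
if (!is_file($libPath)) {
    // Log the path that was tried and the directory the script was started from.
    error_log('Readability not found at ' . $libPath . ' (cwd: ' . getcwd() . ')');
} else {
    require_once $libPath;
}
```

If the logged working directory differs between the browser request and the cron run, the relative require is a likely suspect. It is also worth checking whether the command-line PHP used by cron has the same extensions and settings as the web server's PHP, since file_get_contents() on a URL depends on allow_url_fopen.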