Hi (again)
This is just a suggestion for improvement.
I am writing some scrapers to get data from several websites, and I'm concerned about
possible changes in the structure of the pages that I'm scraping.
My scrapers do systematic (and unattended) work, so I always need to check
that all the tags I expect are present in the web page, and log the result
for later analysis.
With this in mind, I can never chain several operations (select,
getPlainText, etc.), because if any of the selects returns null, the script
crashes with the error:
Fatal error: Call to a member function getPlainText() on a non-object in ...
Sometimes I call select just to test whether a node is present (for example,
to test whether the div with id "LastMinuteOffer" is present).
In this case I don't chain calls, I just do:
$t1=$html->select('div#LastMinuteOffer',0);
if ($t1){
//There is a last-minute offer...
}
But sometimes I just want to get the text of a specific node, so in some
cases I chain several calls into one, something like this:
$MovieTitle=$html->select('h3.title a.title',0)->getPlainText();
In this case, if the select fails it returns null, so getPlainText() triggers
the error:
Fatal error: Call to a member function getPlainText() on a non-object in ...
and the script fails.
This forces me to chain nothing and test every result,
with nasty code like this:
$t1=$html->select('h3.title a.title',0);
if (!$t1) {$TheError='Fail in Movie Title'; return false;}
$MovieTitle=$t1->getPlainText();
I have written a new function to improve my code; perhaps someone else will be
interested in it:
select_imperative
With this function I can chain as many calls as I want without risk of fatal
errors, and I can catch the exception if any of the selects fails.
I can do something like:
try {
$MovieTitle=$html->select_imperative('h3.title a.title',0)->getPlainText();
} catch(Exception $e) {
$TheError='Fail in Movie Title: '.$e->getMessage()."\n";
return false; //Return with error
}
return true; //Return All ok
Or I can handle all the errors as a group in a single try block:
try {
$MovieTitle=$html->select_imperative('h3.title a.title',0)->getPlainText();
$Author=$html->select_imperative('span.author',0)->getPlainText();
$Date=$html->select_imperative('span.date',0)->getPlainText();
$Format=$html->select_imperative('span.format',0)->getPlainText();
} catch(Exception $e) {
$TheError='Error scraping Movie: '.$e->getMessage();
return false; //Return with error
}
return true; //Return All ok
With this I reduce my code... a lot.
In the class HTML_Node:
function select_imperative($query = '*', $index = false, $recursive = true, $check_self = false) {
if ( ($rv=$this->select($query,$index,$recursive, $check_self)) == null){
throw new Exception('Null query in select: '.$query);
} else return $rv;
}
and, in the class HTML_Parser:
function select_imperative($query = '*', $index = false, $recursive = true, $check_self = false) {
return $this->root->select_imperative($query, $index, $recursive, $check_self);
}
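For anyone who wants to try the pattern without the library, here is a minimal self-contained sketch. StubNode and scrapeMovie are hypothetical names made up for this illustration (they are not part of the library), and the stub only imitates the "select returns null when nothing matches" behaviour described above, keyed on the exact query string:

```php
<?php
// Hypothetical stand-in for the library's node class; it only imitates
// what this sketch needs and does no real CSS selector matching.
class StubNode {
    private $text;
    private $children;

    public function __construct($text = '', array $children = array()) {
        $this->text = $text;
        $this->children = $children; // map: query string => StubNode
    }

    // Imitates the behaviour described above: null when nothing matches.
    // $index is accepted for signature similarity but ignored by the stub.
    public function select($query, $index = false) {
        return isset($this->children[$query]) ? $this->children[$query] : null;
    }

    // Throwing variant, same idea as select_imperative above.
    public function select_imperative($query, $index = false) {
        $rv = $this->select($query, $index);
        if ($rv === null) {
            throw new Exception('Null query in select: ' . $query);
        }
        return $rv;
    }

    public function getPlainText() {
        return $this->text;
    }
}

// Chains calls freely; one catch block reports whichever selector failed.
function scrapeMovie(StubNode $html, &$error) {
    try {
        $title  = $html->select_imperative('h3.title a.title', 0)->getPlainText();
        $author = $html->select_imperative('span.author', 0)->getPlainText();
    } catch (Exception $e) {
        $error = 'Error scraping movie: ' . $e->getMessage();
        return false;
    }
    return array('title' => $title, 'author' => $author);
}

// The page has a title but no author node, so the second select throws.
$page = new StubNode('', array('h3.title a.title' => new StubNode('Some Movie')));
$error = null;
$result = scrapeMovie($page, $error);
```

Because the exception message carries the query string, the log shows exactly which selector stopped matching, which is the information needed when a site changes its structure.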
Regards!
Original issue reported on code.google.com by Radika...@gmail.com on 21 Sep 2012 at 6:26