Closed thebennos closed 7 years ago
Hi @thebennos
If I understood your question correctly, what you want to do is to extract the HTML (the DOM) and be able to parse it later. Right?
If this is your question then it's already possible and it's actually very simple!
The HTML is not parsed until you call getNaturalResults(), so instead of parsing the result right away you can extract the DOM and the URL to use them later. See:
$googleUrl = new GoogleUrl();
$googleUrl->setSearchTerm('simpsons');
// $googleClient is a GoogleClient instance configured beforehand
$response = $googleClient->query($googleUrl);
// now, instead of parsing the result, we get the raw data from the response
$html = $response->getDom();
// $html is a DOMDocument instance (see the PHP documentation for details)
$url = $response->getUrl();
// $url is a URL object; you can cast it to a string
Now you can store this URL and this HTML wherever is convenient, and later you can parse them again:
$url = ....; // url stored previously
$html = ....; // html stored previously
$serp = new GoogleSerp($html, $url);
$serp->getNaturalResults();
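For storage, the DOMDocument can be flattened to a string and rebuilt later. A minimal sketch using only PHP's built-in DOM extension; the file path and the tiny hand-built document are illustrative stand-ins, not part of the library:

```php
<?php
// Persist a DOM as an HTML string and rebuild it later.
$doc = new DOMDocument();
// In the real flow this would be $response->getDom(); a small
// hand-built document keeps the sketch self-contained.
$doc->loadHTML('<html><body><p>simpsons</p></body></html>');

// DOMDocument -> string, written to disk (or a queue, a database, ...)
file_put_contents('/tmp/serp.html', $doc->saveHTML());

// ... later, possibly in another process: string -> DOMDocument
$rebuilt = new DOMDocument();
$rebuilt->loadHTML(file_get_contents('/tmp/serp.html'));
echo $rebuilt->getElementsByTagName('p')->item(0)->textContent; // prints "simpsons"
```

The same string form works for any transport, since both the fetcher and the parser only need to agree on HTML plus the originating URL.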
Does that answer your question?
Oh, I had not noticed the getDom function yet.
"If I understood your question correctly, what you want to do is to extract the HTML (the DOM) and be able to parse it later. Right?" Yes.
That's cool, so I can integrate RabbitMQ as a message transport and split the work into separate worker jobs. Great, thanks.
I'm closing the issue because everything looks good now.
Currently it is one process (fetch the data and parse it). Google can change the HTML at any time, and then the complete process fails or outputs wrong results. An option to split the process into two independent parts would be nice: if Google changes the HTML, no problem, we have time to adjust the parsing and can parse the stored pages later.
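The two-stage split described above could be sketched as follows, with a plain spool directory standing in for the real message transport (RabbitMQ via php-amqplib or similar); the function names, directory, and sample markup are all illustrative assumptions:

```php
<?php
// Sketch: decouple fetching from parsing via a file-based spool.
$spoolDir = sys_get_temp_dir() . '/serp-spool';
@mkdir($spoolDir);

// Stage 1: fetcher — store the raw payload, no parsing at all.
function enqueueJob(string $spoolDir, string $url, string $html): string {
    $path = $spoolDir . '/' . uniqid('job-', true) . '.json';
    file_put_contents($path, json_encode(['url' => $url, 'html' => $html]));
    return $path;
}

// Stage 2: worker — pick the payload up later and parse it.
// With the library, this is where new GoogleSerp($dom, $url) would run.
function processJob(string $path): array {
    $job = json_decode(file_get_contents($path), true);
    $dom = new DOMDocument();
    $dom->loadHTML($job['html']);
    return ['url' => $job['url'], 'dom' => $dom];
}

$path = enqueueJob($spoolDir, 'https://www.google.com/search?q=simpsons',
                   '<html><body><p>result</p></body></html>');
$job = processJob($path);
echo $job['dom']->getElementsByTagName('p')->item(0)->textContent; // prints "result"
```

If Google changes its markup, stage 1 keeps collecting pages unchanged; only the stage 2 worker needs updating, and the stored payloads can be re-parsed afterwards.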