Open OmarMonterrey opened 3 years ago
Happened to me as well. The SERPS implementation for Google is not able to parse HTML correctly. Please, fix it ASAP.
Yes, it looks like Google DOM has changed.
The function below looks at the "class" attribute of the body tag. That lookup now comes back empty, so every function that calls javascriptIsEvaluated() breaks, for example getNaturalResults and getAdwordsResults:
public function javascriptIsEvaluated()
{
$body = $this->getXpath()->query('//body');
if ($body->length != 1) {
throw new Exception('No body found');
}
$body = $body->item(0);
/** @var $body \DOMElement */
$class = $body->getAttribute('class');
if ($class=='hsrp') {
return false;
} elseif (strstr($class, 'srp')) {
return true;
} else {
throw new InvalidDOMException('Unable to check javascript status.');
}
}
Do you have a plan for solving this issue?
Thank you
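For anyone who wants to reproduce the failure outside the package, here is a minimal standalone sketch of the same class check. It is not the library's code: `classifyBodyClass` and its string return values are hypothetical names made up for illustration, and it uses plain `DOMDocument`/`DOMXPath` instead of the package's own `getXpath()`.

```php
<?php
// Standalone re-statement of the class check quoted above (hypothetical helper,
// not part of the serps package).
function classifyBodyClass(string $html): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // real-world HTML produces many parser warnings
    $doc->loadHTML($html);
    libxml_clear_errors();

    $body = (new DOMXPath($doc))->query('//body');
    if ($body->length !== 1) {
        return 'no-body';
    }

    // getAttribute() returns an empty string when the attribute is missing
    $class = $body->item(0)->getAttribute('class');

    if ($class === 'hsrp') {
        return 'js-not-evaluated';
    }
    if (strpos($class, 'srp') !== false) {
        return 'js-evaluated';
    }
    return 'unknown'; // the package throws InvalidDOMException in this branch
}
```

With a body tag that carries no class at all, the sketch ends up in the last branch, which is where the "Unable to check javascript status" exception comes from.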
You were right, the issue was right there, but the body tag does have the proper attributes. Since I'm only using "getNaturalResults", I implemented a little hack:
$html = preg_replace('/^.*?(<body)/is','$1', $html);
Basically I removed everything before the <body tag. That way the DOM is parsed as expected and the classes are checked, so it's working for me now.
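A self-contained sketch of that hack, in case it helps to try it on a saved page first. The regex is the one from this comment; `stripBeforeBody` and `bodyClass` are hypothetical helper names, and the sample HTML is invented for the demo, not real Google output.

```php
<?php
// Drop everything that precedes the first "<body" (case-insensitive, across newlines).
function stripBeforeBody(string $html): string
{
    $out = preg_replace('/^.*?(<body)/is', '$1', $html);
    return $out ?? $html; // preg_replace() returns null on a regex error
}

// Parse the HTML and return the class attribute of the body tag ('' if absent).
function bodyClass(string $html): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // silence warnings about imperfect markup
    $doc->loadHTML($html);
    libxml_clear_errors();

    $body = (new DOMXPath($doc))->query('//body');
    return $body->length === 1 ? $body->item(0)->getAttribute('class') : '';
}

// Invented sample page, only for illustration
$raw = "<!doctype html>\n<html><head><title>t</title></head>\n"
     . "<body class=\"srp\"><div id=\"result-stats\">x</div></body></html>";

$trimmed = stripBeforeBody($raw);
echo bodyClass($trimmed), PHP_EOL;
```

One caveat: the regex cuts at the first occurrence of "<body", so if that string appears earlier in the page (inside a script, say), the cut happens there instead.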
Thank you, it works as a temporary fix. I hope the package will get an update about this for a permanent fix.
So I have talked with the developer of this library. He told me that he does not have the time to maintain the library, so there won't be any updates from now sadly. 🙃
This explains a lot of pull requests being "ignored"...
The DOM for the number of results has changed too. I applied @OmarMonterrey's hack:
// in vendor/serps/core/src/Core/Http/SearchEngineResponse.php
public function getPageContent()
{
$this->pageContent = preg_replace('/^.*?(<body)/is','$1', $this->pageContent);
return $this->pageContent;
}
And changed this to get the number of results:
// in vendor/serps/search-engine-google/src/Page/GoogleSerp.php
public function getNumberOfResults()
{
$item = $this->cssQuery('#result-stats');
// ... etc
}
I've been running the following for about a year now and it's kept this change at bay:
// in vendor/serps/search-engine-google/src/Page/GoogleSerp.php
/**
 * Get the total number of results available for the search terms
 * @return int|null the number of results, or null if neither stats element is found
 * @throws InvalidDOMException
 */
public function getNumberOfResults()
{
$item = $this->cssQuery('#resultStats');
if ($item->length < 1) {
$item = $this->cssQuery('#result-stats');
if ($item->length < 1) {
return null;
}
}
URL: https://www.google.com/search?q=download+youtube+thumbnail
Expected: Correct parsing
What I'm getting: Unable to check javascript status. Google DOM has possibly changed and an update may be required.
The HTML is OK and my Composer dependencies are completely up to date. I'm attaching an HTML screenshot and the page content:
invalid_dom.zip