serp-spider / search-engine-google

:spider: Google client for SERPS
https://serp-spider.github.io

Issue with mobile UA string ? #89

Closed LunarDevelopment closed 6 years ago

LunarDevelopment commented 6 years ago

Hello,

When I use a mobile UA I get the exception below. Is this an issue with parsing mobile pages in general, or am I doing something wrong? An example UA that produces the error:

$userAgent = "Mozilla/5.0 (Linux; Android 4.0.4; Galaxy Nexus Build/IMM76B) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.133 Mobile Safari/535.19";

/serps/search-engine-google/src/Parser/Evaluated/Rule/Natural/Classical/ClassicalCardsResultZ1m.php

Symfony \ Component \ Debug \ Exception \ FatalThrowableError (E_ERROR)
Call to undefined method DOMCdataSection::hasClass()
Symfony\Component\Debug\Exception\FatalThrowableError: Call to undefined method DOMCdataSection::hasClass() in /Users/directory/Projects/directory/directory/directory/vendor/serps/search-engine-google/src/Parser/Evaluated/Rule/Natural/Classical/ClassicalCardsResultZ1m.php:32
Stack trace:
#0 /Users/directory/Projects/directory/directory/directory/vendor/serps/search-engine-google/src/Parser/AbstractParser.php(76): Serps\SearchEngine\Google\Parser\Evaluated\Rule\Natural\Classical\ClassicalCardsResultZ1m->match(Object(Serps\SearchEngine\Google\Page\GoogleSerp), Object(Serps\Core\Dom\DomElement))
#1 /Users/directory/Projects/directory/directory/directory/vendor/serps/search-engine-google/src/Parser/AbstractParser.php(78): Serps\SearchEngine\Google\Parser\AbstractParser->parseGroups(Object(Serps\Core\Dom\DomNodeList), Object(Serps\Core\Serp\IndexedResultSet), Object(Serps\SearchEngine\Google\Page\GoogleSerp))
#2 /Users/directory/Projects/directory/directory/directory/vendor/serps/search-engine-google/src/Parser/AbstractParser.php(52): Serps\SearchEngine\Google\Parser\AbstractParser->parseGroups(Object(Serps\Core\Dom\DomNodeList), Object(Serps\Core\Serp\IndexedResultSet), Object(Serps\SearchEngine\Google\Page\GoogleSerp))
#3 /Users/directory/Projects/directory/directory/directory/vendor/serps/search-engine-google/src/Page/GoogleSerp.php(49): Serps\SearchEngine\Google\Parser\AbstractParser->parse(Object(Serps\SearchEngine\Google\Page\GoogleSerp))
#4 /Users/directory/Projects/directory/directory/directory/app/Crawlers/GoogleKeywordCrawler.php(72): Serps\SearchEngine\Google\Page\GoogleSerp->getNaturalResults()
#5 /Users/directory/Projects/directory/directory/directory/app/Jobs/Seo/ProcessKeywordRanking.php(103): App\Crawlers\GoogleKeywordCrawler->processCrawlResults()
#6 /Users/directory/Projects/directory/directory/directory/vendor/laravel/framework/src/Illuminate/Redis/Limiters/DurationLimiter.php(89): App\Jobs\Seo\ProcessKeywordRanking->App\Jobs\Seo\{closure}()
#7 /Users/directory/Projects/directory/directory/directory/vendor/laravel/framework/src/Illuminate/Redis/Limiters/DurationLimiterBuilder.php(113): Illuminate\Redis\Limiters\DurationLimiter->block(3, Object(Closure))
#8 /Users/directory/Projects/directory/directory/directory/app/Jobs/Seo/ProcessKeywordRanking.php(145): Illuminate\Redis\Limiters\DurationLimiterBuilder->then(Object(Closure), Object(Closure))
#9 [internal function]: App\Jobs\Seo\ProcessKeywordRanking->handle()
#10 /Users/directory/Projects/directory/directory/directory/vendor/laravel/framework/src/Illuminate/Container/BoundMethod.php(29): call_user_func_array(Array, Array)
#11 /Users/directory/Projects/directory/directory/directory/vendor/laravel/framework/src/Illuminate/Container/BoundMethod.php(87): Illuminate\Container\BoundMethod::Illuminate\Container\{closure}()
#12 /Users/directory/Projects/directory/directory/directory/vendor/laravel/framework/src/Illuminate/Container/BoundMethod.php(31): Illuminate\Container\BoundMethod::callBoundMethod(Object(Illuminate\Foundation\Application), Array, Object(Closure))
#13 /Users/directory/Projects/directory/directory/directory/vendor/laravel/framework/src/Illuminate/Container/Container.php(549): Illuminate\Container\BoundMethod::call(Object(Illuminate\Foundation\Application), Array, Array, NULL)
#14 /Users/directory/Projects/directory/directory/directory/vendor/laravel/framework/src/Illuminate/Bus/Dispatcher.php(94): Illuminate\Container\Container->call(Array)
#15 /Users/directory/Projects/directory/directory/directory/vendor/laravel/framework/src/Illuminate/Pipeline/Pipeline.php(114): Illuminate\Bus\Dispatcher->Illuminate\Bus\{closure}(Object(App\Jobs\Seo\ProcessKeywordRanking))
#16 /Users/directory/Projects/directory/directory/directory/vendor/laravel/framework/src/Illuminate/Pipeline/Pipeline.php(102): Illuminate\Pipeline\Pipeline->Illuminate\Pipeline\{closure}(Object(App\Jobs\Seo\ProcessKeywordRanking))
#17 /Users/directory/Projects/directory/directory/directory/vendor/laravel/framework/src/Illuminate/Bus/Dispatcher.php(98): Illuminate\Pipeline\Pipeline->then(Object(Closure))
#18 /Users/directory/Projects/directory/directory/directory/vendor/laravel/framework/src/Illuminate/Queue/CallQueuedHandler.php(49): Illuminate\Bus\Dispatcher->dispatchNow(Object(App\Jobs\Seo\ProcessKeywordRanking), false)
#19 /Users/directory/Projects/directory/directory/directory/vendor/laravel/framework/src/Illuminate/Queue/Jobs/Job.php(76): Illuminate\Queue\CallQueuedHandler->call(Object(Illuminate\Queue\Jobs\RedisJob), Array)
#20 /Users/directory/Projects/directory/directory/directory/vendor/laravel/framework/src/Illuminate/Queue/Worker.php(320): Illuminate\Queue\Jobs\Job->fire()
#21 /Users/directory/Projects/directory/directory/directory/vendor/laravel/framework/src/Illuminate/Queue/Worker.php(270): Illuminate\Queue\Worker->process('redis', Object(Illuminate\Queue\Jobs\RedisJob), Object(Illuminate\Queue\WorkerOptions))
#22 /Users/directory/Projects/directory/directory/directory/vendor/laravel/framework/src/Illuminate/Queue/Worker.php(114): Illuminate\Queue\Worker->runJob(Object(Illuminate\Queue\Jobs\RedisJob), 'redis', Object(Illuminate\Queue\WorkerOptions))
#23 /Users/directory/Projects/directory/directory/directory/vendor/laravel/framework/src/Illuminate/Queue/Console/WorkCommand.php(101): Illuminate\Queue\Worker->daemon('redis', 'high,medium,low', Object(Illuminate\Queue\WorkerOptions))
#24 /Users/directory/Projects/directory/directory/directory/vendor/laravel/framework/src/Illuminate/Queue/Console/WorkCommand.php(85): Illuminate\Queue\Console\WorkCommand->runWorker('redis', 'high,medium,low')
#25 [internal function]: Illuminate\Queue\Console\WorkCommand->handle()
#26 /Users/directory/Projects/directory/directory/directory/vendor/laravel/framework/src/Illuminate/Container/BoundMethod.php(29): call_user_func_array(Array, Array)
#27 /Users/directory/Projects/directory/directory/directory/vendor/laravel/framework/src/Illuminate/Container/BoundMethod.php(87): Illuminate\Container\BoundMethod::Illuminate\Container\{closure}()
#28 /Users/directory/Projects/directory/directory/directory/vendor/laravel/framework/src/Illuminate/Container/BoundMethod.php(31): Illuminate\Container\BoundMethod::callBoundMethod(Object(Illuminate\Foundation\Application), Array, Object(Closure))
#29 /Users/directory/Projects/directory/directory/directory/vendor/laravel/framework/src/Illuminate/Container/Container.php(549): Illuminate\Container\BoundMethod::call(Object(Illuminate\Foundation\Application), Array, Array, NULL)
#30 /Users/directory/Projects/directory/directory/directory/vendor/laravel/framework/src/Illuminate/Console/Command.php(183): Illuminate\Container\Container->call(Array)
#31 /Users/directory/Projects/directory/directory/directory/vendor/symfony/console/Command/Command.php(252): Illuminate\Console\Command->execute(Object(Symfony\Component\Console\Input\ArgvInput), Object(Illuminate\Console\OutputStyle))
#32 /Users/directory/Projects/directory/directory/directory/vendor/laravel/framework/src/Illuminate/Console/Command.php(170): Symfony\Component\Console\Command\Command->run(Object(Symfony\Component\Console\Input\ArgvInput), Object(Illuminate\Console\OutputStyle))
#33 /Users/directory/Projects/directory/directory/directory/vendor/symfony/console/Application.php(946): Illuminate\Console\Command->run(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#34 /Users/directory/Projects/directory/directory/directory/vendor/symfony/console/Application.php(248): Symfony\Component\Console\Application->doRunCommand(Object(Illuminate\Queue\Console\WorkCommand), Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#35 /Users/directory/Projects/directory/directory/directory/vendor/symfony/console/Application.php(148): Symfony\Component\Console\Application->doRun(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#36 /Users/directory/Projects/directory/directory/directory/vendor/laravel/framework/src/Illuminate/Console/Application.php(88): Symfony\Component\Console\Application->run(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#37 /Users/directory/Projects/directory/directory/directory/vendor/laravel/framework/src/Illuminate/Foundation/Console/Kernel.php(121): Illuminate\Console\Application->run(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#38 /Users/directory/Projects/directory/directory/directory/artisan(37): Illuminate\Foundation\Console\Kernel->handle(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#39 {main}
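
One plausible reading of this error: frame #0 shows the rule receiving a Serps\Core\Dom\DomElement, while the failing call is made on a DOMCdataSection, a built-in PHP node type that is not an element and has no hasClass() method. The sketch below uses only PHP's built-in DOM classes to show how such a node can turn up among child nodes and how a type check skips it. nodeHasClass() is a hypothetical helper and the markup is made up; neither is taken from the library.

```php
<?php
// Hypothetical helper, not part of the serps library.
function nodeHasClass(DOMNode $node, string $class): bool
{
    // Text, comment and CDATA nodes are not elements and carry no attributes.
    if (!$node instanceof DOMElement) {
        return false;
    }
    // Split the class attribute on whitespace, ignoring empty pieces.
    $classes = preg_split('/\s+/', trim($node->getAttribute('class')), -1, PREG_SPLIT_NO_EMPTY);
    return in_array($class, $classes, true);
}

$doc = new DOMDocument();
// Made-up XML input, chosen so that the CDATA section is kept as its own node.
$doc->loadXML('<div id="rso"><div class="ZINbbc card">result</div><![CDATA[ stray ]]></div>');

foreach ($doc->documentElement->childNodes as $child) {
    // Print the node type and whether it was treated as a "card" element.
    printf("%s: %s\n", get_class($child), nodeHasClass($child, 'card') ? 'card' : 'skipped');
}
```

Whether the library's rule should skip such nodes or fail loudly is a design decision for the maintainer; the check above only illustrates the type mismatch.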

LunarDevelopment commented 6 years ago

Also just stumbled on this one when using the following UA:

$userAgent = "Mozilla/5.0 (Android 7.0; Mobile; rv:54.0) Gecko/54.0 Firefox/54.0";

Serps\SearchEngine\Google\Exception\InvalidDOMException: Raw dom is not supported, please provide an evaluated version of the dom Google DOM has possibly changed and an update may be required. in /Users/directory/Projects/directory/directory/directory/vendor/serps/search-engine-google/src/Page/GoogleSerp.php:47

gsouf commented 6 years ago

Hi @LunarDevelopment

The first issue was already reported to me, but I couldn't find a way to reproduce it. I hadn't thought about mobile results, which I'm currently having issues with as well. I will work on getting it fixed.

The second issue occurs because you are using an outdated user agent, so Google responds with its "legacy" version, which serps no longer supports.

LunarDevelopment commented 6 years ago

Thanks for coming back to me so quickly, and P.S. I love your libraries, bravo.

I've dug around, and if I comment out the two Z1m rules as shown in the following class, the script runs, but I get no NaturalResultType::CLASSICAL results to iterate through, so I'm guessing all I've done is remove the classes that parse the results.


/**
 * Parses natural results from a mobile google SERP
 */
class MobileNaturalParser extends AbstractParser
{

    /**
     * @inheritdoc
     */
    protected function generateRules()
    {
        return [
            new Divider(),
            new SearchResultGroup(),
            // new ClassicalCardsResultZ1m(),
            new ClassicalCardsResult(),
            // new TweetsCarouselZ1m(),
            new ImageGroupCarousel(),
            new ComposedTopStories(),
            new VideoGroup(),
            new ImageGroup(),
            new PeopleAlsoAsk(), // people also ask must be placed before knowledge card to stop parsing
            new KnowledgeCard()
        ];
    }

    /**
     * @inheritdoc
     */
    protected function getParsableItems(GoogleDom $googleDom)
    {
        $xpathObject = $googleDom->getXpath();
        $xpathElementGroups = "//div[@id = 'ires']/*[@id = 'rso']/*";
        return $xpathObject->query($xpathElementGroups);
    }
}
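
The XPath used in getParsableItems() can be tried on its own with PHP's built-in DOMXPath, which is one way to check whether a saved page still contains the ires/rso containers that the query expects. A minimal sketch, assuming a hand-written HTML fragment rather than a real Google page:

```php
<?php
// Made-up markup that mimics only the container ids the query looks for.
$html = '<html><body><div id="ires"><div id="rso">'
      . '<div class="a">first group</div><div class="b">second group</div>'
      . '</div></div></body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);

$xpath = new DOMXPath($doc);
// Same expression as in getParsableItems(): every direct child of #rso inside #ires.
$groups = $xpath->query("//div[@id = 'ires']/*[@id = 'rso']/*");

foreach ($groups as $group) {
    // Each matched node is a candidate group that the parser rules would be tried against.
    printf("%s.%s\n", $group->nodeName, $group->getAttribute('class'));
}
```

If the query returns no nodes for a saved mobile page, no rule will ever be tried against it, whichever rules are enabled.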

gsouf commented 6 years ago

Thanks for the details. If you have a little time to help me fix this issue, there is something you can do for me.

That would be to send me a copy of the DOM that fails to parse (add it as an attachment to a message in this thread).

Details on how to save the DOM are available here: https://serp-spider.github.io/documentation/search-engine/google/parse-page/#manipulate-the-dom-object
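
For reference, a DOM that is available as a plain PHP DOMDocument can be written to disk with the built-in saveHTML(). This is only a sketch: dumpDom() is a hypothetical helper, the sample markup and file name are made up, and the linked documentation remains the place to look for how the library itself exposes the DOM.

```php
<?php
// Hypothetical helper, not a serps API: serializes a DOMDocument and writes it to $path.
function dumpDom(DOMDocument $dom, string $path): int
{
    $html = $dom->saveHTML();
    if ($html === false) {
        throw new RuntimeException('Could not serialize the DOM');
    }
    $bytes = file_put_contents($path, $html);
    if ($bytes === false) {
        throw new RuntimeException("Could not write to $path");
    }
    return $bytes; // number of bytes written
}

// Made-up document standing in for the page that fails to parse.
$dom = new DOMDocument();
$dom->loadHTML('<html><body><div id="rso"><p>result</p></div></body></html>');

$path = sys_get_temp_dir() . '/broken_mobile_dom.html';
dumpDom($dom, $path);
```

The resulting file can then be zipped and attached to the thread.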

LunarDevelopment commented 6 years ago

Here you go:

broken_mobile_dom.html.zip

LunarDevelopment commented 6 years ago

Hello again, is there a rough timescale for a fix, or a workaround for this issue?

Do you have a working mobile UA string?

gsouf commented 6 years ago

Hi @LunarDevelopment

Here is a modern mobile UA string:

Mozilla/5.0 (iPhone; CPU iPhone OS 10_3 like Mac OS X) AppleWebKit/602.1.50 (KHTML, like Gecko) CriOS/56.0.2924.75 Mobile/14E5239e Safari/602.1

Unfortunately I cannot give you further details. I will be short on time this week, but I will fix it as soon as possible.

LunarDevelopment commented 6 years ago

Thanks for the suggestion, I've tested the UA string:

Mozilla/5.0 (iPhone; CPU iPhone OS 10_3 like Mac OS X) AppleWebKit/602.1.50 (KHTML, like Gecko) CriOS/56.0.2924.75 Mobile/14E5239e Safari/602.1

And I am still getting:

Symfony \ Component \ Debug \ Exception \ FatalThrowableError (E_ERROR) Call to undefined method DOMCdataSection::hasClass()

Do you get the same?

gsouf commented 6 years ago

Hi @LunarDevelopment

Not sure yet. I'll address this in the next few days.

LunarDevelopment commented 6 years ago

I don't believe this library handles Google mobile results in its current incarnation. I'd suggest removing the feature and narrowing the scope of the library unless more contributors come on board to implement it.

Closing this issue for now.

gsouf commented 6 years ago

@LunarDevelopment mobile parsing is used successfully in a production environment.