serp-spider / search-engine-google

:spider: Google client for SERPS
https://serp-spider.github.io

Catch extra results in the <div class="srg"> element #6

Closed andreasanta closed 8 years ago

andreasanta commented 8 years ago

This little patch helps to catch results that come with an intermediate div tag between the

and the
results. Sometimes Google outputs them like this, especially when AdWords or sitelinks are involved.

When this "anomalous" intermediate tag is detected, its child nodes are returned for parsing as simple results.
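
The unwrapping described above can be sketched roughly as follows. This is a minimal illustration in Python, not the patch itself (the library is written in PHP). The `srg` class comes from the issue title; the `ires` id, the `g` class, and the `collect_results` helper are assumptions made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: only the "srg" class is taken from this issue;
# the "ires" id and "g" class are assumptions for illustration.
SAMPLE = """
<div id="ires">
  <div class="g">direct result</div>
  <div class="srg">
    <div class="g">wrapped result 1</div>
    <div class="g">wrapped result 2</div>
  </div>
</div>
""".strip()


def collect_results(container):
    """Return result nodes, unwrapping any intermediate "srg" div."""
    results = []
    for child in container:
        if child.get("class") == "srg":
            # Intermediate wrapper: pass its children on as simple results.
            results.extend(list(child))
        else:
            results.append(child)
    return results


root = ET.fromstring(SAMPLE)
texts = [node.text for node in collect_results(root)]
print(texts)
```

With this approach the wrapped results end up in the same flat list as the direct ones, so the downstream parsing rules do not need to know that the wrapper existed.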

gsouf commented 8 years ago

`srg` elements are already parsed: https://github.com/serp-spider/search-engine-google/blob/master/src/Parser/Evaluated/Rule/Natural/SearchResultGroup.php

I think your issue comes from another source. If you can find a way to reproduce it, or if you save the failing DOM to a file and send it as an attachment, we will be able to check the actual problem.

PS: when you write HTML in an issue, remember to escape it with backquotes, otherwise it won't be visible.

gsouf commented 8 years ago

The issue came from another source: the search results were wrapped in an additional div that was not parsed. I'm not sure when Google does that, but it did for your search https://www.google.es/search?q=alarmas+para+casa&lr=lang_es (from the Gitter chat).

Additionally, the maps results matched every time. Fortunately that is the last element to be parsed, so it only showed up when no other result matched. Fixed in 61bd9b74384a8335da3030794289597ca8e0973b.

andreasanta commented 8 years ago

Thanks @gsouf. I was going to answer that the "srg" element does not get parsed because of the intermediate div.

Anyway, I've seen your fix and I'm not sure you should bind the XPath rule to the "_Xhb" class, since it might be subject to future changes. Why don't you just check down the hierarchy for "srg" elements?
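
As a rough sketch of what such a descendant search could look like (in Python for illustration, not the library's PHP code): the `.//` axis finds "srg" elements at any depth, so the wrapper's class name never has to appear in the rule. The markup below, including the `search` id and the `wrapper-a` and `g` classes, is assumed for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the wrapper class name is invented to stand in for
# an obfuscated class that may change over time.
SAMPLE = """
<div id="search">
  <div class="wrapper-a">
    <div class="srg"><div class="g">result 1</div></div>
  </div>
  <div class="srg"><div class="g">result 2</div></div>
</div>
""".strip()

root = ET.fromstring(SAMPLE)

# ".//" searches all descendants, whatever wraps them.
# Note: ElementTree compares the whole class attribute; a full XPath engine
# would typically use contains() to handle multiple class names.
groups = root.findall(".//div[@class='srg']")
found = [group[0].text for group in groups]
print(found)
```

The trade-off is that a descendant search is broader, so it could also pick up "srg" elements in parts of the page that should not be treated as natural results.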

gsouf commented 8 years ago

@andreasanta "_Xhb" is for map results; you probably mean "_NId". The reason is that I need more real cases to find the best solution.