serp-spider / search-engine-google

:spider: Google client for SERPS
https://serp-spider.github.io

Throw an exception if natural results are not found #59

Open thebennos opened 7 years ago

thebennos commented 7 years ago

The natural results are the essential part of the result page. I think it would be good to make a basic check on each request to verify that the natural results are found.

If Google changes its HTML and the organic results can no longer be found, an exception would be thrown.

gsouf commented 7 years ago

@thebennos I understand the goal, but that's a very opinionated feature and it might cause undesirable behavior in the library.

I think the simplest solution for the moment would be to do this check yourself when getting the results back from the GoogleClient. You can then handle it however you want. For instance, you can stop processing the results if you detect fewer than 9 of them.
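A minimal sketch of such a caller-side check, written in Python purely for illustration. Here `results` stands in for whatever list of natural results the client hands back, and `require_natural_results` and `TooFewNaturalResults` are hypothetical names, not part of the library's API:

```python
class TooFewNaturalResults(Exception):
    """Raised when a parsed page yields suspiciously few natural results."""


def require_natural_results(results, minimum=9):
    # `results` is assumed to be a plain list of already-parsed natural results.
    # `minimum` is a caller-chosen threshold; 9 mirrors the figure mentioned above.
    if len(results) < minimum:
        raise TooFewNaturalResults(
            f"expected at least {minimum} natural results, got {len(results)}"
        )
    return results
```

Keeping the threshold as a parameter leaves the decision with the caller, since a page may legitimately carry fewer results than usual.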

In any case, this could be an optional feature that can be activated, but there are several things to take into consideration, because depending on the elements present on the page, Google might return fewer than 10 results. That's worth some thought for the next releases.

In addition, I can let you know that it's on the internal to-do list to create a small application that will continuously parse Google SERPs in order to detect a change as soon as it happens. I cannot give a date for this feature because it's only an idea for the moment.

thebennos commented 7 years ago

hey

That's not exactly what I have in mind. I don't think a full parse of the organic results is needed.

What I had in mind is checking an XPath, or a CSS ID or class, to make sure the organic results can be parsed, and to avoid problems like the one we had in https://github.com/serp-spider/search-engine-google/issues/56, where a small change caused confusing problems.

A small check that throws an exception when it fails would make such breakage more obvious.
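A rough sketch of that idea, in Python for illustration only. It looks for a single element by id and raises if it is missing. The id `organic-results` is a placeholder and not Google's real markup, and `assert_organic_marker` and `OrganicMarkerNotFound` are hypothetical names:

```python
from html.parser import HTMLParser


class OrganicMarkerNotFound(Exception):
    """Raised when the expected container for organic results is missing."""


class _MarkerFinder(HTMLParser):
    def __init__(self, marker_id):
        super().__init__()
        self.marker_id = marker_id
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag.
        if dict(attrs).get("id") == self.marker_id:
            self.found = True


def assert_organic_marker(html, marker_id="organic-results"):
    # "organic-results" is a placeholder id, not Google's real markup.
    finder = _MarkerFinder(marker_id)
    finder.feed(html)
    if not finder.found:
        raise OrganicMarkerNotFound(f"no element with id={marker_id!r} in page")
```

This does not parse any result. It only signals that the page no longer has the expected shape.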

gsouf commented 7 years ago

@thebennos, I'm not sure I understand your proposal. Could you elaborate, please?

A few checks are already done to determine whether the page is parseable. The issue in #56 was that, because of the Google change, the parser could not parse some results and treated them as non-result items. The point with Google changes is that anything can change: the structure, a class, an id, really anything...

That being said, I think it's possible to add some rules based on experience with the library.