serpapi / google-search-results-python

Google Search Results via SERP API pip Python Package
MIT License

[Pagination] Pagination isn't correct and it skips index by one #38

Closed kagermanov27 closed 1 year ago

kagermanov27 commented 1 year ago

image

Since the start value is zero-based, the second page should begin at start=10, not 11.

This behaviour also causes results to be skipped between pages, so customers are getting confusing results:

image

Intercom link. First recognized by @marm123.

I think this part needs to be replaced by:

self.client.params_dict['start'] += 0

I don't know whether this change would cause errors on other engines. It may just as well fix the same problem for every other engine.
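To make the off-by-one concrete, here is a minimal sketch of the offset arithmetic. It is not the library's code: the function names and the page size of 10 are assumptions for illustration, and it assumes the extra increment applies to every page after the first.

```python
PAGE_SIZE = 10  # assumed number of results per page


def shifted_offsets(pages, page_size=PAGE_SIZE):
    """Offsets as reported: every page after the first is shifted by one."""
    offsets = []
    for page in range(pages):
        start = page * page_size
        if start > 0:
            start += 1  # the extra increment this issue objects to
        offsets.append(start)
    return offsets


def zero_based_offsets(pages, page_size=PAGE_SIZE):
    """Offsets when 'start' is treated as zero-based with no extra shift."""
    return [page * page_size for page in range(pages)]


print(shifted_offsets(3))     # [0, 11, 21]
print(zero_based_offsets(3))  # [0, 10, 20]
```

With a zero-based start, the first page covers indices 0 to 9, so beginning the second page at 11 leaves index 10 unrequested.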

dimitryzub commented 1 year ago

A workaround for pagination() that uses parse_qsl and urlsplit from urllib.parse:

from serpapi import GoogleSearch
from urllib.parse import (parse_qsl, urlsplit)

params = {
    'api_key': '...',     # serpapi api key
    'engine': 'google',   # search engine
    'q': 'minecraft',     # search query
    'hl': 'en',           # language
    # 'start': 0          # or explicitly add page number
}

search = GoogleSearch(params)

# iterate over all pages
while True:
    results = search.get_dict()

    print(f"Currently on page: {params.get('start')}")

    if 'error' in results:
        print(results['error'])
        break

    # for result in results.get('organic_results', []):
    #     print(result.get('position'), result.get('title'), sep='\n')

    # check if the next page key is present in the JSON
    # if present -> split URL in parts and update to the next page
    if 'next' in results.get('serpapi_pagination', {}):
        next_url = results['serpapi_pagination']['next']
        # copy the query parameters of the next-page URL into the search params
        search.params_dict.update(dict(parse_qsl(urlsplit(next_url).query)))
    else:
        break

Output:

Currently on page: None # first page. If the 'start' param is set explicitly to 0, None becomes 0
Currently on page: 10   # second page
Currently on page: 20
Currently on page: 30
Currently on page: 40
Currently on page: 50
Currently on page: 60
Currently on page: 70
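The URL-splitting step can be tried in isolation, without an API key. The sketch below is only an illustration: the helper name next_page_params and the sample URL are hypothetical, and the response shape is assumed to match the serpapi_pagination.next key used in the workaround.

```python
from urllib.parse import parse_qsl, urlsplit


def next_page_params(results):
    """Return the query parameters of the next-page URL, or None if there is none."""
    next_url = results.get('serpapi_pagination', {}).get('next')
    if not next_url:
        return None
    # urlsplit isolates the query string; parse_qsl turns it into (key, value) pairs
    return dict(parse_qsl(urlsplit(next_url).query))


# hypothetical response fragment, for illustration only
sample = {
    'serpapi_pagination': {
        'next': 'https://serpapi.com/search.json?engine=google&q=minecraft&hl=en&start=10'
    }
}

print(next_page_params(sample))  # {'engine': 'google', 'q': 'minecraft', 'hl': 'en', 'start': '10'}
print(next_page_params({}))      # None
```

Note that parse_qsl returns every value as a string, so 'start' comes back as '10' rather than the integer 10.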

Example from the dashboard -> your searches page (inspect the last request, the 8th page: the next hash key is empty, so the loop exits pagination):

image

image