Closed by samuelhaysom 1 year ago
@jvmvik Can you take a look?
@samuelhaysom Currently, the best approach is to use serpapi_pagination
instead, as you also mentioned in issue #25. Once #30 is merged, the pagination()
method will be the preferred one. Sorry for such a late reply.
For example:
# Google Scholar Search API
from serpapi import GoogleSearch
from urllib.parse import parse_qsl, urlsplit

params = {
    "api_key": "...",            # serpapi api key
    "engine": "google_scholar",  # search engine
    "q": "minecraft redstone",   # search query
    "hl": "en"                   # language
}

search = GoogleSearch(params)    # where data extraction happens

# to show page number
page_num = 0

# iterate over all pages
results_is_present = True
while results_is_present:
    results = search.get_dict()  # JSON -> Python dict

    # stop on an API error instead of looping forever
    if "error" in results:
        print(results["error"])
        break

    page_num += 1
    print(f"Current page: {page_num}")

    # iterate over organic results and extract the data
    for result in results.get("organic_results", []):
        print(result.get("position"), result.get("title"), sep="\n")

    # check if the next page key is present in the JSON
    # if present -> split URL in parts and update to the next page
    if "next" in results.get("serpapi_pagination", {}):
        search.params_dict.update(dict(parse_qsl(urlsplit(results.get("serpapi_pagination").get("next")).query)))
    else:
        break
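To see what the last step of the loop does on its own, here is a minimal sketch that turns a next-page URL into a dict of parameters. The URL is a made-up example shaped like a serpapi_pagination "next" link, and next_page_params is a hypothetical helper name, not part of the serpapi package.

```python
from urllib.parse import parse_qsl, urlsplit


def next_page_params(next_url):
    """Return the query-string parameters of a next-page URL as a dict."""
    return dict(parse_qsl(urlsplit(next_url).query))


# Hypothetical URL, shaped like a "next" link; not copied from a real response.
example_next = (
    "https://serpapi.com/search.json"
    "?engine=google_scholar&hl=en&q=minecraft+redstone&start=10"
)

params = next_page_params(example_next)
print(params["q"])      # the "+" in the query string is decoded to a space
print(params["start"])  # values come back as strings, not ints
```

Passing this dict to search.params_dict.update() overwrites start (and any other parameter present in the link), so the next get_dict() call should request the following page.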
When retrieving results from Google Scholar using the pagination() method, the first article on the second page of Google Scholar is always missing.
I think this is caused by the following snippet in the update() method of google-search-results-python/serpapi/pagination.py:
This seems to mean that for every page except the first, pagination increases start by 1. So with page_size=20, the first page requests results 0 to 19, but the second page requests results 21 to 40, skipping result 20.
If I delete the if statement, the code seems to work as intended and I get the skipped result back.
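The arithmetic described above can be checked with a small standalone sketch. It does not use the library's code: page_starts and skipped_results are hypothetical helpers, and the bump_after_first flag only models the behaviour as I described it (start increased by 1 on every page after the first).

```python
def page_starts(page_size, pages, bump_after_first):
    """Return the start offset requested for each page.

    bump_after_first=True models the described behaviour: every page
    after the first has its start increased by 1.
    """
    starts = []
    for page in range(pages):
        start = page * page_size
        if bump_after_first and start > 0:
            start += 1
        starts.append(start)
    return starts


def skipped_results(page_size, pages, bump_after_first):
    """Return the 0-based result indices that no page requests."""
    requested = set()
    for start in page_starts(page_size, pages, bump_after_first):
        requested.update(range(start, start + page_size))
    return [i for i in range(pages * page_size) if i not in requested]


print(page_starts(20, 2, bump_after_first=True))       # [0, 21]
print(skipped_results(20, 2, bump_after_first=True))   # [20]
print(skipped_results(20, 2, bump_after_first=False))  # []
```

With the increment, result 20 is never requested; without it, the two pages cover results 0 to 39 with no gap, which matches what I see after removing the if statement.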