ilyazub closed this issue 3 years ago.
Good idea. We could take advantage of Python generators. The callback extracts the data and returns it for presentation. A new `paginate` method would return a generator that yields the callback's result for each page until all pages have been returned. Each page costs one search.
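```python
def callback(results):
    # Extract only the news results from each page
    return results["news_results"]

# `paginate` is the proposed method; each page costs one search
for news_result in search.paginate(callback):
    print(news_result)
```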
This keeps data presentation separate from data processing.
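A minimal sketch of the proposed callback-based generator, assuming a hypothetical `fetch_page(start)` function that stands in for one SerpApi request (one search per call) and returns the parsed JSON as a dict. Neither `fetch_page` nor this standalone `paginate` function is part of the library; they only illustrate the idea.

```python
from typing import Any, Callable, Dict, Iterator, List

Result = Dict[str, Any]


def paginate(
    fetch_page: Callable[[int], Result],
    callback: Callable[[Result], Any],
    page_size: int = 10,
) -> Iterator[Any]:
    """Yield callback(page) for each page until no next page is advertised.

    `fetch_page` is a hypothetical stand-in for one search request;
    every call costs one search.
    """
    start = 0
    while True:
        result = fetch_page(start)
        yield callback(result)
        # Stop when the response no longer advertises a next page.
        if "next" not in result.get("serpapi_pagination", {}):
            return
        start += page_size


# Fake backend with three pages, for illustration only.
def fake_fetch_page(start: int) -> Result:
    page = start // 10
    result: Result = {"news_results": [f"news {page}-{i}" for i in range(2)]}
    if page < 2:
        result["serpapi_pagination"] = {
            "next": f"https://serpapi.com/search?start={start + 10}"
        }
    return result


def callback(results: Result) -> List[str]:
    # Same role as the callback above: keep only the news results.
    return results["news_results"]


for news_results in paginate(fake_fetch_page, callback):
    print(news_results)
```

Because the generator is lazy, the caller decides how many searches are spent: breaking out of the loop early stops further requests.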
Actually, we do not need a callback, because the SerpApi backend already offers proper pagination.
```python
import os

from serpapi import GoogleSearch

# initialize the search
# (assumption: the API key is read from a SERPAPI_KEY environment variable)
search = GoogleSearch({"q": "Coffee", "location": "Austin,Texas",
                       "api_key": os.environ["SERPAPI_KEY"]})

# to get 2 pages
start = 0
end = 20

# create a Python generator
pages = search.pagination(start, end)
print("display generated")

urls = []
# fetch one result page per iteration of the for loop
for result in pages:
    urls.append(result['serpapi_pagination']['next'])

assert len(urls) == 2
assert "start=10" in urls[0]
assert "start=20" in urls[1]
```
See commit f9a470a9efd8c8956cb84597b4c33136670e3abb.
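Why the test expects `start=10` and `start=20`: assuming a page size of 10 and treating `end=20` as exclusive, two pages means requests at offsets 0 and 10, and each response's `serpapi_pagination.next` points at the following offset. A minimal sketch of such an iterator under those assumptions, with a hypothetical `fetch_page` stub in place of real requests; the library's actual `pagination.py` may be structured differently.

```python
from typing import Any, Callable, Dict, Iterator

Result = Dict[str, Any]


class Pagination:
    """Iterator over result pages from `start` (inclusive) to `end` (exclusive).

    Illustrative sketch only; `fetch_page` is a hypothetical stand-in for
    one search request made with the given `start` offset.
    """

    def __init__(self, fetch_page: Callable[[int], Result], start: int = 0,
                 end: int = 20, page_size: int = 10) -> None:
        self.fetch_page = fetch_page
        self.start = start
        self.end = end
        self.page_size = page_size

    def __iter__(self) -> Iterator[Result]:
        return self

    def __next__(self) -> Result:
        if self.start >= self.end:
            raise StopIteration
        result = self.fetch_page(self.start)
        # Advance the offset by one page for the next call.
        self.start += self.page_size
        return result


def fake_fetch_page(start: int) -> Result:
    # Each fake response points at the next offset, like serpapi_pagination.next.
    return {"serpapi_pagination": {
        "next": f"https://serpapi.com/search.json?q=Coffee&start={start + 10}"}}


urls = [r["serpapi_pagination"]["next"] for r in Pagination(fake_fetch_page, 0, 20)]
print(urls)
```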
Great! Thank you for the idea with the iterator. It looks cleaner than my initial idea.
Confirming that it works. Example:
https://user-images.githubusercontent.com/282605/116860340-195b5b00-ac0a-11eb-818a-2de63c0c813f.mp4
f9a470a works, but it sometimes fails with an exception on the last page:
```
Traceback (most recent call last):
  File "main.py", line 16, in <module>
    for result in pages:
  File "/opt/virtualenvs/python3/src/google-search-results/serpapi/pagination.py", line 19, in __next__
    if not 'next' in result['serpapi_pagination']:
KeyError: 'serpapi_pagination'
```
I was looking for this information in the documentation, but it isn't mentioned anywhere. Could you please add it to the Python package README and also to the SerpApi website for paying customers? Thank you!
Thanks for your feedback!
@ilyazub The code has been improved to handle a missing "serpapi_pagination" field on the last page. @kikohs The README has been updated with information on pagination support.
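The failing line in the traceback indexes `result['serpapi_pagination']` directly. A minimal sketch of a safer check, assuming (as the traceback suggests) that the last page simply omits that field; the helper names are hypothetical, and this is an illustration rather than the code from the actual fix.

```python
from typing import Any, Dict


def has_next_page_unsafe(result: Dict[str, Any]) -> bool:
    # Mirrors the check from the traceback; raises KeyError when the
    # response has no "serpapi_pagination" field.
    return 'next' in result['serpapi_pagination']


def has_next_page(result: Dict[str, Any]) -> bool:
    # Treat a missing "serpapi_pagination" field as "no next page".
    return 'next' in result.get('serpapi_pagination', {})


last_page = {"organic_results": []}  # hypothetical last-page response
middle_page = {"serpapi_pagination": {"next": "https://serpapi.com/search?start=20"}}

print(has_next_page(middle_page))  # True
print(has_next_page(last_page))    # False
try:
    has_next_page_unsafe(last_page)
except KeyError as err:
    print("KeyError:", err)  # KeyError: 'serpapi_pagination'
```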
Currently, the way to paginate searches is to read `serpapi_pagination.current` and increase the `offset` or `start` parameter in a loop, just as with regular HTTP requests to `serpapi.com/search` without an API wrapper.
A more convenient approach for an official API wrapper would be to provide a function like `search.paginate(callback: Callable)` that calculates the offset correctly for the specific search engine and loops through pages until the end.
@jvmvik @hartator What do you think?
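The manual approach can be sketched as a loop that rewrites `start` after each request. This assumes a hypothetical `search_once` function standing in for one HTTP request to `serpapi.com/search`, that `serpapi_pagination.current` is the 1-based number of the page just returned, and a Google-style page size of 10. Engines that name or scale the offset differently would need different arithmetic, which is the bookkeeping a wrapper method could hide.

```python
from typing import Any, Callable, Dict, Iterator

Params = Dict[str, Any]
Result = Dict[str, Any]


def manual_pages(search_once: Callable[[Params], Result], params: Params,
                 page_size: int = 10, max_pages: int = 5) -> Iterator[Result]:
    """Loop over pages by rewriting the `start` parameter after each request.

    `search_once` is a hypothetical stand-in for one HTTP request;
    `page_size` and `max_pages` are illustrative defaults.
    """
    params = dict(params)  # copy so the caller's dict is not mutated
    for _ in range(max_pages):
        result = search_once(params)
        yield result
        pagination = result.get("serpapi_pagination", {})
        if "next" not in pagination:
            break
        # Assumes `current` is the 1-based page number just returned.
        params["start"] = pagination["current"] * page_size


def fake_search_once(params: Params) -> Result:
    # Fake backend with three pages; records which offset was requested.
    start = params.get("start", 0)
    current = start // 10 + 1
    result: Result = {"start_seen": start}
    if current < 3:
        result["serpapi_pagination"] = {
            "current": current,
            "next": f"https://serpapi.com/search?start={start + 10}",
        }
    return result


starts = [page["start_seen"] for page in manual_pages(fake_search_once, {"q": "Coffee"})]
print(starts)  # [0, 10, 20]
```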