Closed ehsong closed 9 months ago
I figured this out: I had to iterate, collecting 50 results, then offsetting by 50 and collecting more, until I had scraped the rest.
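For anyone landing here later, the loop described above might look roughly like the sketch below. The names `collect_all_results`, `fetch_page` and `fake_fetch` are made up for illustration and are not part of the SerpApi client; `fetch_page` stands in for whatever call returns the organic results for a given offset ('pn') and page size ('rn').

```python
from typing import Callable, Dict, List


def collect_all_results(
    fetch_page: Callable[[int, int], List[Dict]],
    page_size: int = 50,
    max_results: int = 1000,
) -> List[Dict]:
    """Collect results page by page, advancing the offset by page_size
    until a page comes back empty or max_results is reached."""
    collected: List[Dict] = []
    offset = 0
    while offset < max_results:
        page = fetch_page(offset, page_size)
        if not page:
            # An empty page is treated as the end of the result set.
            break
        collected.extend(page)
        offset += page_size
    return collected


# Hypothetical stand-in for a real request: pretends there are 120 results.
def fake_fetch(offset: int, size: int) -> List[Dict]:
    total = 120
    return [{"position": i} for i in range(offset, min(offset + size, total))]


all_results = collect_all_results(fake_fetch)
print(len(all_results))
```

In real use, `fetch_page` would presumably wrap a search call that sets "pn" to the offset and "rn" to the page size and returns the "organic_results" list. The `max_results` cap is only a safety net against an endless loop.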
Hello @ehsong.
Here's the code example to get all Baidu Search Results pages (ref: https://replit.com/@serpapi/baidu-all-pages-serpapi#main.py)
# Python package: https://pypi.org/project/serpapi
from serpapi import Client as SerpApiClient
import os

params = {
    "engine": "baidu",
    "q": "市民社会",
    "ct": "2",
    "rn": "50",
    "gpc": "stf=1356994860,1696155445|stftype=1",
    "q5": "intitle: '市民社会'",
    "q6": "site:zhihu.com",
    "pn": "20",
}

serpapi = SerpApiClient(api_key=os.environ['SERPAPI_API_KEY'])
search = serpapi.search(params)

print(f"Current page: {search.get('serpapi_pagination', {}).get('current')}\n")

for organic_result in search.get("organic_results", []):
    print(f"Title: {organic_result['title']}\nLink: {organic_result['link']}\n")

for result in search.yield_pages():
    print(
        f"Current page: {result.get('serpapi_pagination', {}).get('current')}\n")

    for organic_result in result.get("organic_results", []):
        print(
            f"Title: {organic_result['title']}\nLink: {organic_result['link']}\n")
If you want to request the data without the SerpApi client library, you can use the serpapi_pagination.next field of the response to get the URL of the next page.
Example
{
  // Omitted...
  "serpapi_pagination": {
    "next": "https://serpapi.com/search.json?ct=1&device=desktop&engine=baidu&f=8&gpc=stf%3D1356994860%2C1696155445%7Cstftype%3D1&oq=%E5%B8%82%E6%B0%91%E7%A4%BE%E4%BC%9A&pn=100&q=%E5%B8%82%E6%B0%91%E7%A4%BE%E4%BC%9A&rn=50",
    // Omitted...
  }
}
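Following the "next" link by hand could be sketched as below. This is only an illustration: `with_api_key`, `iter_pages` and `fake_fetch_json` are hypothetical helper names. Since the sample URL above carries no api_key parameter, the sketch assumes the key has to be added to each request. The fetch function is injected so the loop can be tried without network access.

```python
from typing import Callable, Dict, Iterator, Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_api_key(url: str, api_key: str) -> str:
    """Return url with an api_key query parameter set (assumed to be required)."""
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != "api_key"]
    query.append(("api_key", api_key))
    return urlunsplit(parts._replace(query=urlencode(query)))


def iter_pages(
    first_url: str,
    fetch_json: Callable[[str], Dict],
    api_key: str,
    max_pages: int = 100,
) -> Iterator[Dict]:
    """Yield each page, following serpapi_pagination.next until it is missing."""
    url: Optional[str] = first_url
    for _ in range(max_pages):
        if url is None:
            return
        page = fetch_json(with_api_key(url, api_key))
        yield page
        url = page.get("serpapi_pagination", {}).get("next")


# Hypothetical stand-in for an HTTP GET returning parsed JSON: serves pn=0, 50, 100.
def fake_fetch_json(url: str) -> Dict:
    pn = int(dict(parse_qsl(urlsplit(url).query)).get("pn", "0"))
    page: Dict = {"organic_results": [{"position": pn + 1}], "serpapi_pagination": {}}
    if pn < 100:
        page["serpapi_pagination"]["next"] = (
            f"https://serpapi.com/search.json?engine=baidu&pn={pn + 50}&rn=50"
        )
    return page


pages = list(iter_pages(
    "https://serpapi.com/search.json?engine=baidu&pn=0&rn=50",
    fake_fetch_json,
    "demo_key",
))
print(len(pages))
```

For real requests, `fetch_json` could be any function that performs a GET on the URL and returns the decoded JSON, for example one built on `urllib.request.urlopen` and `json.load`.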
If you have any questions, feel free to ask our support via email (contact@serpapi.com), the Contact Us form (https://serpapi.com/#contact), or the chat widget in the bottom right corner of the https://serpapi.com website.
Hello, I am testing SerpAPI with the BaiduSearch function. How do I collect all results using the parameters 'rn' and 'pn'? If 'rn' is limited to 50 results per request, how do I collect the rest? Is there a way to request the other pages separately so that I get all results?
Under organic results
search.get_dict()['organic_results']
there were only 10 results listed, so I don't think the 'rn' parameter is working properly. I am using Python 3.9, OS. I set these parameters because there were 200 results over 20 pages on Baidu, and I wanted to get all the links.