serpapi / google-search-results-python

Google Search Results via SERP API pip Python Package
MIT License

[Feature Request] Add Async Implementation #48

Closed aliayar closed 1 year ago

aliayar commented 1 year ago

One of our users asked for an async requests implementation, as in this example:

https://www.twilio.com/blog/asynchronous-http-requests-in-python-with-aiohttp

This function could possibly accept a list of URLs to request as an argument.
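
For illustration, a minimal sketch of what such an interface might look like, built on aiohttp. The async_search helper and its signature are hypothetical, not part of this package:

import aiohttp
import asyncio

async def async_search(urls):
    # Hypothetical helper: fetch a list of SerpApi URLs concurrently
    # and return the parsed JSON of each response, in request order.
    async with aiohttp.ClientSession() as session:
        async def fetch(url):
            async with session.get(url) as resp:
                return await resp.json()
        return await asyncio.gather(*(fetch(url) for url in urls))

# urls = ['https://serpapi.com/search.json?q=coffee&api_key=...']
# results = asyncio.run(async_search(urls))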

hartator commented 1 year ago

> One of our users asked for an async requests implementation, as in this example:

Can you share the Intercom link? Thank you.

> https://www.twilio.com/blog/asynchronous-http-requests-in-python-with-aiohttp
>
> This function could possibly accept a list of URLs to request as an argument.

They should be able to do this directly without this library if they want: a regular HTTP GET request against https://serpapi.com/search?q=... does the same thing this library does. I am not sure if we are going to officially support this or not.
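
For reference, a direct call without this package might look like the following minimal sketch (synchronous, using requests; the query and YOUR_API_KEY are placeholders):

import requests

# A plain GET against the SerpApi endpoint, no wrapper library involved
params = {
    'q': 'coffee',             # example query
    'api_key': 'YOUR_API_KEY'  # https://serpapi.com/manage-api-key
}
results = requests.get('https://serpapi.com/search', params=params).json()
print(results.get('search_metadata', {}).get('status'))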

aliayar commented 1 year ago

Here is the Intercom thread.

dimitryzub commented 1 year ago

Just sharing an example of making 50 direct requests to serpapi.com/search.json (youtube engine), which produces faster response times than the google-search-results package, and slightly faster response times than the async batch requests provided by google-search-results.

Ref:

Code example:

import aiohttp
import asyncio
import json
import time

async def fetch_results(session, query):
    params = {
        'api_key': '...', # https://serpapi.com/manage-api-key
        'engine': 'youtube',
        'device': 'desktop',
        'search_query': query,
        'no_cache': 'true'
        # additional params
    }

    url = 'https://serpapi.com/search.json'
    async with session.get(url, params=params) as response:
        results = await response.json()

    data = []

    if 'error' in results:
        print(results['error'])
    else:
        for result in results.get('video_results', []):
            data.append({
                'title': result.get('title'),
                'link': result.get('link'),
                'channel': result.get('channel', {}).get('name'),  # guard against a missing channel key
            })

    return data

async def main():
    # 50 queries
    # here could be a dict or txt/csv/excel/json file
    queries = [
        'burly',
        'creator',
        'doubtful',
        'chance',
        'capable',
        'window',
        'dynamic',
        'train',
        'worry',
        'useless',
        'steady',
        'thoughtful',
        'matter',
        'rotten',
        'overflow',
        'object',
        'far-flung',
        'gabby',
        'tiresome',
        'scatter',
        'exclusive',
        'wealth',
        'yummy',
        'play',
        'saw',
        'spiteful',
        'perform',
        'busy',
        'hypnotic',
        'sniff',
        'early',
        'mindless',
        'airplane',
        'distribution',
        'ahead',
        'good',
        'squeeze',
        'ship',
        'excuse',
        'chubby',
        'smiling',
        'wide',
        'structure',
        'wrap',
        'point',
        'file',
        'sack',
        'slope',
        'therapeutic',
        'disturbed'
    ]

    data = []

    async with aiohttp.ClientSession() as session:
        tasks = []
        for query in queries:
            task = asyncio.ensure_future(fetch_results(session, query))
            tasks.append(task)

        start_time = time.time()
        results = await asyncio.gather(*tasks)
        end_time = time.time()

        data = [item for sublist in results for item in sublist]

    print(json.dumps(data, indent=2, ensure_ascii=False))
    print(f'Script execution time: {end_time - start_time} seconds') # ~7.192448616027832 seconds

asyncio.run(main())
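
One note on the example above: all 50 requests are started at once. If concurrency ever needs to be capped, for example to stay within plan rate limits, an asyncio.Semaphore wrapper is one option. A sketch reusing the fetch_results coroutine above (fetch_results_bounded is a hypothetical name):

async def fetch_results_bounded(session, query, semaphore):
    # Drop-in wrapper for fetch_results: only as many requests as the
    # semaphore allows are in flight at any moment.
    async with semaphore:
        return await fetch_results(session, query)

# inside main():
#   semaphore = asyncio.Semaphore(10)  # at most 10 requests in flight
#   task = asyncio.ensure_future(fetch_results_bounded(session, query, semaphore))
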
aliayar commented 1 year ago

Hey Dmitriy,

I have shared similar code that I wrote earlier with the users:

import aiohttp
import asyncio
import time
from urllib.parse import quote_plus

start_time = time.time()

def getkeywords():

    keywords = []
    with open('keywords.txt') as f:
        for line in f:
            keywords.append(line.rstrip('\n'))
    return keywords

async def get_serps(session, r):
    async with session.get(r) as resp:
        serp = await resp.json()
        return serp

async def main():

    result = {}

    async with aiohttp.ClientSession() as session:

        tasks = []
        for keyword in getkeywords():
            # quote_plus URL-encodes multi-word keywords
            r = 'https://serpapi.com/search.json?q=' + quote_plus(keyword) + '&api_key=YOUR_API_KEY&hl=en&gl=us&no_cache=True'
            tasks.append(asyncio.ensure_future(get_serps(session, r)))

        original_serps = await asyncio.gather(*tasks)
        for serpresult in original_serps:
            result[serpresult['search_parameters']['q']] = serpresult

    for item in result:
        # print the first organic result whose link points to serpapi.com
        for serp in result[item].get('organic_results', []):
            if serp['link'].startswith('https://serpapi.com/'):
                print(item + ' ' + str(serp['position']) + ' ' + serp['link'])
                break

asyncio.run(main())
print("--- %s seconds ---" % (time.time() - start_time))

dimitryzub commented 1 year ago

@aliayar I saw 🙂

I've taken a slightly different approach to give future users some additional clarification on how it could be done. I also wrote a blog post about it: Make Direct Async Requests to SerpApi with Python. I think the more examples we provide, the better.

The next part of the blog post will cover adding pagination with async functionality.
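
Until then, here is a rough sketch of one way async pagination could work, assuming the engine's JSON response includes a serpapi_pagination.next URL, as Google engine responses do (fetch_all_pages is a hypothetical helper):

import aiohttp
import asyncio

async def fetch_all_pages(session, url, max_pages=5):
    # Follow the serpapi_pagination.next link until it disappears
    # or max_pages is reached; returns the raw JSON of each page.
    pages = []
    for _ in range(max_pages):
        async with session.get(url) as resp:
            results = await resp.json()
        pages.append(results)
        next_url = results.get('serpapi_pagination', {}).get('next')
        if not next_url:
            break
        # note: the next link may not carry api_key; re-append it if needed
        url = next_url
    return pages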