spyoungtech / grequests

Requests + Gevent = <3
https://pypi.python.org/pypi/grequests
BSD 2-Clause "Simplified" License

grequests slows down with asyncio #142

Closed reach4bawer closed 1 year ago

reach4bawer commented 4 years ago

I am using grequests to get data from 15000 hosts that are active. I first ping the hosts to check whether they are active. That code takes around 350 seconds to check the hosts. Once I have the results, I filter the host list down to just the active ones and want to use grequests to get the text from those hosts.

When I ran grequests on its own against all the hosts, without integrating it with the other code, it took around 600 seconds. When I combine the two, I can't get either to work at peak speed, for a reason I cannot figure out.

I am running the following code in this environment:

MacOS Mojave
python - 3.7.4
asyncio==3.4.3
greenlet==0.4.15
grequests==0.6.0
gevent==20.5.0

My code is as follows -

from gevent import monkey

# No-op replacement so that importing grequests does not monkey-patch the stdlib
def stub(*args, **kwargs):
    pass

monkey.patch_all = stub
import grequests
import asyncio
import time
from collections import deque
import pandas as pd

host_df = pd.read_csv('IP.csv', delimiter='|')

async def async_ping(host, semaphore):
    async with semaphore:
        for _ in range(5):
            proc = await asyncio.create_subprocess_shell(
                f'/sbin/ping {host} -c 1 -W 2 -t 5',
                stdout=asyncio.subprocess.PIPE,
                stderr=asyncio.subprocess.PIPE
            )
            status = await proc.wait()
            if status == 0:
                print("Done")
                return 'Alive'
        print("Done")
        return 'Timeout'

async def async_main(hosts, limit):
    semaphore = asyncio.Semaphore(limit)
    tasks1 = deque()
    for host in hosts:
        tasks1.append(asyncio.create_task(
            async_ping(host, semaphore))
        )
    return (t1 for t1 in await asyncio.gather(*tasks1))
# set concurrent task limit
limit = 512

start = time.perf_counter()

loop = asyncio.get_event_loop()
asyncio.set_event_loop(loop)
resp = loop.run_until_complete(async_main(host_df['Domain'].to_list(), limit))
loop.close()

finish = time.perf_counter()

host_df['Status'] = list(resp)
print(host_df)
print(f'Runtime: {round(finish - start, 4)} seconds')
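#########################################
# Keep only the hosts that answered the ping; async_ping returns 'Alive', not 'Active'
urls = host_df['Domain'][host_df['Status'] == 'Alive'].to_list()
reqs = [grequests.get('https://' + url, verify=False) for url in urls]

# exception_handler was not shown in the original post; this is a minimal placeholder
def exception_handler(request, exception):
    print(f'Request failed: {exception}')

out = grequests.map(reqs, exception_handler=exception_handler)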

If I put the grequests code before the status check, the async code slows to a crawl or gets stuck. Can you point me in the right direction?

belingud commented 4 years ago

Running gevent together with an asyncio loop will cause problems. I suggest you choose one of them.

reach4bawer commented 4 years ago

I understand. Do you know why they do not play well together?

spyoungtech commented 4 years ago

Well, for one thing, asyncio requires cooperative use of the async keyword. If you call a function that does not use async/await, it will block the event loop. gevent itself does not support asyncio, and is therefore not fully compatible with applications using asyncio, at least perhaps not in the way you'd want it to be.

Anyhow, grequests and gevent both predate the implementation of asyncio in Python. None of the grequests internals ever use async/await, so any call into grequests will block everything: it does not give control back to the event loop, and no other function can run until the call is done.

There are likely certain ways to leverage asyncio with this package and make it somewhat useful, but they're probably not very intuitive patterns. The monkey patching that gevent does is also a large source of incompatibility with other libraries, but I don't know what parts, if any, would make it strictly incompatible with asyncio.
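The blocking effect described above can be seen without grequests at all. The following is a minimal sketch, assuming `time.sleep` as a stand-in for any synchronous call that never awaits; the names `ticker`, `blocking_worker`, `cooperative_worker`, `max_gap`, and `measure` are made up for this illustration.

```python
import asyncio
import time

async def ticker(ticks):
    # Records a timestamp roughly every 10 ms, as long as the loop is free.
    for _ in range(5):
        await asyncio.sleep(0.01)
        ticks.append(time.perf_counter())

async def blocking_worker():
    # Stand-in for a synchronous call: it never awaits, so the loop is stuck.
    time.sleep(0.2)

async def cooperative_worker():
    # Same duration, but it yields control back to the loop while waiting.
    await asyncio.sleep(0.2)

async def max_gap(worker):
    # Run the ticker next to the worker and return the longest pause between ticks.
    ticks = [time.perf_counter()]
    await asyncio.gather(ticker(ticks), worker())
    return max(b - a for a, b in zip(ticks, ticks[1:]))

def measure():
    blocked = asyncio.run(max_gap(blocking_worker))
    cooperative = asyncio.run(max_gap(cooperative_worker))
    return blocked, cooperative

if __name__ == "__main__":
    blocked, cooperative = measure()
    print(f"longest pause with blocking worker:    {blocked:.3f}s")
    print(f"longest pause with cooperative worker: {cooperative:.3f}s")
```

With the blocking worker, the ticker cannot run until the whole 0.2 second sleep has finished, which is the same kind of stall the ping tasks would see while a synchronous call holds the loop.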

However, from a quick search, there do seem to be variations of the gevent lib out there which are async compatible. For example aiogevent.
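One general asyncio pattern for calling blocking code is to hand it to a thread pool with `loop.run_in_executor`, so the event loop stays free. This is a rough sketch only: `blocking_fetch` is a hypothetical stand-in for a synchronous call, and whether the pattern behaves well once gevent has monkey-patched the standard library is an assumption that has not been verified here.

```python
import asyncio
import time

def blocking_fetch(url):
    # Hypothetical synchronous call; time.sleep stands in for network I/O.
    time.sleep(0.1)
    return f"fetched {url}"

async def fetch_all(urls):
    loop = asyncio.get_running_loop()
    # Passing None uses the loop's default ThreadPoolExecutor.
    futures = [loop.run_in_executor(None, blocking_fetch, url) for url in urls]
    # gather returns the results in the same order as the inputs.
    return await asyncio.gather(*futures)

if __name__ == "__main__":
    print(asyncio.run(fetch_all(["a", "b", "c"])))
```

Each call runs in its own worker thread, so other coroutines on the loop keep making progress while the blocking calls wait.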

I hope that makes sense and helps answer your question @reach4bawer

reach4bawer commented 4 years ago

Thank you so much, that makes sense. It does help me understand the reason. I think a workaround for this would be to load one library and then remove it from the program's memory after its use, so that the two do not conflict or block each other.
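Unloading a library from a running interpreter is hard to do reliably, so one way to get a similar separation is to run each phase in its own process. The sketch below assumes the host list can be passed as JSON over stdin; `CHILD_SCRIPT` and `run_phase_in_child` are hypothetical names, and the child body is a placeholder where the real grequests calls would go.

```python
import json
import subprocess
import sys

# Code executed in a separate interpreter. A real version would import
# grequests here, so gevent is only ever loaded inside the child process.
CHILD_SCRIPT = """
import json, sys
hosts = json.load(sys.stdin)
json.dump({host: "fetched" for host in hosts}, sys.stdout)
"""

def run_phase_in_child(hosts):
    # Send the hosts to the child as JSON and read its JSON result back.
    proc = subprocess.run(
        [sys.executable, "-c", CHILD_SCRIPT],
        input=json.dumps(hosts),
        stdout=subprocess.PIPE,
        text=True,
        check=True,
    )
    return json.loads(proc.stdout)

if __name__ == "__main__":
    print(run_phase_in_child(["a.example", "b.example"]))
```

The parent process can keep using asyncio for the ping phase, and everything the child loaded is released when it exits.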