ross / requests-futures

Asynchronous Python HTTP Requests for Humans using Futures

Thread-safety of FutureSession #47

Closed — thelink2012 closed this 2 years ago

thelink2012 commented 7 years ago

According to some issues on requests, such as requests/requests#2766 and requests/requests#1871, the Session class is not thread-safe.

FuturesSession inherits directly from Session and simply pushes jobs to a thread pool, using the same session object (self) from every worker thread. Is that really thread-safe?

ross commented 7 years ago

FuturesSession would be no more or less thread-safe than Session. I'd been working off the old "thread-safe" mention on the site and assumed that it was. From the looks of it, the main concern is around access to the cookie jar or to a large number of hosts.

In general the primary way I've used requests-futures has been to hit APIs that don't use cookies, generally against just one or a handful of hosts, so I've never run into problems. I don't recall anyone reporting specific problems that sound like these sorts of issues either. Did you run into something, or are you just looking around at it?

Overall a per-thread Session wouldn't make much sense for what FuturesSession is designed to do so I don't think there's really a fix to be made.

thelink2012 commented 7 years ago

Yeah, was looking around for asynchronous alternatives to requests that wouldn't require a lot of code changes and came up here.

I ended up picking aiohttp. But this was still a workable alternative, good work :)

ross commented 7 years ago

> Yeah, was looking around for asynchronous alternatives to requests that wouldn't require a lot of code changes and came up here :)

Cool. For what it's worth, I'd be surprised if you had problems with it, but obviously you might if your use cases are likely to run into the underlying issues...

alastairpatrick commented 6 years ago

A race condition that might or might not have been related to requests led me to replace FuturesSession with this:

import requests
from concurrent.futures import ThreadPoolExecutor
from threading import local

class FuturesSession(requests.Session):
  def __init__(self, max_workers=2, session_factory=requests.Session, *args, **kwargs):
    super().__init__(*args, **kwargs)
    self.session_factory = session_factory
    self.session_args = args
    self.session_kwargs = kwargs
    self.sessions = []  # every Session created, so close() can clean them up
    self.executor = ThreadPoolExecutor(max_workers=max_workers)
    self.tls = local()  # thread-local storage: one Session per worker thread

  def request(self, *args, **kwargs):
    def func(*args, **kwargs):
      # Lazily create a dedicated Session the first time each worker
      # thread handles a request, then reuse it for that thread.
      session = getattr(self.tls, "session", None)
      if session is None:
        session = self.session_factory(*self.session_args, **self.session_kwargs)
        self.tls.session = session
        self.sessions.append(session)
      return session.request(*args, **kwargs)

    return self.executor.submit(func, *args, **kwargs)

  def close(self):
    self.executor.shutdown(wait=True)
    for session in self.sessions:
      session.close()
    super().close()

Rather than share one Session among multiple worker threads, this FuturesSession creates a separate Session object for each worker thread. One disadvantage of this approach is that the Sessions do not share a common cookie jar. However, I believe it avoids the thread-unsafe aspects of requests, and so far it's working for me.
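The thread-local pattern above can be illustrated without any network I/O. The sketch below is a hypothetical stand-in (the names `PerThreadResource` and `factory` are illustrative, not from requests-futures), using a plain `dict` in place of `requests.Session` and checking that no more resources are ever created than there are worker threads:

```python
from concurrent.futures import ThreadPoolExecutor
from threading import local

class PerThreadResource:
    """Each worker thread lazily gets its own resource instance,
    mirroring the per-thread Session trick above."""

    def __init__(self, max_workers=2, factory=dict):
        self.factory = factory
        self.resources = []   # every resource created, for cleanup
        self.tls = local()    # one slot per worker thread
        self.executor = ThreadPoolExecutor(max_workers=max_workers)

    def submit(self, fn, *args, **kwargs):
        def wrapper(*a, **kw):
            # First call on this thread? Create and remember a resource.
            resource = getattr(self.tls, "resource", None)
            if resource is None:
                resource = self.factory()
                self.tls.resource = resource
                self.resources.append(resource)
            return fn(resource, *a, **kw)
        return self.executor.submit(wrapper, *args, **kwargs)

pool = PerThreadResource(max_workers=2)
# Each task reports which resource object served it.
futures = [pool.submit(lambda res: id(res)) for _ in range(20)]
distinct = {f.result() for f in futures}
pool.executor.shutdown(wait=True)

# At most max_workers distinct resources were ever created.
assert 1 <= len(distinct) <= 2
assert len(pool.resources) == len(distinct)
```

Because the resource is looked up through `threading.local`, two threads can never see each other's instance, which is the whole point of the workaround.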

mohiz commented 3 years ago

I asked a question about a problem I ran into using this module, and about a threading problem I think exists in the requests module. In brief, I have a scenario like this:

from requests_futures.sessions import FuturesSession
import asyncio

async def main():
    session = FuturesSession(max_workers=100)

    params = ['foo{}'.format(i) for i in range(1000)]

    url = 'some_host'

    async def send_request(param):
        f = session.get(url, params={'param': param})

        t = asyncio.ensure_future(asyncio.wrap_future(f))

        r = await t

        # check that r is the response you expected: for example, if you
        # requested foo1 you should receive the foo1 object, but sometimes
        # the returned value is for a different parameter

        return r

    tasks = []

    for param in params:
        tasks.append(
            asyncio.ensure_future(send_request(param))
        )

    await asyncio.wait(tasks,
                       return_when=asyncio.ALL_COMPLETED)

asyncio.run(main())

Many requests to the same URL produce a strange result: for some queries, the value returned belongs to a different GET request. I think the problem is deep inside the requests module, when it takes a connection from the connection pool: it reuses the same connection for a URL, and when there are many concurrent requests for that URL it returns the wrong response for some query params.
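One way to check whether responses actually get crossed, without depending on an external host or on requests-futures itself, is to run a local echo server and assert that every response matches its own query parameter. This is a stdlib-only sketch of that verification pattern (it uses urllib rather than requests, so it tests the correlation check, not the requests connection pool):

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import urlparse, parse_qs
from urllib.request import urlopen

class Echo(BaseHTTPRequestHandler):
    """Tiny local server that echoes back the value of ?param=..."""
    def do_GET(self):
        qs = parse_qs(urlparse(self.path).query)
        body = qs.get("param", [""])[0].encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
    def log_message(self, *args):
        pass  # keep the test output quiet

server = ThreadingHTTPServer(("127.0.0.1", 0), Echo)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

def fetch(param):
    # Each call opens its own connection, so no pooled state is shared.
    with urlopen(f"http://127.0.0.1:{port}/?param={param}", timeout=5) as resp:
        return param, resp.read().decode()

with ThreadPoolExecutor(max_workers=10) as ex:
    results = list(ex.map(fetch, [f"foo{i}" for i in range(50)]))

# Every response should match the parameter it was requested with.
mismatches = [(p, b) for p, b in results if p != b]
assert mismatches == []
server.shutdown()
```

If the same kind of request/response pairing check fails when routed through a shared FuturesSession, that would point at the pooled-connection behavior described above rather than at the test harness.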

github-actions[bot] commented 2 years ago

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 7 days.