ross / requests-futures

Asynchronous Python HTTP Requests for Humans using Futures
Other
2.11k stars 152 forks source link

'FuturesSession' object has no attribute 'session' after copy/pickle #98

Closed schlegelp closed 2 years ago

schlegelp commented 4 years ago

Hi. First off: love your library, makes my life so much easier!

I just ran into this error after passing a FuturesSession through a multiprocessing.Pool. This is effectively just pickling + unpickling = copying the FuturesSession which I can reproduce with these minimal examples:

>>> from copy import copy
>>> from requests_futures.sessions import FuturesSession
>>> s = FuturesSession(max_workers=3)
>>> _ = s.get('http://httpbin.org/get')  # works 
>>> s2 = copy(s)
>>> _ = s2.get('http://httpbin.org/get') 
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-12-2aa259e9dd0a> in <module>
----> 1 _ = s2.get('http://httpbin.org/get')

~/.pyenv/versions/3.7.5/lib/python3.7/site-packages/requests_futures/sessions.py in get(self, url, **kwargs)
    118         :rtype : concurrent.futures.Future
    119         """
--> 120         return super(FuturesSession, self).get(url, **kwargs)
    121 
    122     def options(self, url, **kwargs):

~/.pyenv/versions/3.7.5/lib/python3.7/site-packages/requests/sessions.py in get(self, url, **kwargs)
    541 
    542         kwargs.setdefault('allow_redirects', True)
--> 543         return self.request('GET', url, **kwargs)
    544 
    545     def options(self, url, **kwargs):

~/.pyenv/versions/3.7.5/lib/python3.7/site-packages/requests_futures/sessions.py in request(self, *args, **kwargs)
     83         :rtype : concurrent.futures.Future
     84         """
---> 85         if self.session:
     86             func = self.session.request
     87         else:

AttributeError: 'FuturesSession' object has no attribute 'session'

For completeness, here's an example with pickling

>>> import pickle
>>> from requests_futures.sessions import FuturesSession
>>> s = FuturesSession(max_workers=3)
>>> s2 = pickle.loads(pickle.dumps(s))
>>> _ = s2.get('http://httpbin.org/get') 

Basically the .session doesn't make it into the copy. I can't quite tell whether anything changed but I swear this used to work in the past. Any ideas?

This is with requests=2.24.0 and requests-futures=1.0.0

schlegelp commented 4 years ago

Quick update: Because FutureSession uses its parent's (requests.Session) __getstate__ method only the attributes in Session.__attrs__ are copied:

dict_keys(['headers', 'cookies', 'auth', 'proxies', 'hooks', 'params', 'verify', 'cert', 'adapters', 'stream', 'trust_env', 'max_redirects'])

So it's not just .session that's missing after copy/pickle but also ._owned_executor and .executor. I guess this is easy to fix but maybe this behaviour is wanted/expected?

ross commented 4 years ago

Basically the .session doesn't make it into the copy. I can't quite tell whether anything changed but I swear this used to work in the past. Any ideas?

The ProcessPoolExecutor stuff has been a battle for years. It's broken numerous times and been addressed to get it working for a bit longer. I believe it's been due to changes in the requests session object and not the underlying bulitin parts of python. Tl;DR seems to be that requests doesn't appear to take into account the use case of passing things between processes and so makes breaking changes at times. I've always stuck with ThreadPoolExecutor as a result, though there are definitely cases I can imagine where ProcessPoolExecutor would be useful.

Unfortunately I don't have the bandwidth to dig into this atm, but patches welcome if you make any progress with it.

schlegelp commented 4 years ago

Happy to have a look. Minimally I could just monkey patch by adding session, _owned_executor and executor to the __attrs__ inherited from Session. That would make FuturesSession relatively robust against upstream changes in Session and certainly fix things in my scenario. Not sure if I'm missing any potentially negative consequences for other use cases though?

I guess one downside is that it would mean that the exectutor (whether Thread or Process) would be silently copied too. In theory, this could lead to some unexpected behaviours with e.g. with the max_workers limit. Still better than breaking on copy?

tmerrittsmith commented 2 years ago

Hey @schlegelp, did you get anywhere with this? Running in to the same issue!

schlegelp commented 2 years ago

This has fallen totally off my radar to be honest. I don't even remember in which project I ran into this issue.

tmerrittsmith commented 2 years ago

No problem! We worked around it in the end.

github-actions[bot] commented 2 years ago

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 7 days.