Closed boboli closed 9 years ago
interesting. that's not a use-case i've come across myself, so i hadn't thought about addressing it.
is there a specific reason you're creating a bunch of sessions rather than a single long-lived FuturesSession shared across time? unless each one needs to be its own distinct session (cookies etc.), you would probably be better off creating a single FuturesSession with a larger max_workers and just letting it live for the life of the script.
i'm not completely opposed to FuturesSession implementing the context manager protocol, just want to make sure that it's needed first.
I think cleaning up the thread pool resources is akin to calling .close() on file objects when we're done with them. And open() follows the context manager pattern to give you a convenient wrapper that automatically calls .close(), so that's where I got the idea from.

I agree that for my script it's better to use just a single FuturesSession, but I feel it's good practice to clean up resources regardless.
feel free to pr that change to FuturesSession and ideally provide an example in the README. i assume the example should bind and use the session:

```python
with FuturesSession(max_workers=2) as session:
    session...
```
ideally there'd be some sort of unit testing of the functionality. perhaps there's a way to tell if the executor has been shut down correctly.
+1. Seems useful when chunking large numbers of requests (I'm doing 5000 per FuturesSession)
happy to accept patches w/tests. otherwise i'll try and get to it in an upcoming weekend.
Heh, I was dragging my feet on the PR because of the difficulty of writing a proper unit test. I've investigated the concurrent.futures module, and there are only 2 ways I can think of to determine if the executor has been shut down:

1. Check executor._shutdown, which is a private field on ThreadPoolExecutor (https://hg.python.org/cpython/file/3.2/Lib/concurrent/futures/thread.py#l125). Feels really icky to rely on a private API.
2. Check that RuntimeError is raised if we try to use the ThreadPoolExecutor again (https://docs.python.org/3.2/library/concurrent.futures.html#concurrent.futures.Executor.shutdown): "Calls to Executor.submit() and Executor.map() made after shutdown will raise RuntimeError."

Option 2 sounds slightly more proper, but still icky in that it's not directly asserting what we intended, only a side effect.

Lemme know which option sounds better and I can try to do a PR with it.
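For what it's worth, option 2 can be demonstrated with the stdlib executor alone, no FuturesSession involved; a minimal sketch of what such a test could assert:

```python
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=2)
# Sanity check: the executor works before shutdown.
assert executor.submit(pow, 2, 8).result() == 256

executor.shutdown()

# Per the concurrent.futures docs, submit() after shutdown must
# raise RuntimeError; a unit test could assert exactly this.
try:
    executor.submit(pow, 2, 8)
except RuntimeError:
    shutdown_confirmed = True
else:
    shutdown_confirmed = False

print(shutdown_confirmed)  # → True
```

As noted above, this asserts a side effect of shutdown rather than the shutdown itself, but it only uses documented behavior.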
another option might be to monkey patch executor.shutdown in the unit test and replace it with something that sets a flag and calls the original.

or, slightly cleaner, inherit from FuturesSession, override __exit__, and set a flag that can be checked there.

definitely a tough thing to test, that it shut down as designed. i guess the most important part to test is that the object functions correctly in the with context. that it calls __exit__ is nice to test, but not critical.
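The subclass-and-flag idea could look roughly like this sketch. StubSession is a stand-in for FuturesSession (defined here so the example is self-contained); only the TrackedSession pattern is the point:

```python
from concurrent.futures import ThreadPoolExecutor

class StubSession:
    """Stand-in for FuturesSession; illustrative only."""
    def __init__(self, max_workers=2):
        self.executor = ThreadPoolExecutor(max_workers=max_workers)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.executor.shutdown()

# Test helper as suggested above: override __exit__, record that it
# ran, then delegate to the parent implementation.
class TrackedSession(StubSession):
    exited = False

    def __exit__(self, *args):
        self.exited = True
        return super().__exit__(*args)

with TrackedSession() as session:
    pass

print(session.exited)  # → True
```

This avoids touching private executor state; the test asserts only that the with statement invoked __exit__.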
When using FuturesSession for a long-running web scraper script, I've noticed a memory leak due to the fact that I wasn't cleaning up the ThreadPoolExecutors that were created by the many FuturesSession(max_workers=blah) calls I was making.

I fixed the issue by writing a contextmanager that cleaned up my executor when exiting:
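(The original snippet didn't survive in this copy of the thread. A contextmanager along the lines described might look roughly like the sketch below; StubSession is a self-contained stand-in for FuturesSession, and the names are illustrative, not the author's actual code.)

```python
from concurrent.futures import ThreadPoolExecutor
from contextlib import contextmanager

class StubSession:
    """Stand-in exposing the same .executor attribute FuturesSession has."""
    def __init__(self, max_workers=2):
        self.executor = ThreadPoolExecutor(max_workers=max_workers)

@contextmanager
def futures_session(max_workers=2):
    session = StubSession(max_workers=max_workers)
    try:
        yield session
    finally:
        # Reaches into the internal executor, as described above;
        # blocks until pending futures complete.
        session.executor.shutdown()

with futures_session(max_workers=2) as session:
    result = session.executor.submit(pow, 2, 8).result()

print(result)  # → 256
```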
This feels a bit slimy since I'm using the internal(?) self.executor reference. I also realize that shutdown() will block until all Futures are done, but I feel this is acceptable for many use cases.

An alternative I've considered is having FuturesSession implement the context manager protocol with __enter__() and __exit__() so we can use it directly in a with statement. This would be similar to how open() works:

Does this sound reasonable?
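A minimal sketch of the protocol being proposed (StubSession is a stand-in defined here, not the real FuturesSession):

```python
from concurrent.futures import ThreadPoolExecutor

class StubSession:
    """Stand-in for FuturesSession; illustrative only."""
    def __init__(self, max_workers=2):
        self.executor = ThreadPoolExecutor(max_workers=max_workers)

    def __enter__(self):
        # Mirror open(): return the object itself for the `as` clause.
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Block until pending futures finish, then free the threads.
        self.executor.shutdown()

with StubSession(max_workers=2) as session:
    answer = session.executor.submit(pow, 2, 8).result()

# The executor is shut down here; further submits would raise RuntimeError.
print(answer)  # → 256
```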