python / cpython

The Python programming language
https://www.python.org

Add deferred single-threaded/fake executor to concurrent.futures #80576

Open 61d821a0-2bc0-42f9-a54e-4d17e1253407 opened 5 years ago

61d821a0-2bc0-42f9-a54e-4d17e1253407 commented 5 years ago
BPO 36395
Nosy @brianquinlan, @pitrou

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:
```python
assignee = None
closed_at = None
created_at =
labels = ['type-feature', '3.7']
title = 'Add deferred single-threaded/fake executor to concurrent.futures'
updated_at =
user = 'https://bugs.python.org/BrianMcCutchon'
```

bugs.python.org fields:
```python
activity =
actor = 'santagada'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = []
creation =
creator = 'Brian McCutchon'
dependencies = []
files = []
hgrepos = []
issue_num = 36395
keywords = []
message_count = 9.0
messages = ['338576', '341616', '341624', '341625', '341660', '341740', '341797', '341890', '369603']
nosy_count = 4.0
nosy_names = ['bquinlan', 'pitrou', 'santagada', 'Brian McCutchon']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue36395'
versions = ['Python 3.7']
```

61d821a0-2bc0-42f9-a54e-4d17e1253407 commented 5 years ago

Currently, it is possible to make a basic single-threaded executor for unit testing:

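from concurrent import futures

class FakeExecutor(futures.Executor):

  def submit(self, f, *args, **kwargs):
    # Runs f eagerly in the calling thread; an exception raised by f
    # propagates out of submit() instead of being stored on the future.
    future = futures.Future()
    future.set_result(f(*args, **kwargs))
    return future

  def shutdown(self, wait=True):
    pass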

However, this evaluates the provided function eagerly, which may be undesirable for tests. It prevents the tests from catching a whole class of errors (those where the caller forgot to call .result() on a future that is only desirable for its side-effects). It would be great to have an Executor implementation where the function is only called when .result() is called so tests can catch those errors.
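A minimal sketch of the requested behavior, for illustration only. `DeferredFuture` and `DeferredExecutor` are hypothetical names, not part of `concurrent.futures`, and the sketch assumes that subclassing `Future` and overriding `result()` and `exception()` is acceptable in test code. It is not thread-safe, and it does nothing for `futures.wait()` or `futures.as_completed()`, which never call `result()`.

```python
from concurrent import futures


class DeferredFuture(futures.Future):
    """Hypothetical future that runs its callable on the first result() call."""

    def __init__(self, fn, *args, **kwargs):
        super().__init__()
        self._fn, self._args, self._kwargs = fn, args, kwargs

    def _run_once(self):
        # Already finished or cancelled: nothing left to run.
        if self.done():
            return
        self.set_running_or_notify_cancel()
        try:
            value = self._fn(*self._args, **self._kwargs)
        except Exception as exc:
            self.set_exception(exc)
        else:
            self.set_result(value)

    def result(self, timeout=None):
        self._run_once()
        return super().result(timeout)

    def exception(self, timeout=None):
        self._run_once()
        return super().exception(timeout)


class DeferredExecutor(futures.Executor):
    """Hypothetical executor whose submit() only records the call."""

    def submit(self, fn, *args, **kwargs):
        return DeferredFuture(fn, *args, **kwargs)


calls = []
with DeferredExecutor() as executor:
    future = executor.submit(calls.append, "ran")
    ran_before_result = list(calls)  # still empty: nothing has run yet
    future.result()                  # the callable runs here, in this thread
```

With this sketch, code that submits work only for its side effects and never calls `.result()` leaves `calls` empty, which is the class of error the request wants tests to catch.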

I might add that, while Future.set_result() is documented as being supported for unit tests, a comment in the CPython source says that Future.__init__() "Should not be called by clients" (https://github.com/python/cpython/blob/master/Lib/concurrent/futures/_base.py#L317), suggesting that even the above code is unsupported. That leaves me wondering how I should test future-heavy code without using mock.patch on everything.

------ Alternatives that don't work ------

One might think it possible to create a FakeFuture to do this:

class FakeFuture(object):

  def __init__(self, to_invoke):
    self._to_invoke = to_invoke

  def result(self, timeout=None):
    return self._to_invoke()

However, futures.wait is not happy with this:

futures.wait([FakeFuture(lambda: 1)]) # AttributeError: 'FakeFuture' object has no attribute '_condition'

If FakeFuture is made to extend futures.Future, the above line instead hangs:

class FakeFuture(futures.Future):

  def __init__(self, to_invoke):
    super(FakeFuture, self).__init__()
    self._to_invoke = to_invoke

  def result(self, timeout=None):
    return self._to_invoke()

I feel like I shouldn't have to patch out wait() in order to get good unit tests.
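The hang can be observed without blocking forever by passing a timeout. The sketch below repeats the `FakeFuture` subclass from above and assumes, based on the behavior described here, that `futures.wait()` looks at the future's internal state and never calls the overridden `result()`.

```python
from concurrent import futures


class FakeFuture(futures.Future):

    def __init__(self, to_invoke):
        super().__init__()
        self._to_invoke = to_invoke

    def result(self, timeout=None):
        # Bypasses the Future state machine, so the future never looks "done".
        return self._to_invoke()


fake = FakeFuture(lambda: 1)

# With a timeout, wait() gives up and reports the future as not done.
done, not_done = futures.wait([fake], timeout=0.1)

print(len(done), len(not_done))
print(fake.result(), fake.done())
```

Calling `result()` directly still returns the value, but `done()` stays false, so anything that waits on the future's state never sees it complete.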

brianquinlan commented 5 years ago

Hey Brian, why can't you use threads in your unit tests? Are you worried about non-determinism or resource usage? Could you make a ThreadPoolExecutor with a single worker?

61d821a0-2bc0-42f9-a54e-4d17e1253407 commented 5 years ago

Mostly nondeterminism. It seems like creating a ThreadPoolExecutor with one worker could still be nondeterministic, as there are two threads: the main thread and the worker thread. It gets worse if multiple executors are needed.

Another option would be to design and document futures.Executor to be extended so that I can make my own fake executor.

brianquinlan commented 5 years ago

Do you have an example that you could share?

I can't think of any other fakes in the standard library and I'm hesitant to be the person who adds the first one ;-)

61d821a0-2bc0-42f9-a54e-4d17e1253407 commented 5 years ago

I understand your hesitation to add a fake. Would it be better to make it possible to subclass Executor so that a third party implementation of this can be developed?

As for an example, here is a case of nondeterminism when using a ThreadPoolExecutor with a single worker. On my machine it sometimes prints "False" and sometimes "True".

from concurrent import futures

complete = False

def complete_eventually():
  global complete
  for _ in range(150000):
    pass
  complete = True

with futures.ThreadPoolExecutor(max_workers=1) as pool:
  pool.submit(complete_eventually)
  print(complete)

brianquinlan commented 5 years ago

Hey Brian,

I understand the non-determinism. I was wondering if you had a non-theoretical example, i.e., some case where the non-determinism had impacted a real test that you wrote?

61d821a0-2bc0-42f9-a54e-4d17e1253407 commented 5 years ago

No, I do not have such an example, as most of my tests try to fake the executors.

brianquinlan commented 5 years ago

Brian, I was looking for an example where the current executor isn't sufficient for testing, i.e., a useful test that would be difficult to write with a real executor but would be easier with a fake.

Maybe you have such an example from your tests?

e7b3ad16-d9a9-4f0a-822b-0ed344cc0313 commented 4 years ago

I have a single example:

Profiling. As most Python profilers don't support threads or processes, it would be very convenient to have an in-process executor in those cases.

tibbe commented 1 year ago

I have plenty of examples from a server we have that talks to AWS DynamoDB. A common pattern for doing two fetches is something like this:

def handler():
    # Run on another thread to reduce overall latency.
    item1_fut = my_executor.submit(client.get_item, key1)
    item2 = client.get_item(key2)
    item1 = item1_fut.result()
    do_something(item1, item2)

def test_handler():
    my_mock.assert_called('get_item', key1)
    my_mock.assert_called('get_item', key2)
    handler()

Without a fake executor, mocking this is not possible (at least with the AWS Stubber mocking library, which is needed for DynamoDB): the order in which the mock gets called is non-deterministic, and this particular mock has no way to say that any order is fine.
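To illustrate the ordering point, here is a rough sketch that swaps in `unittest.mock.Mock` for the AWS Stubber and passes the client and executor into `handler` as arguments. Both are assumptions made to keep the example self-contained. `InlineExecutor` is a hypothetical eager fake that builds a `Future` directly, which is the pattern questioned at the top of this issue.

```python
from concurrent import futures
from unittest import mock


class InlineExecutor(futures.Executor):
    """Hypothetical fake: runs each task in the calling thread at submit()."""

    def submit(self, fn, *args, **kwargs):
        future = futures.Future()
        try:
            future.set_result(fn(*args, **kwargs))
        except Exception as exc:
            future.set_exception(exc)
        return future


def handler(client, executor, key1, key2):
    # Same shape as the handler above, with its dependencies passed in.
    item1_fut = executor.submit(client.get_item, key1)
    item2 = client.get_item(key2)
    item1 = item1_fut.result()
    return item1, item2


client = mock.Mock()
client.get_item.side_effect = lambda key: {"id": key}

result = handler(client, InlineExecutor(), "k1", "k2")

# The submitted call always happens first, so an ordered assertion is safe.
assert client.get_item.call_args_list == [mock.call("k1"), mock.call("k2")]
```

With a real `ThreadPoolExecutor` the two `get_item` calls may arrive in either order, so the same ordered assertion could fail intermittently.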

gwerbin commented 2 months ago

Debugging is a practical use case for a DummyExecutor. For example, I have a chunk of code that uses ThreadPoolExecutor, but I have a bug or some unusual behavior, and I want to run all code in the "main" thread to eliminate the possibility of race conditions, and facilitate step-through debugging. It would be convenient to just drop in a DummyExecutor in that case.

There's also a legitimate application at runtime. I might want to provide a --threads CLI option in my script, where the program runs entirely in a single thread if that option is omitted. Without the DummyExecutor, I need to write 2 different code paths, but with it I can just swap out the executors, e.g. executor = DummyExecutor() if num_threads is None else ThreadPoolExecutor(num_threads)
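A sketch of that runtime swap, under stated assumptions: `DummyExecutor` is a hypothetical class (it does not exist in `concurrent.futures`), and `make_executor`, `square`, and `main` are illustrative names. Only the `--threads` option comes from the comment above.

```python
import argparse
from concurrent import futures


class DummyExecutor(futures.Executor):
    """Hypothetical executor that runs every task in the calling thread."""

    def submit(self, fn, *args, **kwargs):
        future = futures.Future()
        try:
            future.set_result(fn(*args, **kwargs))
        except Exception as exc:
            future.set_exception(exc)
        return future


def make_executor(num_threads):
    # No --threads given: stay on the main thread.
    if num_threads is None:
        return DummyExecutor()
    return futures.ThreadPoolExecutor(max_workers=num_threads)


def square(x):
    return x * x


def main(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--threads", type=int, default=None)
    args = parser.parse_args(argv)
    # The rest of the program is identical for both executors.
    with make_executor(args.threads) as executor:
        return list(executor.map(square, range(5)))
```

Calling `main([])` runs everything in the main thread, while `main(["--threads", "2"])` uses a thread pool. Both go through the same code path after the executor is chosen.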