Bug: new `submit` function makes it impossible to call a function that has a `timeout` argument

noxdafox / pebble

Multi threading and processing eye-candy.

GNU Lesser General Public License v3.0

525 stars 49 forks source link

Bug: new `submit` function makes it impossible to call a function that has a `timeout` argument #111

Closed PLPeeters closed 1 year ago

PLPeeters commented 1 year ago

Given the deprecated schedule now calls submit, Pebble is totally broken for cases like this.

Here's a minimal example:

from pebble import ProcessPool

def test(timeout):
    return

kwargs = {
    'timeout': 10
}

ProcessPool().submit(test, 10, **kwargs)

Leads to:

TypeError: submit() got multiple values for argument 'timeout'

My suggestion would be to rename the timeout argument to _timeout or even _pebble_process_timeout to avoid name clashes.

I'll gladly do a PR if you want me to.

noxdafox commented 1 year ago

This is why I don't like concurrent.futures API. They look simple&dandy but they lead to situations like these where namespaces get polluted.

I don't think coming up with a custom enough keyword is the correct solution. I think that there should be a clear distinction between what you send to the Pool and what to your scheduled function.

Another solution I was entertaining was to add the timeout as a Pool attribute but it makes the logic cumbersome when you need to specify different timeouts per each call.

# This is also non-thread safe
pool.timeout = 5
pool.submit(fn, *args, **kwargs)

I think I will re-instate the schedule function as the recommended one (now deprecated) and expose the submit one just to support the asyncio.loop.run_in_executor(fn, TIMEOUT, *args) Use Case.

This is the second issue (#110) which clearly means the new API is not good enough.

PLPeeters commented 1 year ago

@noxdafox Actually, I just remembered position-only parameters, which solves this case:

def test(timeout, /, **kwargs):
    print(timeout)
    print(kwargs)

test(3, timeout=10)

Prints:

3
{'timeout': 10}

noxdafox commented 1 year ago

Yes, that would be the correct way to handle your case if you have Python >= 3.9.

Yet it is not backwards compatible and does not solve the issue highlighted in #110 where if you don't need to specify a timeout, you need to schedule your function as submit(fn, None, 'foo', 'bar', 'baz').

The ideal solution would be to have pattern-matching based function declarations such as Erlang or Elixir languages allow.

def submit(fn, /, *args, **kwargs):
    schedule_without_timeout()

def submit(fn, timeout, /, *args, **kwargs):
    schedule_with_timeout()

PLPeeters commented 1 year ago

3.8 actually, but your point still stands; compatibility matters so it's not fixable that way for now.

Perhaps it could be argued that since 3.7 is reaching EOL in 8 months it might make sense to implement it this way in 8 months? (and bump the major version with it)

It indeed still wouldn't solve #110, but I would say that's a totally different issue. The only way I can see that fixes both issues and keeps a call-specific timeout for submit calls is a specific kwarg that defaults to None, but as you said that isn't the cleanest either.

victoryhb commented 1 year ago

I agree with @PLPeeters and much prefer adding a pebble-specific keyword argument (e.g. pebble_timeout) to lower the barriers to entry for people familiar with concurrent.futures (though I see such compatibility is not Pebble's focus) and simplify the API (instead of keeping both submit and schedule).

noxdafox commented 1 year ago

Release 5.0.3 reinstates schedule in place of submit.

submit is left to ensure compatibility with asyncio.loop.run_in_executor.