Open aeros opened 4 years ago
Based on the following python-ideas thread: https://mail.python.org/archives/list/python-ideas@python.org/thread/LMTQ2AI6A7UXEFVHRGHKWD33H24FGM6G/#ICJKHZ4BPIUMOPIT2TDTBIW2EH4CPNCP.
In the above ML thread, the author proposed adding a new cf.SerialExecutor class, which seems to be not a great fit for the standard library (based on the current state of the discussion, as of writing this). But, Guido mentioned the following:
IOW I'm rather lukewarm about this -- even if you (Jonathan) have found use for it, I'm not sure how many other people would use it, so I doubt it's worth adding it to the stdlib. (The only thing the stdlib might grow could be a public API that makes implementing this feasible without overriding private methods.)
Specifically, the OPs proposal should be reasonably possible to implement (either just locally for themselves or a potential PyPI package) with a few minor additions to cf.Future's public API:
1) Add a means of *publicly* accessing the future's state (future._state) without going through the internal condition's RLock.
This would allow the developer to implement their own condition or other synchronization primitive to access the state of the future. IMO, this would best be implemented as a separate future.state()
and future.set_state()
.
2) Add a means of *publicly* accessing the future's result (future._result) without going through the internal condition's RLock.
This would be similar to the above, but since there's already a future.result()
and future.set_result()
, I think it would be best implemented as an optional sync parameter that defaults to True. When set to False, it directly accesses future._result without the condition; when set to True, it has the current behavior.
3) Add public global constants for the different possible future states: PENDING, RUNNING, CANCELLED, CANCELLED_AND_NOTIFIED, and FINISHED.
This would be useful to serve as a template of possible future states for custom implementations. I also find that fut.set_state(cf.RUNNING)
looks better than fut.state("running")
from an API design perspective.
Optional addition: To make fut.state()
and fut.set_state()
more useful for general purposes, it could have a single sync boolean parameter (feel free to bikeshed over the name), which changes whether it directly accesses future._state or does so safely through the condition. Presumably, the documentation would explicitly state that with sync=False, the future's state will not be synchronized across separate threads or processes. This would also allow it to have the same API as future.result()
and future.set_result()
.
Also, as for my own personal motivation in expanding upon the public API for cf.Future, I've found that directly accessing the state of the future can be incredibly useful for debugging purposes. I made significant use of it while implementing the new *cancel_futures* parameter for executor.shutdown(). But, since future._state is a private member, there's no guarantee that it will continue to behave the same or that it can be relied upon in the long-term. This may not be a huge concern for quick debugging sessions, but could easily result in breakage when used in logging or unit tests.
I'll leave it to Brian and/or Antoine to review this. Good luck!
Upon further consideration and based on recent developments in the python-ideas thread (mostly feedback from Antoine), I've decided to reduce the scope of this issue to remove future.set_state() and the *sync* parameters.
This leaves just future.state() and having the states as publicly accessible module-level constants.
This leaves just future.state() and having the states as publicly accessible module-level constants.
After reading over the python-ideas again and having some time to reflect, I think we can start with just adding future.state(), as that had the most value. Since future.state() is primarily intended for debugging/informational purposes as an approximation (similar to queue.qsize()
) rather than something to be consistently relied upon, I don't see a strong practical use case for returning the states as enums (instead of a string).
If we consider adding a means to directly modify the state of the future in the future or providing an option to safely read the state of the future (through its RLock) later down the road, it may be worth considering. But not at the moment, IMO.
I'll update the name of the issue accordingly, and should have time to open a PR in the next few days. It should be rather straightforward, with the main emphasis being on the documentation to ensure that it clearly communicates the purpose of future.state(); so that users don't assume it's anything more than an approximation.
I’m a bit disappointed, since it looks like this won’t allow implementing the OP’s classes without using private APIs. The debugging and loggin use cases aren’t very compelling to me.
I’m a bit disappointed, since it looks like this won’t allow implementing the OP’s classes without using private APIs. The debugging and loggin use cases aren’t very compelling to me.
Yeah, I had every intention when I initially proposed the idea on the python-ideas thread to provide an extensive means of implementing custom Future and Executor classes, but Antoine brought up a very valid concern about users shooting themselves in the foot with it:
I'm much more lukewarm on set_state(). How hard is it to reimplement one's own Future if someone wants a different implementation? By allowing people to change the future's internal state, we're also giving them a (small) gun to shoot themselves with.
In terms of a cost-benefit analysis, I'd imagine that it's going to be a rather small portion of the concurrent.futures users that would actually have a genuine use case for implementing their own custom Future or Executor.
He was still approving of a future.state()
though, which is why I considered implementing it alone:
That sounds useful to me indeed. I assume you mean something like a state() method? We already have Queue.qsize() which works a bit like this (unlocked and advisory).
Although not nearly as significant (or interesting), I've personally encountered situations where it would be useful for logging purposes to be able to read the approximate state of the future. But if the consensus ends up being that it's not useful enough to justify adding compared to the original purpose of the issue, I would certainly understand.
But note my response to Antoine at the time, mentioning that implementing ‘as_completed()’ is impossible that way. Antoine then backtracked somewhat.
But note my response to Antoine at the time, mentioning that implementing ‘as_completed()’ is impossible that way. Antoine then backtracked somewhat.
Ah, I had seen that but for some reason hadn't considered that Antoine might have also changed his stance on a means of modifying the state of the future:
> It's actually really hard to implement your own > Future class that works > well with concurrent.futures.as_completed() -- this is basically what > complicated the OP's implementation. Maybe it would be useful to look into > a protocol to allow alternative Future implementations to hook into that?
Ah, I understand the reasons then. Ok, it does sound useful to explore the space of solutions. But let's decouple it from simply querying the current Future state.
In that case, I'll revert the title and leave the issue open for further discussion; but I'll hold off on any PRs until we have some consensus regarding the direction we want to go in with regards to potential new future protocols. Apologies for the misunderstanding, thanks for clarifying. :)
I'd also be interested in hearing Brian Quinlain's thoughts on the matter.
I'll try to take a look at this before the end of the week, but I'm currently swamped with other life stuff :-(
it seems this feature would interest Dask team. https://github.com/dask/distributed/issues/3695
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields: ```python assignee = 'https://github.com/aeros' closed_at = None created_at =
labels = ['type-feature', 'library', '3.9']
title = "Expand concurrent.futures.Future's public API"
updated_at =
user = 'https://github.com/aeros'
```
bugs.python.org fields:
```python
activity =
actor = 'jakirkham'
assignee = 'aeros'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)']
creation =
creator = 'aeros'
dependencies = []
files = []
hgrepos = []
issue_num = 39645
keywords = []
message_count = 10.0
messages = ['362047', '362110', '362163', '363114', '363117', '363119', '363121', '363124', '363205', '368440']
nosy_count = 7.0
nosy_names = ['gvanrossum', 'bquinlan', 'pitrou', 'Big Stone', 'aeros', 'iforapsy', 'jakirkham']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue39645'
versions = ['Python 3.9']
```