Closed bloomsa closed 2 years ago
looks like I have to fix some things up based on failing tests, I will work on that soon
@bloomsa - this is looking great so far - thank you!
I could use some help with the typing on ConductorClusterProcessProxy. Mainly around the Conductor API responses as I am not familiar with it nor could I find documentation on it that looked right for the APIs being hit in these methods.
Yeah, the ConductorClusterProcessProxy
was a contribution and I suppose we could ping that contributor, but I think you'll find it extremely similar to the YarnClusterProcessProxy
. Do you have specific examples of where responses are unclear? Since my understanding is that it's just a REST API, the responses are probably JSON (dict
), but I suppose str
could be used as well.
but not methods such as poll, launch_process, etc. I did this because the implementation classes may use different types for arguments or return object than the base abstract classes.
Hmm. I think it would be good where types diverge from the base class. I would expect them to be the same and perhaps that's a sign of a "smell" we should dig into. Again, any specific examples would be great.
Thanks again for contributing this - it will be extremely helpful to other contributors.
@kevin-bates
Yeah, the
ConductorClusterProcessProxy
was a contribution and I suppose we could ping that contributor, but I think you'll find it extremely similar to theYarnClusterProcessProxy
. Do you have specific examples of where responses are unclear? Since my understanding is that it's just a REST API, the responses are probably JSON (dict
), but I supposestr
could be used as well.
In other processProxy classes like for Docker I was using the actual Docker API model classes, so trying to do that for Conductor was leading me to some dead ends. I will just go with dict
or str
for these for now
Hmm. I think it would be good where types diverge from the base class. I would expect them to be the same and perhaps that's a sign of a "smell" we should dig into. Again, any specific examples would be great.
Looking again, the only place I see this happening is in launch_process
where the implementation classes all return self
while the abstract class returns nothing (KubernetesProcessProxy vs abstract base class). Thinking more about it now, is it because the abstract class's implementation is there to be called by the subclasses using super...
, as the comments mention?
Hi @bloomsa - good points.
In other processProxy classes like for Docker I was using the actual Docker API model classes, so trying to do that for Conductor was leading me to some dead ends. I will just go with dict or str for these for now
Yeah, I'm having similar issues finding the correct REST API docs - none of which appear to have endpoints like v1/applications
or v1/submissions
. I think the best of approach would be to go with dict
and str
for now.
@kjdoyle (I hope you're doing well) - would you happen to know where the corresponding REST API docs for the ConductorClusterProcessProxy
happen to reside?
Looking again, the only place I see this happening is in launch_process where the implementation classes all return self while the abstract class returns nothing (KubernetesProcessProxy vs abstract base class). Thinking more about it now, is it because the abstract class's implementation is there to be called by the subclasses using super..., as the comments mention?
While it's true that we require all subclasses to call super()
first, I don't think that should preclude us from returning self
from the various super()
calls - it will always be the same value. (This is one of the benefits of adding type hints - to clarify things like this.) As a result, since the signatures should match, it seems like launch_process
would wind up looking something like this...
async def launch_process(self, kernel_cmd: List[str], **kwargs: Optional[dict[str, Any]]) -> BaseProcessProxyABC:
I would prefer that each implementation of launch_process()
indicate that that class instance is returned, but I don't know if that will cause linting or IDEs heartburn. If they appear clean, then let's go with indicating the implementing class rather than BaseProcessProxyABC
.
Of course the recipient of launch_process()
(in RemoteKernelManager._launch_kernel()
) should be typed to receive BaseProcessProxyABC
. However, we probably need to be careful about this since the process-proxies masquerade as Popen
instances, so I'm not sure we'd even want to indicate the return type on RemoteKernelManager._launch_kernel()
- does that make sense?
While it's true that we require all subclasses to call
super()
first, I don't think that should preclude us from returningself
from the varioussuper()
calls - it will always be the same value. (This is one of the benefits of adding type hints - to clarify things like this.) As a result, since the signatures should match, it seems likelaunch_process
would wind up looking something like this...async def launch_process(self, kernel_cmd: List[str], **kwargs: Optional[dict[str, Any]]) -> BaseProcessProxyABC:
I would prefer that each implementation of
launch_process()
indicate that that class instance is returned, but I don't know if that will cause linting or IDEs heartburn. If they appear clean, then let's go with indicating the implementing class rather thanBaseProcessProxyABC
.
The BaseProcessProxyABC
abstract definition of launch_process
does not return self
, it returns None
(unless I am missing something in the code). The RemoteProcessProxy
also does not return anything, only when we get to the actual implementation classes is there a return value, which is the implementing class as you said.
On the topic of indicating that with type hints, I had to do that using Type["ProcessProxyImplementationClassXYZ"]
(see here). Looking into it more, I might be able to replace that with 'ProcessProxyImplementationClassXYZ'
, I'll give that a shot. As of current Python versions, I cannot simply state the return type as ProcessProxyImplementationClassXYZ
as the IDE/linter would complain. (ref)
Of course the recipient of
launch_process()
(inRemoteKernelManager._launch_kernel()
) should be typed to receiveBaseProcessProxyABC
. However, we probably need to be careful about this since the process-proxies masquerade asPopen
instances, so I'm not sure we'd even want to indicate the return type onRemoteKernelManager._launch_kernel()
- does that make sense? Yes that makes sense, the best way to go about that may be using type hinting (claiming that the processProxy is returned) in combination with a comment indicating the masquerade that is happening
The BaseProcessProxyABC abstract definition of launch_process does not return self, it returns None (unless I am missing something in the code). The RemoteProcessProxy also does not return anything, only when we get to the actual implementation classes is there a return value, which is the implementing class as you said.
Yes, you are correct. My comment prior to this...
I don't think that should preclude us from returning self from the various super() calls - it will always be the same value.
was suggesting that we should return self
from the base class onward. This way the type annotations are consistent throughout the class hierarchy.
The BaseProcessProxyABC abstract definition of launch_process does not return self, it returns None (unless I am missing something in the code). The RemoteProcessProxy also does not return anything, only when we get to the actual implementation classes is there a return value, which is the implementing class as you said.
Yes, you are correct. My comment prior to this...
I don't think that should preclude us from returning self from the various super() calls - it will always be the same value.
was suggesting that we should return
self
from the base class onward. This way the type annotations are consistent throughout the class hierarchy.
got it, I misread the original comment, sorry. I'd prefer to leave this PR as strictly adding type hints to existing functions without changing any implementations, but will defer to you on whether you want me to change that here or leave it for a future PR.
w.r.t. the failed builds, flake is failing due to the undefined name RemoteKernelManager
in the ProcessProxy type hints. Unfortunately when adding type hints, the class being referenced is imported, which is leading to circular imports in this case. I tried to resolve it by using the string 'RemoteKernelManager'
instead but it looks like it still doesn't work. I found this as a possible solution but it's a kludge imo :/
I suppose we might need to forgo type hints for the kernel_manager
variables.
Or capture the inputs (members of KM that are read) in the constructors and outputs (members of KM that are written) as returned from the methods. Not sure if that's feasible and would rather we just forgo hints on KM in process proxies since the latter could get ugly.
I see that the test suite has python 3.7 as the minimum tested version. If we force > 3.7 for the project that makes type hinting easier and more performant (see this S/O discussion) Without doing that, I think leaving the kernel_manager un-hinted is probably the easiest solution for now. I'm not sure it's possible to call out the kernel_manager members that are written/updated as the return type like you mentioned. If it is possible, I agree it could be ugly and confusing for people unfamiliar with the project
I agree we should not type-hint kernel_manager
in the process proxies over identifying its members and splitting those out.
Regarding python > 3.7, we need to support 3.7 until its EOL in June of 2023. However, the link you provide appears to say that these annotation improvements exist for version >= 3.7. But it also appears to address the references to a class within that class definition. Are you saying the from __future__ import annotations
would resolve the circular reference issue and we could use 'RemoteKernelManager'
as the annotation?
I agree we should not type-hint
kernel_manager
in the process proxies over identifying its members and splitting those out.Regarding python > 3.7, we need to support 3.7 until its EOL in June of 2023. However, the link you provide appears to say that these annotation improvements exist for version >= 3.7. But it also appears to address the references to a class within that class definition. Are you saying the
from __future__ import annotations
would resolve the circular reference issue and we could use'RemoteKernelManager'
as the annotation?
Sorry, I meant to type >= 3.7. I'm not 100% sure as I haven't done it before, but yes your understanding is the same as mine. from __future__ import annotations
works starting in 3.7 (and up). I think it will allow us to use 'RemoteKernelManager'
without the circular import. I'll give it a try
Pull Request for issue #851 A couple of notes on my process for making the type hints and an ask for some assistance:
_xyz
methods in the abstract classes but not methods such aspoll
,launch_process
, etc. I did this because the implementation classes may use different types for arguments or return object than the base abstract classes. Although I am open to changing this approach if it is not best practice. Thanks!