python-trio / trio

Trio – a friendly Python library for async concurrency and I/O
https://trio.readthedocs.io
Other
6.1k stars 332 forks source link

Handling _SocketType and SocketType #2720

Closed jakkdl closed 11 months ago

jakkdl commented 1 year ago

SocketType is currently an empty type and what's meant for end-users to inherit from to show compatibility with trio sockets, but _SocketType (which inherits from SocketType is what's actually used internally and has all the methods needed to get code to actually pass type checking.

So we probably want to merge these one way or another, I started trying to tackle it in #2705 but all my ideas hit problems and will likely have some effect on user code that inherits from either of them, so I think it deserves it's own issue + PR.

In tests we have three different FakeSocket classes inheriting from SocketType: test_highlevel_socket.py, test_highlevel_open_tcp_stream.py and test_highlevel_open_tcp_listeners.py

Options include:

  1. Make SocketType an abstract base class with abstract methods. Will break any code that has methods inheriting from it and not defined - so would need to only mark the methods that are absolutely required to be abstract, and others could instead raise NotImplementedError. *. Will almost certainly raise a lot of type checking errors.
  2. Remove _SocketType, moving it all into SocketType. I think it's impossible to avoid this being a breaking API change.
  3. Don't make any runtime changes at all, but during type checking assign SocketType = _SocketType. *. will raise a ton of type checking errors, might be possible to reduce the number with some effort.
  4. Rename _SocketType something like InternalTrioSocketType and have that be the class used in signatures that require some methods to be available.
  5. ...
TeamSpen210 commented 1 year ago

I think before deciding how to deal with that, it'd be good to try and categorise the methods - which should an implementation always need to have defined, are there groups which should be defined together? Also how many could have a default implementation, beyond raise NotImplementedError?

A 5th option might be to use a bunch of protocols, though there'd be a little bit of a runtime cost. Make SocketType a Protocol with the most basic methods, then have some additional protocols inheriting from that for any other "group" of methods. Code could use either @runtime_checkable or a TypeGuard using hasattr() to detect if it applies. That won't help with verifying that they're valid for set_custom_socket_factory though.

A5rocks commented 1 year ago

I think 1. is nice, because that's essentially the only method that I can think of that is both palatable and has a solid deprecation path (i.e. make certain methods abstract only at type checking time + check that they're non-abstract at initialization time, otherwise warn).

jakkdl commented 1 year ago

Ah, that sounds like a good solution and relatively straightforward. It will lead to some minor API breakages if people are doing very fancy stuff in their classes inheriting from SocketType, but that shouldn't be overly common?

The best way of categorizing methods is probably just to start writing a PR to get an overview. If anybody wants to take a stab feel free, otherwise I'll give it a go.

Fuyukai commented 1 year ago

Option 5: Vaporise SocketType and stop exposing raw BSD sockets. This would have to be done for uring support anyway.

A5rocks commented 1 year ago

Huh, what kind of interface changes are we looking at for that sort of removal? I'm not too familiar with what trio offers w/r/t sockets.

njsmith commented 1 year ago

Why would io_uring force us to stop exposing send/recv/etc? There are ring ops for all those, even if we do want to avoid the readiness APIs for some reason.

On Sun, Aug 6, 2023, 17:22 EXPLOSION @.***> wrote:

Huh, what kind of interface changes are we looking at for that sort of removal? I'm not too familiar with what trio offers w/r/t sockets.

— Reply to this email directly, view it on GitHub https://github.com/python-trio/trio/issues/2720#issuecomment-1667018760, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAEU42ARXX523PB5LCJR373XUAYMBANCNFSM6AAAAAA24NREGA . You are receiving this because you are subscribed to this thread.Message ID: @.***>

Fuyukai commented 1 year ago

BSD sockets is more than just send/recv, it has the awful type-punning socket address and socket option APIs. But aside from that, uring linked requests is intrinsically incompatible with the socket.socket API of single-call waits. If you extend it, it's no longer a stdlib socket inside a wolf's suit, so why bother pretending to be one?

If we wanted direct sockets (a good feature), we'd then have to use the io_uring calls for sockopt, sockaddr, etc, which means either copying all the validation logic over from the terrible stringly-typed socket.socket API so that somebody doeesn't pass in a too small buffer and segfaults. Or, better, make a proper strongly-typed API that can be statically typed to avoid these issues but is once again incompatible with stdlib sockets (so we should stop pretending to be one).

njsmith commented 1 year ago

(We're not going to remove the current socket api, even if we do add fancy io_uring related features on top of it in the future, so for this issue it doesn't matter too much. @Fuyukai do you want to open a separate issue for possible new socket operations to expose via io_uring? E.g. you can't express a multishot accept with the current API, but we can certainly add a multishot accept API. idk what a shutdown->close is tho. And fyi it's totally legal to have a single socket that's both registered for direct io and has a regular fd for regular syscalls, if that's more convenient.)

Anyway for SocketType, I think we want to basically make it

class SocketType:
    def __init__(self, [appropriate types here]):
        raise NotImplementedError

    def recv(self) -> bytes:
        raise NotImplementedError
# etc.

That makes it legible to type checkers, while keep all the current runtime behavior the same.

jakkdl commented 1 year ago

I think before deciding how to deal with that, it'd be good to try and categorise the methods - which should an implementation always need to have defined, are there groups which should be defined together? Also how many could have a default implementation, beyond raise NotImplementedError?

I think SocketType needs to have everything, unless we overhaul how SocketFactory works, since the return type of all methods will be SocketType, and _SocketType will never be seen in the public API, so if they ever want to call e.g. recvmsg_into then it has to be part of the public API of SocketType.

Default implementations could be made for all the pure wrapper methods though, if they check that the user-made SocketType does have an underlying stdlib socket. Something like

class SocketType:
  def __init__(self):
    if type(self) == SocketType:
      raise TypeError("abstract class, use trio.socket.socket...")
    self._socket: _stdlib_socket.socket | None = None

  def detach(self) -> int:
    if self._sock is None:
      raise NotImplementedError("No default implementation unless self._socket is initialized") 
    return self._socket.detach()
TeamSpen210 commented 1 year ago

For that case, what we do is make a protocol that has _socket defined, and annotate self with that. Then type checkers will complain if any of the defaults are used but that attribute is not present.

jakkdl commented 1 year ago

For that case, what we do is make a protocol that has _socket defined, and annotate self with that. Then type checkers will complain if any of the defaults are used but that attribute is not present.

@TeamSpen210 just slapping on a HasSocket protocol to the self argument is turning out to be far from trivial, due to lack of Intersection it warrants creating another type SocketTypeWithSocket which looks like it ends up being the type used pretty much everywhere in the internal code, since out of 30 methods on SocketType only 7 of them are more than a simple wrapper and left with a NotImplementedError, and the others all require HasSocket. Pretty much all user code is going to need to be typed as using SocketTypeWithSocket (or _SocketType).

If going in this direction I think it's probably better to just go all the way and fully merge the types, or expose _SocketType as a public type that can be specified.

jakkdl commented 1 year ago

That makes it legible to type checkers, while keep all the current runtime behavior the same.

FTR: The raise NotImplementedError certainly doesn't make all runtime behavior the same. Any code doing dynamic inspection on socket types with hasattr/getattr/dir, which isn't that weird given that the current trio tests already do that for platform differences, could get messed up. Also any code that does multiple inheritance and uses super() calls will now raise NotImplementedError.

jakkdl commented 1 year ago

Ah, I suppose there's always

class SocketType:
    if TYPE_CHECKING:
        def recv(self, ...) -> bytes:
            ...
        def fileno(self) -> int:
            ...

if we want to guarantee no impact on runtime.

njsmith commented 1 year ago

I'm having trouble following what people are trying to accomplish here – what's wrong with my proposal above?

On Mon, Aug 28, 2023, 07:37 John Litborn @.***> wrote:

Ah, I suppose there's always

class Socket if TYPE_CHECKING: def recv(self, ...) -> bytes: ... def fileno(self) -> int: ...

if we want to guarantee no impact on runtime.

— Reply to this email directly, view it on GitHub https://github.com/python-trio/trio/issues/2720#issuecomment-1695817617, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAEU42BIMWFI5IGLYTLWYKTXXSULRANCNFSM6AAAAAA24NREGA . You are receiving this because you commented.Message ID: @.***>

jakkdl commented 1 year ago

I'm having trouble following what people are trying to accomplish here – what's wrong with my proposal above?

Your proposal can affect runtime behaviour, though admittedly only in weird cases. But I mostly just see no upside of raise RuntimeError vs hiding behind TYPE_CHECKING, unless you specifically want to catch people doing weird things with SocketType - which might make sense if long-term plan is to make SocketType abstract, or merging the two types.

Going into protocols and/or default implementations seemed like a good idea, but is turning out to be really messy and I'm not going to bother trying that.

njsmith commented 1 year ago

Conceptually SocketType is an ABC. It's just that since the API is huge and the main use case for implementing it is for tests, I bowed to practicality and made all the abstract methods optional -- the socket API is huge and any given user only touches a small fraction of it, and when you're writing a test fake, there's no point in making you fill in exotic methods you don't use. Plus, if you accidentally leave out an important method, the exact error doesn't matter so long as you get some error and fix your test; these aren't errors that you're catching and handling at runtime. And since this was before static type checking was a thing, I didn't bother typing in all the optional abstract methods, since all it would do is convert AttributeError into NotImplementedError and who cares?

Anyway that's the background.

Agreed that the difference between raise NotImplementedError and if TYPE_CHECKING is pretty minimal; I was more confused at how we were getting onto discussions about abstract protocols with _socket attributes and stuff :-).

I think I do have a weak preference for raise NotImplementedError over if TYPE_CHECKING, simply because having the same code for runtime and static checking is one less thing you have to think about when figuring out what's going on, and in this case I don't think the runtime difference between AttributeError vs NotImplementedError is important or worth preserving -- arguably it should have been NotImplementedError all along, just I was too lazy to bother :-) But either seems workable.