python-trio / trio

Trio – a friendly Python library for async concurrency and I/O
https://trio.readthedocs.io

High level API for accessing getsockname() / getpeername() #280

Open njsmith opened 7 years ago

njsmith commented 7 years ago

There needs to be some way to get at the information in getsockname and getpeername from the high-level Stream and Listener interfaces.

It should involve an await, at least on streams, to support the PROXY protocol.
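To illustrate why an await is needed: with the PROXY protocol, the real peer address arrives in-band as the first bytes on the connection, so the stream has to read before it can answer. Below is a minimal sketch of parsing a version 1 header line. The names `ProxiedPeer` and `parse_proxy_v1` are made up for this example and are not part of Trio.

```python
import ipaddress
from typing import NamedTuple, Optional


class ProxiedPeer(NamedTuple):  # hypothetical name, not a Trio type
    source_ip: str
    source_port: int
    dest_ip: str
    dest_port: int


def parse_proxy_v1(line: bytes) -> Optional[ProxiedPeer]:
    # Expected shape: b"PROXY TCP4 <src ip> <dst ip> <src port> <dst port>\r\n"
    # Returns None for "PROXY UNKNOWN ...", where no usable address is given.
    if not line.endswith(b"\r\n"):
        raise ValueError("PROXY v1 header must end with CRLF")
    fields = line[:-2].decode("ascii").split(" ")
    if fields[0] != "PROXY":
        raise ValueError("not a PROXY v1 header")
    if len(fields) >= 2 and fields[1] == "UNKNOWN":
        return None
    if len(fields) != 6 or fields[1] not in ("TCP4", "TCP6"):
        raise ValueError("malformed PROXY v1 header")
    _, _, src, dst, src_port, dst_port = fields
    # ipaddress validates the textual addresses; str() gives a canonical form
    return ProxiedPeer(
        str(ipaddress.ip_address(src)),
        int(src_port),
        str(ipaddress.ip_address(dst)),
        int(dst_port),
    )
```

An async "get remote name" method on a PROXY-aware stream would presumably first receive bytes up to the CRLF and then hand them to something like this parser.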

What should we do with SSLStream/SSLListener? There's a conceptual problem: the stream they represent doesn't exactly have an address, it's "connect to the transport's address AND THEN wrap the thing in an SSLStream". But then, it's not like getsockname/getpeername exactly has any meaning for streams anyway -- like, if you have a connected socket it's useful to be able to ask it what the two halves of the TCP 5-tuple look like, but you can't actually do anything with this, it doesn't satisfy any invariants. Except "this might be useful for forensics".

So maybe pragmatics beats purity and SSLStream should just proxy this through to the underlying transport, and we'll leave the user to figure out how to decipher it.
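A rough sketch of what "just proxy this through" could look like. Both classes are stand-ins written for this example, and the method names `get_local_name` / `get_remote_name` are hypothetical. The only thing borrowed from Trio is the idea of a `transport_stream` attribute, which SSLStream exposes.

```python
from typing import Any, Optional


class FakeTransport:
    """Stand-in for a socket-backed stream (hypothetical, illustration only)."""

    def __init__(self, local: Any, remote: Any) -> None:
        self._local = local
        self._remote = remote

    async def get_local_name(self) -> Optional[Any]:
        return self._local

    async def get_remote_name(self) -> Optional[Any]:
        return self._remote


class WrappingStream:
    """Stand-in for a wrapper such as SSLStream."""

    def __init__(self, transport_stream: FakeTransport) -> None:
        self.transport_stream = transport_stream

    # The wrapper has no address of its own, so it forwards the question
    # to whatever it wraps and returns the answer unchanged.
    async def get_local_name(self) -> Optional[Any]:
        return await self.transport_stream.get_local_name()

    async def get_remote_name(self) -> Optional[Any]:
        return await self.transport_stream.get_remote_name()
```

With this shape, nested wrappers compose naturally: each layer forwards until something that actually owns a socket answers.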

Maybe a SocketAddress object that has family and sockaddr fields, plus possibly host, port, path, etc., as appropriate for the particular address?

njsmith commented 7 years ago

Small wrinkle to watch out for with AF_UNIX sockets: https://github.com/python-trio/trio/issues/279#issuecomment-322114125

njsmith commented 6 years ago

We'll also want to think about this: https://bugs.python.org/issue32221

tl;dr: in some versions of Python, getsockname (etc.) on IPv6 addresses resolves any scopeid and appends it to the first element of the tuple, but this turns out to be super slow (and this is on a code path that gets called in recvfrom for UDP packets), so they switched to not doing this. The scopeid information is still available at the end of the 4-tuple, but at the very least this means there are cross-version differences in how to interpret the 0th element of the tuple.
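One way to paper over the cross-version difference would be to strip any "%scope" suffix from the host string and rely only on the numeric scope id at the end of the 4-tuple. A minimal sketch, where the helper name `strip_scope_from_host` is invented for this example:

```python
from typing import Tuple

IPv6SockAddr = Tuple[str, int, int, int]  # (host, port, flowinfo, scope_id)


def strip_scope_from_host(sockaddr: IPv6SockAddr) -> IPv6SockAddr:
    host, port, flowinfo, scope_id = sockaddr
    # Some Python versions report "fe80::1%eth0", others plain "fe80::1".
    # Keep only the part before "%" so both forms compare equal; the
    # scope is still carried by the numeric scope_id field.
    bare_host = host.partition("%")[0]
    return bare_host, port, flowinfo, scope_id
```

Whether dropping the textual zone name is acceptable depends on what the address is used for; for display purposes someone might prefer to keep it.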

This also caused issues for Twisted: https://twistedmatrix.com/trac/ticket/9449#9449

njsmith commented 4 years ago

Maybe something like:

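import socket
from abc import ABC
from socket import AF_INET, AF_INET6, AF_UNIX
from typing import Any, Optional

import attr

# Empty type for docs and to indicate intention
class Address(ABC):
    pass

# auto_attribs=True is needed for attrs to pick up the annotated fields
@attr.s(auto_attribs=True, frozen=True, slots=True)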
class SocketAddress(Address):
    socket_family: socket.AddressFamily
    socket_address: Any  # the same types as returned by getsockname etc.

    @property
    def ip(self) -> str:
        if self.socket_family in [AF_INET, AF_INET6]:
            return self.socket_address[0]
        raise AttributeError("ip")

    @property
    def port(self) -> int:
        if self.socket_family in [AF_INET, AF_INET6]:
            return self.socket_address[1]
        raise AttributeError("port")

    @property
    def path(self) -> Optional[str]:  # or is it bytes?
        if self.socket_family == AF_UNIX:
            return self.socket_address
        raise AttributeError("path")

    def __str__(self):
        if self.socket_family == AF_INET:
            return f"{self.ip}:{self.port}"
        if self.socket_family == AF_INET6:
            return f"[{self.ip}]:{self.port}"  # FIXME: what about scopeid and the other thing?
        if self.socket_family == AF_UNIX:
            return f"unix://{self.path}"
        ...

class Stream:
    async def get_local_name(self) -> Optional[Address]: ...
    async def get_remote_name(self) -> Optional[Address]: ...

Tronic commented 4 years ago

I'd love to see this implemented. Paths are definitely str in Python. Just be sure to use errors="surrogateescape" whenever you encode or decode them.

altendky commented 3 years ago

Just passing by because of the AddressFamily reference (deciding how to placate pylint), but "paths" aren't always strings, and I think there are tools meant for handling the decoding/encoding: https://docs.python.org/3.10/library/os.html#os.fsencode

Tronic commented 3 years ago

@altendky I gather that paths are always str in Python, even if they may contain arbitrary bytes rather than valid Unicode on the filesystem, and that fsencode just handles this with the surrogateescape mode I mentioned in the previous message. But agreed, using the helper function is still more appropriate for readable code.

altendky commented 3 years ago

I presume there are functions that always return paths as str but "paths" (a fairly general concept, admittedly) are not exclusively str nor even str based. It seems that surrogateescape is only the default on non-Windows platforms.

os.PathLike protocol:

> The method should only return a str or bytes object, with the preference being for str.

open() accepts path-likes, which includes bytes:

> file is a path-like object giving the pathname (absolute or relative to the current working directory) of the file to be opened or an integer file descriptor of the file to be wrapped.

os.fsencode() and os.fsdecode() link to "filesystem encoding and error handler", and that links to filesystem_errors:

> On Windows: use "surrogatepass" by default, or "replace" if legacy_windows_fs_encoding of PyPreConfig is non-zero.
>
> On other platforms: use "surrogateescape" by default.
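For what it's worth, here is a small sketch of how a `path` accessor could normalize either representation using the os helpers instead of hard-coding an error handler. The function names are made up for this example.

```python
import os
from typing import Union


def unix_path_to_str(path: Union[str, bytes]) -> str:
    # os.fsdecode passes str through unchanged and decodes bytes with the
    # platform's filesystem encoding and error handler.
    return os.fsdecode(path)


def unix_path_to_bytes(path: Union[str, bytes]) -> bytes:
    # os.fsencode is the inverse: bytes pass through, str is encoded.
    return os.fsencode(path)


if os.name != "nt":
    # On POSIX the default handler is "surrogateescape", so arbitrary
    # bytes survive a decode/encode round trip.
    raw = b"/tmp/sock-\xff"
    assert unix_path_to_bytes(unix_path_to_str(raw)) == raw
```

This keeps the platform-specific choice of error handler in the standard library rather than in the address class.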

Tronic commented 3 years ago

Ok, I stand corrected.