vshymanskyy / aiodns

A small async DNS client for MicroPython
MIT License
8 stars 1 forks source link

A design issue #3

Open peterhinch opened 5 days ago

peterhinch commented 5 days ago

The socket docs state:

The native socket address format of the socket module is an opaque data type returned by getaddrinfo function, which must be used to resolve textual address (including numeric addresses).

The example script re-iterates this:

You must use getaddrinfo() even for numeric addresses

This would seem to have a number of consequences.

  1. On some ports async getaddrinfo may need to provide this opaque data. In every case I've seen the synchronous getaddrinfo returns a dotted quad, but the docs are specific on this. A query to @dpgeorge may be needed to resolve this.
  2. Async getaddrinfr needs to handle dotted quads as input. Currently it fails if passed a dotted quad. The network module does this here.
  3. If async getaddrinfo is to follow this rule it must use sync getaddrinfo which is absurd. This may be another reason to use the network's own DNS server.

The reason for point 2 is as follows. An application (such as async MQTT) reads a config file containing a network address. This is either a dotted quad for a local server or a domain name for a server on the internet. Currently it calls getaddrinfo with the passed address and uses the opaque result to connect.

vshymanskyy commented 5 days ago

Implementing aiodns.getaddrinfo via a call to socket.getaddrinfo may actually be a good idea. In this case address will be resolved to IPv4 or IPv6 string, and then passed to getaddrinfo for final result. It requires further investigation.

From the practical standpoint, I've been able to use aiodns without issues on several ports. I'll update the example so it actually demonstrates the socket connection.

peterhinch commented 5 days ago

I agree: in every case I tested socket.getaddrinfo produced a dotted quad. However the docs are explicit on this point, and Damien is very thorough. I am convinced there is a good reason for the statement. It does need clarification for anyone writing a replacement.

The reason I am so interested in aiodns is that the need for it has been recognised for years and has been acknowledged by Damien. A few years ago I looked into writing an async DNS but the task defeated me. There may be scope for official adoption of aiodns if it can be made fully compliant with the synchronous version.

dpgeorge commented 5 days ago
  1. On some ports async getaddrinfo may need to provide this opaque data. In every case I've seen the synchronous getaddrinfo returns a dotted quad, but the docs are specific on this.

On the unix port, it returns an opaque binary data blob, eg:

MicroPython v1.24.0-preview.361.g43b05afdf on 2024-09-26; linux [GCC 14.1.1] version
Use Ctrl-D to exit, Ctrl-E for paste mode
>>> import socket
>>> socket.getaddrinfo('micropython.org', 80)
[(2, 1, 6, None, bytearray(b'\x02\x00\x00P\xb0:w\x1a\x00\x00\x00\x00\x00\x00\x00\x00')), (2, 2, 17, None, bytearray(b'\x02\x00\x00P\xb0:w\x1a\x00\x00\x00\x00\x00\x00\x00\x00')), (2, 3, 0, None, bytearray(b'\x02\x00\x00P\xb0:w\x1a\x00\x00\x00\x00\x00\x00\x00\x00'))]

It's anyway good practice to (a) always use getaddrinfo() even if you're just passing in an IP (v4 or v6) address; (b) treat the output of getaddrinfo() as opaque. Although CPython is more flexible here, this approach is compatible with CPython. That said, MicroPython does on most (all?) bare-metal ports allow you to pass in dotted-IP notation into, eg, socket.connect(), ie you don't need to use getaddrinfo() there if it's already a resolved IP address.

Anyway, @peterhinch 's point (2) above certainly needs to be followed.

Now that I revisit this issue, I remember that lwIP already includes a non-blocking DNS resolver. That's actually what MicroPython uses, it's just that, to make it CPython-compatible and blocking, it waits for a result before returning from getaddrinfo():

dns_gethostbyname_addrtype(..., lwip_getaddrinfo_cb, ...);
// lwip_getaddrinfo_cb will set a status when it's called

// in the meantime we just wait for that, ie block
            while (state.status == 0) {                              
                poll_sockets();                                 
            }                                                       

I think we should just expose this lwIP functionality in a non-blocking API to the Python code (because, eg, it uses lwIP's internal address cache). The hard question then is (as usual) how to design the API for this, what it looks like. There needs to be some C functions that start the query, poll for it to complete (in a way that can hook into asyncio) and then get the result.

vshymanskyy commented 5 days ago

That'd be great, however for now aiodns serves the purpose. We can easily ditch aiodns if/when:

For me, it just means i should probably settle on the minimalistic implementation (as it is), instead of investing too much time into a module that will get deprecated rather soon.

vshymanskyy commented 5 days ago

I believe I have implemented the fix for this issue, and tested it with the Unix port. Feel free to comment and/or continue the discussion. I'll leave it open for some time.

vshymanskyy commented 4 days ago

And yes, this means we're implementing aiodns.getaddrinfo via some calls to socket.getaddrinfo. But I think it should be fine :)

vshymanskyy commented 4 days ago

@peterhinch WDYT? is it a feasible solution?

peterhinch commented 4 days ago

I've looked at ad tested the code, but I'm unclear about its impact on blocking. Please can you clarify why its use does not result in blocking.

FWIW I have tooling (monitor) to measure blocking. It needs some setting up, but I'll run it when the code is stable.

vshymanskyy commented 4 days ago

Using aioprof is much more simple to setup, I'll try it. I assume that socket.getaddinfo doesn't need to perform any I/O to resolve the numerical addresses. Technically even 2+2*2 is a blocking operation..

peterhinch commented 4 days ago

Blocking

Technically even 2+2*2 is a blocking operation..

Quite: "nonblocking" is a convenient shorthand for minimally blocking. All tasks block in the interval between yielding to the scheduler. The acceptable duration of blocking depends on the application. It's blocking I/O which is the killer as blocking can be lengthy (in the limit the duration of a timeout).

I assume that socket.getaddinfo doesn't need to perform any I/O to resolve the numerical addresses.

If this is correct I'm sure blocking will be minimal (< 5ms max).

Default behaviour

My preference would be for the nonblocking getaddrinfo to return exactly the same results as the blocking version. This implies:

  1. Able to cope with numeric address.
  2. By default it uses only the network DNS server. Allow the user to add servers.

I have written a simple script that uses aiorepl to enable a comparison of the two versions:

--> await dns('google.com',80)
[(2, 1, 0, '', ('172.217.169.46', 80))]
[(2, 1, 0, '', ('142.250.187.238', 80)), (2, 1, 0, '', ('142.250.200.46', 80)), (2, 1, 0, '', ('172.217.169.46', 80))]

--> await dns('hinch.me.uk', 80)
[(2, 1, 0, '', ('217.160.0.225', 80))]
[(2, 1, 0, '', ('217.160.0.225', 80))]

--> await dns('192.168.0.10', 80)
[(2, 1, 0, '', ('192.168.0.10', 80))]
OSError: Failed to resolve 192.168.0.10
--> await dns('alflgh.com', 80)
Site alflgh.com not found.
OSError: Failed to resolve alflgh.com

On my network, where a host can't be resolved, the synchronous version returns an error immediately, whereas the async version pauses for a while.

Script:

import asyncio
import aiodns
import socket
import aiorepl

async def dns(site="micropython.org", port=443):
    try:
        print(socket.getaddrinfo(site, port))
    except OSError:
        print(f"Site {site} not found.")
    info = await aiodns.getaddrinfo(site, port)
    print(info)
    print()

async def main():
    repl = asyncio.create_task(aiorepl.task(globals()))
    dnstask = asyncio.create_task(dns())
    await asyncio.gather(repl, dnstask)

asyncio.run(main())

General note

On past experience I doubt an official version will emerge for a while. This was first discussed several years ago.