pfalcon / pycopy

Pycopy - a minimalist and memory-efficient Python dialect. Good for desktop, cloud, constrained systems, microcontrollers, and just everything.
http://pycopy.readthedocs.io
MIT License
805 stars 78 forks source link

uselect Poll.ipoll() returns unexpected events with 'userdata' feature #46

Open AndyFug opened 4 years ago

AndyFug commented 4 years ago

Hi, Novice to this world, but hoping that this is a valid contribution. Apologies if I've missed something obvious or if this report isn't clear enough/in the correct place. Sorry that it's a little lengthy but I'm led to believe that more (useful) information is often better than less. Feedback accepted. :)

I was trying to use Picoweb/Uasyncio with Pycopy on an ESP8266, when I stumbled across the EAGAIN issues discussed here: https://github.com/pfalcon/pycopy/issues/30 https://github.com/micropython/micropython/issues/4824

I applied the work-around suggested by catching the exception on s.accept() in uasyncio start_server() loop, but that then unveiled what appeared to be going on (once you enabled verbose logging). It appeared to me that uselect Poll.ipoll() method was returning unexpected events, so when uasyncio's start_server() loop was expecting a POLLIN event after yielding IORead(s) it was, in fact, receiving other events - mostly POLLOUT(4), hence why it was expecting to accept new client connections when there weren't any and raising the EAGAIN exception.

To look into this further, I came up with a minimal test-case for uselect using the same methods as uasyncio, based on others' code from previous posts:

import usocket as socket, uselect as select, ulogging as logging, gc

_blocking = False
_userdata = True
_max_events = 1000
_start_mem = 0

log = logging.getLogger("test")

def issock(s1, s2): return s1 == s2

# Two options for register method.  With userdata and without.
def register(poller, s):
    if _userdata:
        poller.register(s, select.POLLIN, "Python object")
    else:
        poller.register(s, select.POLLIN)

def start_server(port):
    ai = socket.getaddrinfo('0.0.0.0', port)[0][-1]
    print("Starting server... Port: {}".format(port))
    print("Bug in place: {}".format(_userdata))
    ss = socket.socket()
    ss.bind(ai)
    ss.listen(10)
    ss.setblocking(_blocking)
    ss.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)

    poll = select.poll()
    register(poll, ss)

    i = 0

    gc.collect()
    start_mem = gc.mem_free()

    while True:
        gc.collect()
        log.debug("Memory used: {}".format(start_mem - gc.mem_free()))

        # One-shot behaviour
        res = poll.ipoll(-1, 1)
        for s, e, obj in res:
            i += 1
            log.debug("#{0} NEW SOCKET EVENT: Socket: {1}  Event: {2}  Userdata: {3}".format(i, s, e, obj))

            # New client connection on server socket
            if issock(s, ss) and e == select.POLLIN:
                s, addr = ss.accept()
                s.setblocking(_blocking)
                del addr
                # Imitate uasyncio... use register() not modify()
                register(poll, s)
                register(poll, ss)
                log.debug("New connection!")
                continue

            # Assume new client message
            if not issock(s, ss) and e == select.POLLIN:
                log.debug("Trying to read from socket...")
                msg = s.recv(1024)
                if msg:
                    print("*** NEW MESSAGE: {}".format(msg))
                    register(poll, s)

                # Empty msg > close connection
                if not msg:
                    log.debug("Closing socket...")
                    poll.unregister(s)
                    s.close()

            # If the event is not POLLIN, just log it.
            else:
                log.debug("Unhandled event: {} on socket: {}".format(e, s))
                # Re-register it for POLLIN becuase otherwise we won't get any future events (single-shot mode)
                register(poll, s)

        # Stop the server after x events
        if i >= _max_events:
            print("Closing down server...")
            poll.unregister(ss)
            ss.close()
            break

def run(userdata=True, debug=True, port=8000):
    global _userdata
    _userdata = userdata

    if debug is True:
        log.setLevel(logging.DEBUG)
    else:
        log.setLevel(logging.INFO)

    start_server(port)

After playing around with this code on the ESP8266, I discovered that the issue described only appeared to come about when adding the 'userdata' argument to the Poll.register() method. The code above can be run with/without 'userdata' and, for me, it only works as expected when the 'userdata' feature isn't used. I used a basic packet sender application as a client to send short ASCII strings. Here are the outputs:

With Poll.register() 'userdata' argument: (No client connections during this output... just the server running on its own)

Pycopy v1.11-1462-gaf0948daa on 2020-06-14; ESP module with ESP8266
Type "help()" for more information.
>>> uatest1.run(userdata=True)
Starting server... Port: 8000
Bug in place: True
DEBUG:test:Memory used: 0
DEBUG:test:#1 NEW SOCKET EVENT: Socket: <socket state=1 timeout=0 incoming=a off=0>  Event: 4  Userdata: Python object
DEBUG:test:Unhandled event: 4 on socket: <socket state=1 timeout=0 incoming=a off=0>
DEBUG:test:Memory used: 32
DEBUG:test:#2 NEW SOCKET EVENT: Socket: <socket state=1 timeout=0 incoming=a off=0>  Event: 4  Userdata: Python object
DEBUG:test:Unhandled event: 4 on socket: <socket state=1 timeout=0 incoming=a off=0>
DEBUG:test:Memory used: 32
DEBUG:test:#3 NEW SOCKET EVENT: Socket: <socket state=1 timeout=0 incoming=a off=0>  Event: 4  Userdata: Python object
DEBUG:test:Unhandled event: 4 on socket: <socket state=1 timeout=0 incoming=a off=0>
DEBUG:test:Memory used: 32
DEBUG:test:#4 NEW SOCKET EVENT: Socket: <socket state=1 timeout=0 incoming=a off=0>  Event: 4  Userdata: Python object
DEBUG:test:Unhandled event: 4 on socket: <socket state=1 timeout=0 incoming=a off=0>
DEBUG:test:Memory used: 32

Without Poll.register() 'userdata' argument: (+ Client connection with 3 ASCII strings sent client > server)

Pycopy v1.11-1462-gaf0948daa on 2020-06-14; ESP module with ESP8266
Type "help()" for more information.
>>> uatest1.run(userdata=False)
Starting server... Port: 8000
Bug in place: False
DEBUG:test:Memory used: 0
DEBUG:test:#1 NEW SOCKET EVENT: Socket: <socket state=1 timeout=0 incoming=1000a off=0>  Event: 1  Userdata: None
DEBUG:test:New connection!
DEBUG:test:Memory used: 96
DEBUG:test:#2 NEW SOCKET EVENT: Socket: <socket state=3 timeout=0 incoming=3fff9110 off=0>  Event: 1  Userdata: None
DEBUG:test:Trying to read from socket...
*** NEW MESSAGE: b'Testing - Message 1'
DEBUG:test:Memory used: 144
DEBUG:test:#3 NEW SOCKET EVENT: Socket: <socket state=3 timeout=0 incoming=3fff9110 off=0>  Event: 1  Userdata: None
DEBUG:test:Trying to read from socket...
*** NEW MESSAGE: b'Message 2\r'
DEBUG:test:Memory used: 128
DEBUG:test:#4 NEW SOCKET EVENT: Socket: <socket state=3 timeout=0 incoming=3fff9110 off=0>  Event: 1  Userdata: None
DEBUG:test:Trying to read from socket...
*** NEW MESSAGE: b'Message 3 - Closing after this.\r'
DEBUG:test:Memory used: 160
DEBUG:test:#5 NEW SOCKET EVENT: Socket: <socket state=4 timeout=0 incoming=0 off=0>  Event: 1  Userdata: None
DEBUG:test:Trying to read from socket...
DEBUG:test:Closing socket...
DEBUG:test:Memory used: 96

Extra info:

For my Picoweb app (which was leaking memory after consecutive client connections), I've reverted uasyncio's application of userdata for callbacks by rolling back commit: https://github.com/pfalcon/pycopy-lib/commit/69fecaf. This seems to have fixed my memory leak issue for now.

Sadly I'm not experienced enough to find the root cause of the issue myself. I suspect it could lie in a C module (Such as this one?) or lwIP stack, but I don't have the experience to follow what's going on at this low level.

Many thanks in advance to anyone who looks into this with me, and for all the fantastic work you guys have done on these projects.

Kind regards,

Andy

pfalcon commented 4 years ago

Sorry that it's a little lengthy but I'm led to believe that more (useful) information is often better than less.

@AndyFug, I would like to thank you for the excellent bug report. It indeed covers all the major points - Unix port tested, detailed description, minimal reproduction testcase provided, and even the problem spot is pinpointed. I understand that it took some time to prepare, and even just write down, this report, and it's fully appreciated. That's the kind of report any open-source maintainer would like to receive (well, I for sure).

I have to admit that lately I'm concentrating on Unix port, simply because IMHO no MicroPython microcontroller port is without issues, and my approach (unlike MicroPython) is that the Unix port is the main one, and other ports should follow it, not the other way around. Actually, I'm not even sure if I have an esp8266 hardware here (== where world's lockdown caught me) for testing. So well, I apologize if processing this will take more time. I hope to look into this in more detail on weekend.

AndyFug commented 4 years ago

Hi @pfalcon, and many thanks for your response.

Glad that my first report on GitHub has covered the main points... I now need to start learning a bit more about git/github etc and attempt to apply it to some of my projects.

Totally understand re. Unix port being your main priority. I thought you may say that. I'm keen to use Picoweb (and, in turn, Pycopy etc) so I'll just keep my reverted version of uasyncio running on my current project for now until you get an opportunity to look into this.

Cheers, Andy

pfalcon commented 4 years ago

@AndyFug, So, I had a quick look so far. The commit which introduces userdata support is 3ae5c19d2a50449391642f3dac788ce1c7d83ee0. As you can see, it's marked "WIP", but that's mostly, IIRC, because it didn't have an associated unittest (which is kinda hard to write so it covered every possibility). The changes in that commit are however pretty simple and obvious. It doesn't change any existing algorithms, just add storage for this "userdata", then marshals it back and forth.

So, while your testcase demonstrates that the problem is triggered by "userdata" param, I suspect that's a false lead, and there's a different underlying problem. One issue can be that baremetal (unlike Unix) moduselect.c uses mp_map_t to store info about the polled sockets, which is not safe to change (e.g., add, possible, delete), during iteration.

And your testcase demonstrates that, in the for loop over poll.ipoll(-1, 1), there're calls to

                register(poll, s)
                register(poll, ss)

etc.

So, to verify if that would be the problem, you would need to replace:

        res = poll.ipoll(-1, 1)
        for s, e, obj in res:

with

        res = list(poll.ipoll(-1, 1))
        for s, e, obj in res:

I.e. wrap ipoll() call in a list(), to make it eager (produce all results in once) instead of lazy (incremental).

If you can test such a change, I'd appreciate it, because I still have a bunch of WIP issues in my queue, before I can start to look into this one.

AndyFug commented 4 years ago

Hi, Thanks for looking into this again. I made the change as you described to the test case. Relevant code here:

    while True:
        gc.collect()
        log.debug("Memory used: {}".format(start_mem - gc.mem_free()))

        # One-shot behaviour + eager instead of lazy iteration by wrapping result in a list
        res = list(poll.ipoll(-1, 1))
        log.debug("ipoll result: {}".format(res))
        for s, e, obj in res:
            i += 1

Sadly the output is still the same. As soon as I start the server loop with 'userdata' parameter, I get the following output (with no client connections):

>>> import ipoll_test as it
>>> it.run()
Starting server... Port: 8000
Bug in place: True
DEBUG:test:Memory used: 0
DEBUG:test:ipoll result: [(<socket state=1 timeout=0 incoming=a off=0>, 4, 'Python object')]
DEBUG:test:#1 NEW SOCKET EVENT: Socket: <socket state=1 timeout=0 incoming=a off=0>  Event: 4  Userdata: Python object
DEBUG:test:Unhandled event: 4 on socket: <socket state=1 timeout=0 incoming=a off=0>
DEBUG:test:Memory used: 64
DEBUG:test:ipoll result: [(<socket state=1 timeout=0 incoming=a off=0>, 4, 'Python object')]
DEBUG:test:#2 NEW SOCKET EVENT: Socket: <socket state=1 timeout=0 incoming=a off=0>  Event: 4  Userdata: Python object
DEBUG:test:Unhandled event: 4 on socket: <socket state=1 timeout=0 incoming=a off=0>
DEBUG:test:Memory used: 64
DEBUG:test:ipoll result: [(<socket state=1 timeout=0 incoming=a off=0>, 4, 'Python object')]
DEBUG:test:#3 NEW SOCKET EVENT: Socket: <socket state=1 timeout=0 incoming=a off=0>  Event: 4  Userdata: Python object
DEBUG:test:Unhandled event: 4 on socket: <socket state=1 timeout=0 incoming=a off=0>
DEBUG:test:Memory used: 64

For what it's worth (given my lack of experience/understanding at this low level), I'll take a look at that commit you mentioned ( https://github.com/pfalcon/pycopy/commit/3ae5c19d2a50449391642f3dac788ce1c7d83ee0 ) and see if I can spot anything else or think of another test that would highlight the underlying issue. I'll report back if I discover anything.. (unlikely!) :)

Many thanks again for taking the time to have a think about this. Let me know if you want me to conduct any further tests with the test case or experiment with tweaks to other modules.

Cheers, Andy