micropython / micropython-esp32

Old port of MicroPython to the ESP32 -- new port is at https://github.com/micropython/micropython
MIT License

Guru Meditation Error during socket.connect() #204

Open nickzoic opened 6 years ago

nickzoic commented 6 years ago

If you try to socket.connect() to an unreachable TCP/IP address, it eventually (after ~15 seconds) fails with OSError: [Errno 113] EHOSTUNREACH

However, if you Ctrl-C during this time, the exception is immediately followed by a crash:

>>> s.connect(('10.107.1.6', 9999))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OSError: [Errno 113] EHOSTUNREACH
>>> NLR jump failed, val=0x3ffda814
Guru Meditation Error of type IllegalInstruction occurred on core  0. Exception was unhandled.

This occurs with network.WLAN() or the new network.LAN() adaptor.

nickzoic commented 6 years ago

(yes, I'll have a look at this but I'm adding it here so I don't forget)

MrSurly commented 6 years ago

Well, that explains recent problems I've had with WLAN.

Side note: even machine.reset() gives a Guru Meditation (ah, my Amiga days), so there's something weird with the current IDF being used.

dpgeorge commented 6 years ago

It looks like the ctrl-C that you press during connect() is being buffered. Then the EHOSTUNREACH exception is being raised, but the ctrl-C is still pending. The ctrl-C is then raised in some strange location which leads to the crash.

Apart from this being a bug (whose cause may be difficult to track down), fixing connect() so that ctrl-C can break out of it would require setting the socket to non-blocking at the start, then looping to poll for the connect() to complete. In that loop you can check for ctrl-C explicitly (by calling mp_handle_pending()).

Note: this stuff is already handled in esp8266 because it uses extmod/modlwip.c which wraps the lwIP stack at a lower level. And I don't think it's possible to hook into the esp32 lwIP stack at such a level, because it's probably not exposed and also there are multi-core issues to consider.
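For illustration, the non-blocking connect plus polling loop described above can be sketched in plain CPython on a POSIX host. This is not the port's C code: connect_with_polling and its timeout_s/poll_s parameters are hypothetical names, and here the interpreter delivers KeyboardInterrupt between the short select() calls, which stands in for an explicit mp_handle_pending() check in the C loop.

```python
import errno
import os
import select
import socket
import time


def connect_with_polling(sock, address, timeout_s=15.0, poll_s=0.1):
    """Hypothetical helper: connect in short polling slices.

    The socket is made non-blocking, the connect is started, and completion
    is polled so that a pending Ctrl-C can be handled between polls instead
    of being buffered until one long blocking call returns.
    """
    sock.setblocking(False)
    err = sock.connect_ex(address)
    # A non-blocking connect normally reports "in progress" straight away.
    if err not in (0, errno.EINPROGRESS, errno.EWOULDBLOCK, errno.EALREADY):
        raise OSError(err, os.strerror(err))
    deadline = time.monotonic() + timeout_s
    while err != 0:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise OSError(errno.ETIMEDOUT, "connect timed out")
        # Short wait: the socket turns writable once the connect finishes,
        # whether it succeeded or failed.
        _, writable, _ = select.select([], [sock], [], min(poll_s, remaining))
        if writable:
            # SO_ERROR holds the final result of the connect attempt.
            err = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
            if err != 0:
                raise OSError(err, os.strerror(err))
    sock.setblocking(True)
```

A caller would create the socket as usual, call connect_with_polling(s, ('10.107.1.6', 9999)), and close the socket if it raises. The short poll_s slice is the design point: it bounds how long an interrupt can stay pending.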

nickzoic commented 6 years ago

Same behaviour in v1.9.2-279-g090b6b80. Similar in v1.9.2-225-g75ead22c (no "Guru" message, but same "NLR jump failed").

@dpgeorge yeah, I was thinking that; we do similar things elsewhere in that library to "fake" timeouts.

MrSurly commented 6 years ago

> Apart from this being a bug (whose cause may be difficult to track down), fixing connect() so that ctrl-C can break out of it would require setting the socket to non-blocking at the start, then looping to poll for the connect() to complete. In that loop you can check for ctrl-C explicitly (by calling mp_handle_pending()).

All of the other socket stuff implements blocking/timeout with a loop, for this same reason. I think connect doesn't do this because LWIP doesn't seem to let you set a connect timeout.

In <IDF>/components/lwip/include/lwip/lwip/sockets.h

#define SO_CONTIMEO    0x1009 /* Unimplemented: connect timeout */

... and it's not implemented in the API, either. =(
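As a rough picture of the loop-based timeout mentioned above, here is a minimal CPython sketch for a receive call. It only illustrates the pattern and is not the port's C implementation; recv_with_timeout and its parameters are hypothetical names.

```python
import errno
import select
import socket
import time


def recv_with_timeout(sock, nbytes, timeout_s, poll_s=0.1):
    """Hypothetical helper: wait up to timeout_s for data, in short polls."""
    deadline = time.monotonic() + timeout_s
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise OSError(errno.ETIMEDOUT, "recv timed out")
        # Each poll is short, so the loop regains control regularly and can
        # notice an interrupt or the overall deadline.
        readable, _, _ = select.select([sock], [], [], min(poll_s, remaining))
        if readable:
            return sock.recv(nbytes)
```

The overall timeout is enforced by the loop rather than by a socket option, which is why the same approach could cover connect even though SO_CONTIMEO is unimplemented.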

nickzoic commented 6 years ago

Looks like we should be able to do something along these lines: https://github.com/dreamcat4/lwip/blob/master/contrib/apps/socket_examples/socket_examples.c

I'll try and get something in for this ASAP.