jczic / MicroWebSrv2

The last Micro Web Server for IoTs (MicroPython) or large servers (CPython), that supports WebSockets, routes, template engine and with really optimized architecture (mem allocations, async I/Os). Ready for ESP32, STM32 on Pyboard, Pycom's chipsets (WiPy, LoPy, ...). Robust, efficient and documented!
https://github.com/jczic/MicroWebSrv2
MIT License
653 stars 96 forks source link

Restarting webserver after soft reboot #8

Closed amotl closed 4 years ago

amotl commented 4 years ago

Dear Jean-Christophe,

we are very happy about the new release of your fine webserver library. For now, it looks like it works very stable on one of our FiPy devices sitting on our workbench.

We are in the process of porting the Terkin Datalogger [1] to MicroWebSrv2 and experienced this issue after exiting the program using CTRL+C and invoking a soft reboot using CTRL+D.

It looks like the listener does not go away and so, MicroWebSrv2 can do nothing about its IsRunning method suitable for detecting this. The Webserver isn't actually running but it looks like XAsyncSocket does not give up it's listening socket on program termination, so it is obvious it can't bind again to the same port after coming back from the soft reboot.

It currently looks like that for us:

PYB: soft reboot

[...]

   15.6195 [terkin.network.core      ] INFO   : Setting up HTTP API
   15.6389 [terkin.api.http          ] INFO   : Starting HTTP server
MWS2-INFO> Stops the managed pool.
   15.6814 [terkin.network.core      ] ERROR  : Starting HTTP server failed
Traceback (most recent call last):
  File "network/core.py", line 126, in start_services
  File "network/core.py", line 146, in start_httpserver
  File "api/http.py", line 70, in start
  File "MicroWebSrv2/microWebSrv2.py", line 220, in StartManaged
  File "MicroWebSrv2/microWebSrv2.py", line 215, in StartManaged
  File "MicroWebSrv2/microWebSrv2.py", line 215, in StartManaged
  File "MicroWebSrv2/microWebSrv2.py", line 207, in StartManaged
  File "MicroWebSrv2/microWebSrv2.py", line 185, in StartInPool
MicroWebSrv2Exception: Cannot bind server on 0.0.0.0:80.

I believe we had a global variable on the module level to keep a reference to the MicroWebSrv reference, but this eventually stopped working after upgrading to a more recent version of Pycom MicroPython. We might also be wrong on that so MicroWebSrv might have been more graceful to us.

We figure it would be the best thing to invoke MicroWebSrv2 before shutting down, i.e. when quitting the program through CTRL+C or just before the "soft reboot".

So, we are humbly asking whether you would see a way to get around this issue?

With kind regards, Andreas.

[1] https://github.com/hiveeyes/hiveeyes-micropython-firmware

jczic commented 4 years ago

Ok, I'll check that very as soon as ! :)

jczic commented 4 years ago

Ok, I found the CTRL+C problem and fixed it in the XAsyncSockets library (just to support the KeyboardInterrupt exception with grace).

But as far as the soft reset (with CTRL+D) is concerned, the error you have is raised from XAsyncTCPServer.Create() when a simple TCP socket is created, before a listening call. The TCP port is really closed, everything is released and I looked around for this error, it is apparently a firmware bug in MicroPython that raise "Network card not available" (from the socket module).

Do you have any ideas on your side? I will continue to look into this again.

jczic commented 4 years ago

I found a way to get around this problem : https://github.com/jczic/MicroWebSrv2/commit/a220069d045061a0a21b947fcd3cd1f771046112 Just with an Init/Deinit of the WLAN object and soft reboot works perfectly.

jczic commented 4 years ago

Thank you very much Andreas 👍

jczic commented 4 years ago

@amotl : Is this method right for you?

amotl commented 4 years ago

Dear Jean-Christophe,

thanks for getting back on this detail.

I found a way to get around this problem: a220069 Just with an Init/Deinit of the WLAN object and soft reboot works perfectly.

I am not sure about this yet. I really wouldn't want to .deinit() WiFi while connected through the network. Background: Pycom devices offer the possibility to use telnet to connect to the REPL of the device.

It is apparently a firmware bug in MicroPython that raise "Network card not available" (from the socket module).

I know this guy. This is probably a different thing you are observing there. May I ask on which device you are seeing this?

Let's think about the core issue again: What about if XAsyncSockets would just invoke server.Stop() / i.e. _close()ing all sockets when catching the KeyboardInterrupt?

As some people might want to keep the webserver running while exiting the main program (I believe it would do so at is running in a separate thread, right?), this detail could eventually be made configurable through a shutdown_on_exit setting somewhere.

Essentially, I am looking for something that would resemble the atexit feature of vanilla Python in order to implement the gracefulness in a similar way you have outlined through a220069.

With kind regards, Andreas.

amotl commented 4 years ago

Let's just hold on: I will try to wrap our main() method of the Terkin Datalogger [1] within a try/except statement catching the KeyboardInterrupt and will try to just invoke mws2.Stop() there.

If everything will get cleaned up properly, then we would be all good with this.

[1] https://github.com/hiveeyes/hiveeyes-micropython-firmware

jczic commented 4 years ago

Thank you for your feedback Andreas. I can only say that the socket is correctly closed and that the background error does not come from the bind/listen but comes from the creation of any socket in fact (with the exception'Network card not available') after a soft reset. It is indeed strange and I do not understand any more at the moment.

sfewings commented 4 years ago

Hi Jean-Christophe, I also get the error MicroWebSrv2Exception: Cannot bind server on 0.0.0.0:80.

The wlan.init(), wlan.deinit() are not available for my firmware; Pyboard Series D SF6, latest firmware built for _thread.

I'm not certain it is an open socket between soft reboots as I also get this error on a hard reset.

jczic commented 4 years ago

Exactly @sfewings, that's works now (fix main.py example to run without WLAN).

jczic commented 4 years ago

@amotl, I looked around the problem but I couldn't find a direct solution. The sockets are well closed and socket.socket(...) continues to raise the exception. But I found that without disinitializing WLAN but just by initializing it with WLAN().init(), it works strangely and the wlan is not cut off as a result. I've commited the master branch.

amotl commented 4 years ago

But I found that without shutting down WLAN at all but just by reinitializing it with WLAN().init(), it works.

Yeah, the MicroPython implementation will probably reset the network stack then. While it might not exactly be the thing we are looking at here, [1] ff. gives some perspective about the nitty gritty details under the hood.

I will try to wrap our main() method of the Terkin Datalogger [1] within a try/except statement catching the KeyboardInterrupt and will try to just invoke mws2.Stop() there. If everything will get cleaned up properly, then we would be all good with this.

Unfortunately, I haven't been able to catch some time for testing this. Feel free to close this issue as you like.

[1] https://github.com/micropython/micropython/issues/4590

jczic commented 4 years ago

I found that in https://github.com/pycom/pycom-micropython-sigfox/blob/eca3b714a11c4b84e740a36b4096a20aac3c9be2/esp32/mods/modnetwork.c#L104 and I seen that is impossible to create a socket after a soft reset on a Pycom WiPy.

mp_obj_t mod_network_find_nic(const uint8_t *ip) {
    // find a NIC that is suited to given IP address
    for (mp_uint_t i = 0; i < MP_STATE_PORT(mod_network_nic_list).len; i++) {
        mp_obj_t nic = MP_STATE_PORT(mod_network_nic_list).items[i];
        // we want a raw network card
        if (ip == NULL) {
        #ifdef LOPY
            if (mp_obj_get_type(nic) == (mp_obj_type_t *)&mod_network_nic_type_lora) {
                return nic;
            }
        #endif
        #ifdef SIPY
            if (mp_obj_get_type(nic) == (mp_obj_type_t *)&mod_network_nic_type_sigfox) {
                return nic;
            }
        #endif
        } else {
            // TODO check IP suitability here
            return nic;
        }
    }
    nlr_raise(mp_obj_new_exception_msg(&mp_type_OSError, "Network card not available"));
}
amotl commented 4 years ago

Dear Jean-Christophe,

interesting finding, thanks. If that would be more related to the Network card not available issue you have been talking about within this conversation, we might well divert that into a different issue/discussion.

With kind regards, Andreas.

amotl commented 4 years ago

Hi again,

while I don't know if the implementation changed since then and further have no idea if this is of any relevance here, this comment by @dpgeorge was an interesting read I wanted to share with you

The way socket events are currently implemented, events (like connect, or receive data) are only checked for during "idle" periods of the script. A delay of 20ms will include an idle check. To check for events more regularly would either decrease performance of scripts (eg by checking through a timer callback or every N iterations of the VM loop) or increase RAM usage (by creating a dedicated thread to check for events). [1]

With kind regards, Andreas.

[1] https://github.com/micropython/micropython/commit/23e9c3b#commitcomment-29692889

jczic commented 4 years ago

Yes, this events checking is interesting but if you wait after a CTRL+C and after a CTRL+D reboot, the exception always exists and it's not possible to create a socket. That's works after a WLAN reinit (so, a reinit of network interface). Maybe we should report this bug (maybe even to Expressif...)

(Oh, I released a new 2.0.2 version for an another bug on some platforms).

sfewings commented 4 years ago

Thank you, My error is unrelated to the problem you detail above. It is far more simple. The firmware for PYBD_SF6 (Pyboard D series SF6) does not have socket.fileno() implemented. XAsyncSocket use of fileno to track sockets in the XAsyncSocketPool will not work without the fileno() identifier.
I am new to micropython and the Pyboards. It seems strange that something as simple as a socket.fileno() is not implemented across all ports!

jczic commented 4 years ago

Oh yes, unfortunately fileno() is indeed required by XAsyncSocket lib... Thank you for message but it's a shame for now. :'(

amotl commented 4 years ago

Hi there,

thank you all. We might really close this issue and divert the different interleaving topics into dedicated issues in order to track them seperately.

If I am coming back to this topic I started originally, I will be happy to reopen or create a different issue.

Thanks Jean-Christophe for all your hard work on this module!

Cheers, Andreas.

jczic commented 4 years ago

Thanks to you too Andreas and look forward with pleasure for next issues 👍 :')

amotl commented 4 years ago

Dear Jean-Christophe,

even without upgrading to the most recent version of MicroWebSrv2, catching the KeyboardInterrupt in order to shut down the HTTP server worked perfectly for us [1].

We cycled through some soft reboots a few times and it worked well [2]. FYI: When having several threads running, one has to hit CTRL-C multiple times in order to shut down each single thread before coming to the main thread.

Thanks again and with kind regards, Andreas.

[1] https://github.com/hiveeyes/hiveeyes-micropython-firmware/commit/e5b91b4e [2] While doing that, we found the FTP server within the Pycom firmwares stopped working at some time, but that is probably a totally different story.

jczic commented 4 years ago

Thanks again Andreas, I released version 2.0.3 which stops cleanly, with all threads, in case of KeyboardInterrupt :) 165d96bb747b5b9c2202ef0b9456daf6f7940ae6

amotl commented 4 years ago

Thanks Jean-Christophe,

when using the new code, after stopping through catching KeyboardInterrupt in the main thread and invoking .Stop() on the server like as done through https://github.com/hiveeyes/hiveeyes-micropython-firmware/commit/e5b91b4e, I am getting this, so the KeyboardInterrupt-handling within MicroWebSrv2 seems to work fine so far [1]

Unhandled exception in thread started by <bound_method>
Traceback (most recent call last):
  File "network/wifi.py", line 202, in stay_connected
KeyboardInterrupt:
   39.0365 [terkin.network.core         ] INFO   : Shutting down HTTP server
Pycom MicroPython 1.20.1.r1-0.6.0-annapurna-48412282 [87211d51-dirty] on 2019-11-15; FiPy with ESP32

However, I am receiving this when starting the server again after a soft reboot:

   18.7164 [terkin.network.core         ] INFO   : Setting up HTTP API
   18.7365 [terkin.api.http             ] INFO   : Creating new HTTP server object
   18.7673 [terkin.network.core         ] ERROR  : Starting HTTP server failed
Traceback (most recent call last):
  File "network/core.py", line 127, in start_services
  File "network/core.py", line 147, in start_httpserver
  File "api/http.py", line 66, in start
  File "MicroWebSrv2/microWebSrv2.py", line 378, in IsRunning
AttributeError: 'NoneType' object has no attribute 'WaitEventsProcessing'

I am wondering if it is a good idea to stop only parts of the machinery as some other parts might still be initialized even after the soft reboot. Maybe we can make this bit somehow configurable and stop everything from within MicroWebSrv2 when receiving the KeyboardInterrupt?

Saying this, we might also want to revert this. With the version before, everything works cleanly as expected while having to hit CTRL-C a few times.

However, waitress and possibly other servers handle that thing as well [2] in order to make it convenient for "interactive usage". So, we might well want to follow that route in order to get everything straight.

Sorry to bother you about this, but I recognize you are eager to solve this as well to make usage of your module as convenient as possible. While I can't put myself into that hands-on, I appreciate your work and will assist as good as possible.

Keep up the spirit!

With kind regards, Andreas.

[1] Please forget about the message File "network/wifi.py", line 202, in stay_connected. This is coming from another thread we are running. [2] https://github.com/Pylons/waitress/search?q=keyboardinterrupt&unscoped_q=keyboardinterrupt

amotl commented 4 years ago

other servers handle that thing as well [1] in order to make it convenient for "interactive usage"

to make usage of your module as convenient as possible

When looking at the code of waitress, you will see it is also invoking self.close() from within handling the KeyboardInterrupt, probably to close any listening sockets. So, besides stopping the dispatcher thread, we might want to do this as well with MicroWebSrv2?

amotl commented 4 years ago

Regarding the

AttributeError: 'NoneType' object has no attribute 'WaitEventsProcessing'

I just swapped the parts of the condition check

@property
def IsRunning(self) :
    return \
        self._xasSrv is not None and \
        self._xasPool.WaitEventsProcessing

and it looks it will make things better, even across soft reboots.

amotl commented 4 years ago

I've just hit another case within _processWaitEvents where the KeyboardInterrupt is not handled:

Unhandled exception in thread started by <bound_method>
Traceback (most recent call last):
  File "network/wifi.py", line 203, in stay_connected
  File "MicroWebSrv2/libs/XAsyncSockets.py", line 115, in _processWaitEvents
KeyboardInterrupt:

As far as I know from regular Python, catching the KeyboardInterrupt has to be performed explicitly within a try/except clause where this might be about to happen. Maybe we should also refactor these nested exception handlers within there once more to make the flow a little more cleaner to behave?

jczic commented 4 years ago

Exactly, in IsRunning property, it must check the None value of _xasPool, I've added that : 7be88c7b21c3ad5125d2acb47395bf39b740cc8a

I've also updated XAsyncSockets lib to re-raise the KeyboardInterrupt exception correctly : d8663f685b1c5a426dc9f179632312e743b7b67b

Is it good now? :)

amotl commented 4 years ago

I believe it works pretty well now. After adding the same things to our datalogger through https://github.com/hiveeyes/hiveeyes-micropython-firmware/commit/b879a4b0, it decently shuts down and bootstraps again fine after soft rebooting through hitting CTRL-D.

Thanks again!

amotl commented 4 years ago

A minor but also decent optimization for convenience could be a thing where one could start up the server with an option to perform the shutdown on its own, so we could free the main program from that routine.

Maybe enabled through calling srv.ShutdownOnQuit(True) before invoking srv.StartManaged().

jczic commented 4 years ago

Yes, but it is already the case. The loop in the main program is used to maintain execution. Otherwise, the program ends.

amotl commented 4 years ago

All right. I will check this with the next iteration. Thanks again!

amotl commented 4 years ago

Dear Jean-Christophe,

I don't see .Stop() being invoked from within MicroWebSrv2. So, when not using https://github.com/hiveeyes/hiveeyes-micropython-firmware/commit/e5b91b4e, we are getting this exception when trying to start the server after a soft reboot:

   18.5368 [terkin.network.core         ] ERROR  : Starting HTTP server failed
Traceback (most recent call last):
  File "network/core.py", line 127, in start_services
  File "network/core.py", line 147, in start_httpserver
  File "api/http.py", line 71, in start
  File "MicroWebSrv2/microWebSrv2.py", line 226, in StartManaged
  File "MicroWebSrv2/microWebSrv2.py", line 221, in StartManaged
  File "MicroWebSrv2/microWebSrv2.py", line 221, in StartManaged
  File "MicroWebSrv2/microWebSrv2.py", line 213, in StartManaged
  File "MicroWebSrv2/microWebSrv2.py", line 191, in StartInPool
MicroWebSrv2Exception: Cannot bind server on 0.0.0.0:80.

So, handling the KeyboardInterrupt on the top level and invoking .Stop() should be done in any case.

The background on this is that the SIGINT signal coming in through CTRL-C seems to be handled by an arbitrary thread so it is not sure it will arrive either at one of the threads MicroWebSrv2 is spawning nor at the main thread, if more threads are involved.

However, this is just an assumption and would contrast the way it is implemented within regular Python:

Python signal handlers are always executed in the main Python thread, even if the signal was received in another thread.

-- https://docs.python.org/3/library/signal.html#signals-and-threads

We are currently working on these details as well as we have different threads running within the Terkin Datalogger.

With kind regards, Andreas.

amotl commented 4 years ago

The background on this is that the SIGINT signal coming in through CTRL-C seems to be handled by an arbitrary thread so it is not sure it will arrive either at one of the threads MicroWebSrv2 is spawning nor at the main thread, if more threads are involved.

Just for a reference, here are two issues on the MicroPython issue tracker about SIGINTs:

However, they are originating from the Unix port of MicroPython and don't touch the aspects threading at all, as far as I can see.

jczic commented 4 years ago

Hello Andreas, Just, could you try to create a socket (just create) after a soft reboot? Without MicroWebSrv2? (the Exception "Cannot bind server on 0.0.0.0:80." is not right, this is always the same problem, not at binding or listen in XAsyncSockets).

amotl commented 4 years ago

Hi Jean-Christophe,

when doing it that way, the error is obvious:

Traceback (most recent call last):
  File "main.py", line 9, in <module>
OSError: [Errno 112] EADDRINUSE

The code is

import mininet
net = mininet.MiniNet()
net.connect_wifi_sta('SSID', 'redacted')

import socket
srvSocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srvSocket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
srvSocket.bind(('0.0.0.0', 80))
srvSocket.listen(8)

I am totally fine with the current implementation regarding catching the KeyboardInterrupt. I believe we can't get any better. Working on this, I am happy to having improved similar things through https://github.com/hiveeyes/hiveeyes-micropython-firmware/commit/7ce9947f.

Thanks again!

With kind regards, Andreas.

jczic commented 4 years ago

:) Ok for your code but you initialize wifi (with connect) and you doesn't use the reuse addr option. So, it's normal that you have a "[Errno 112] EADDRINUSE". But if you just try that :

Do you have the "OSError: Network card not available"?

amotl commented 4 years ago

You don't use the reuse addr option, so it's normal that you have a [Errno 112] EADDRINUSE.

I do:

srvSocket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)

But if you just enter import socket; s = socket.socket(), do you have the OSError: Network card not available?

Yes I do. You will have to get up any network stack first. I'm doing that by using the mininet module.

jczic commented 4 years ago

Yes, after a soft reset only 👍 Also, thank you for all Andreas!