thliebig / openEMS

openEMS is a free and open-source electromagnetic field solver using the EC-FDTD method.
http://openEMS.de
GNU General Public License v3.0

Handle SIGINT for openEMS and Python, with graceful exit support. #115

Closed biergaizi closed 1 year ago

biergaizi commented 1 year ago

Note: This is the first draft for comments and reviews, do not merge. Update: It's now merge-ready, support for Windows's native ConsoleCtrlHandler has also been added.

Currently, openEMS doesn't have any special code to handle SIGINT (which is raised by pressing Control-C). By default, the program is terminated without saving data. This worked okay in the past, but now its limitations are becoming obvious.

  1. When openEMS is used as a Python module, Control-C stops working because SIGINT is managed by Python in order to generate KeyboardInterrupt exceptions. Normally this isn't a problem, but while an external C++ (Cython) function such as openEMS is running, the Python interpreter main loop has no control until the function returns. As a result, SIGINT is received but never handled. In Cython, programs are expected to call PyErr_CheckSignals() periodically in their blocking loops to temporarily transfer control back to Python so that it can handle signals. But this introduces a dependency on Cython in the FDTD main loop.

  2. During a simulation, it's not possible to abort gracefully by pressing Control-C. This is a limitation of openEMS itself: Control-C is always a force exit. Currently the only supported method for a graceful exit is creating a file called "ABORT" in the simulation directory. If we already need to implement a signal handler, adding a graceful exit at the same time would be a good idea.
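For readers unfamiliar with the "ABORT" file mechanism mentioned in item 2, here is a minimal sketch of what such a polled check could look like. The helper names AbortFileRequested() and RequestAbort() are hypothetical and are not taken from the openEMS source; only the file name "ABORT" and its location in the simulation directory come from the description above.

```cpp
#include <filesystem>
#include <fstream>
#include <system_error>

// Hypothetical helper, not openEMS's actual implementation: returns true
// if a file named "ABORT" exists in the given simulation directory.
bool AbortFileRequested(const std::filesystem::path& simDir)
{
    std::error_code ec; // non-throwing overload; an error counts as "no abort"
    return std::filesystem::exists(simDir / "ABORT", ec);
}

// Hypothetical helper: what a user does by hand (e.g. with `touch ABORT`)
// to request a graceful stop.
void RequestAbort(const std::filesystem::path& simDir)
{
    std::ofstream marker(simDir / "ABORT"); // creates an empty marker file
}
```

A time-stepping loop would call a check like this every so often and finish cleanly when it returns true. The drawback described above is that the user has to create the file from another terminal, which is less convenient than pressing Control-C.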

This commit installs SIGINT handlers during SetupFDTD() and RunFDTD().

  1. In RunFDTD(), if SIGINT is received once, a status flag is set, which is then checked in CheckAbortCond(), allowing a graceful exit with the same effect as an "ABORT" file. If SIGINT is received twice, openEMS force-exits without saving data (just like the old default behavior).

  2. In SetupFDTD(), if SIGINT is received, openEMS immediately force-exits without saving data, identical to the old behavior. In a huge simulation, initializing and compressing operators may take a long time, so we want an early exit before RunFDTD().

  3. Before RunFDTD() and SetupFDTD() return, the original signal handler for SIGINT is restored. This is important when we're acting as a shared library. When a program (such as the Python interpreter) calls us, changing the SIGINT handler unilaterally may overwrite the original handler and affect the functionality of the original program. For example, Python would never be able to raise KeyboardInterrupt again. Thus, we save the original handler and restore it later.
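The three points above can be illustrated with a small, self-contained sketch. This is not the code from this pull request: the names g_abortRequested, ForceExitHandler, GracefulExitHandler, SigintGuard and RunLoop are hypothetical, and the sketch uses only the portable std::signal() interface. A fuller POSIX version would probably use sigaction() so that the caller's flags and signal mask are preserved too, and Windows would need its own console control handler.

```cpp
#include <csignal>
#include <cstdlib>

using Handler = void (*)(int);

// Set by the signal handler, polled by the time-stepping loop.
volatile std::sig_atomic_t g_abortRequested = 0;

// Second Ctrl-C: leave immediately without any cleanup.
void ForceExitHandler(int)
{
    std::_Exit(EXIT_FAILURE);
}

// First Ctrl-C: only set a flag and arm the force-exit handler for the
// next SIGINT. No I/O is done here, to stay async-signal-safe.
void GracefulExitHandler(int)
{
    g_abortRequested = 1;
    std::signal(SIGINT, ForceExitHandler);
}

// RAII guard: installs a SIGINT handler and puts the caller's original
// handler (e.g. the one installed by Python) back on scope exit.
class SigintGuard {
public:
    explicit SigintGuard(Handler h) : previous_(std::signal(SIGINT, h)) {}
    ~SigintGuard()
    {
        if (previous_ != SIG_ERR)
            std::signal(SIGINT, previous_);
    }
    SigintGuard(const SigintGuard&) = delete;
    SigintGuard& operator=(const SigintGuard&) = delete;

private:
    Handler previous_;
};

// Stand-in for the time-stepping loop; returns the number of completed
// steps. interruptAtStep >= 0 simulates a Ctrl-C at that step.
int RunLoop(int maxSteps, int interruptAtStep = -1)
{
    g_abortRequested = 0;
    SigintGuard guard(GracefulExitHandler);
    int step = 0;
    for (; step < maxSteps; ++step) {
        if (step == interruptAtStep)
            std::raise(SIGINT); // simulate the user pressing Ctrl-C
        if (g_abortRequested)
            break; // analogous to the flag check in CheckAbortCond()
    }
    return step; // guard's destructor restores the original handler
}
```

In this sketch, a setup phase would use `SigintGuard guard(ForceExitHandler);` instead, so that a single Control-C exits immediately. Using a guard object means the original handler is restored on every return path, which is the property that matters when the caller is the Python interpreter.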

Demo

Interrupting SetupFDTD

 ---------------------------------------------------------------------- 
 | openEMS 64bit -- version dummy-1-g93bfcfc
 | (C) 2010-2023 Thorsten Liebig <thorsten.liebig@gmx.de>  GPL license
 ---------------------------------------------------------------------- 
    Used external libraries:
        CSXCAD -- Version: v0.6.2-123-gc29742b
        hdf5   -- Version: 1.12.1
                  compiled against: HDF5 library version: 1.12.1
        tinyxml -- compiled against: 2.6.2
        fparser
        boost  -- compiled against: 1_76
        vtk -- Version: 9.1.0
               compiled against: 9.1.0

Create FDTD operator (compressed SSE + multi-threading)
^C
Signal::UnixForceExitHandler(): Force-exit simulation process now!

Interrupting RunFDTD, graceful exit

Create FDTD engine (compressed SSE + multi-threading)
Running FDTD engine... this may take a while... grab a cup of coffee?!?
[@        7s] Timestep:          188 || Speed:   68.3 MC/s (4.035e-02 s/TS) || Energy: ~9.90e-21 (- 0.00dB)
^C
Signal::UnixGracefulExitHandler(): Gracefully aborting simulation now, this may take a few seconds...
Signal::UnixGracefulExitHandler(): To force-exit, send Ctrl-C again, but simulation results may be lost.
openEMS::CheckAbortCond(): Received SIGINT, aborting simulation gracefully...
RunFDTD: Warning: Max. number of timesteps was reached before the end-criteria of -50dB was reached... 
    You may want to choose a higher number of max. timesteps... 
Time for 1296 iterations with 463320.00 cells : 4.93 sec
Speed: 121.74 MCells/s

Interrupting RunFDTD, force exit

Create FDTD engine (compressed SSE + multi-threading)
Running FDTD engine... this may take a while... grab a cup of coffee?!?
[@        7s] Timestep:          188 || Speed:   68.0 MC/s (4.052e-02 s/TS) || Energy: ~9.90e-21 (- 0.00dB)
^C
Signal::UnixGracefulExitHandler(): Gracefully aborting simulation now, this may take a few seconds...
Signal::UnixGracefulExitHandler(): To force-exit, send Ctrl-C again, but simulation results may be lost.
^C
Signal::UnixForceExitHandler(): Force-exit simulation process now!

thliebig commented 1 year ago

I had a quick look at the code but have no comments from my side, except just: I like it. I really need to get v0.0.36 released so that we can start merging all these nice improvements...

biergaizi commented 1 year ago

Update: I just wrote and force-pushed another revision, and now I think it's ready for merge (at least after the v0.0.36 release). I also added support for Windows's native ConsoleCtrlHandler, tested on MinGW but should also work on MSVC. So it's now feature-complete.

biergaizi commented 1 year ago

Also, please consider merging https://github.com/thliebig/fparser/pull/5 and https://github.com/thliebig/openEMS/pull/116 before releasing v0.0.36. I'm currently experiencing both problems during my openEMS performance test project.