Open paulromano opened 3 years ago
This is something I'd like to help take on because error handling on GPU code is currently impossible. My current thinking is that we should add some new global variable like error_state
that is an enum of possible error codes (or maybe each thread should own a copy?). The python API functions could work as-is and return that code. Then if running main() from C++, you'd just return back to main() and add some function for printing helpful error messages before exiting or possibly returning that error code.
If we took this approach on GPU, it would be nice, because this error flag could be set by some thread on the device, then the appropriate handling be done on the CPU after the kernel is done.
I think that approach sounds pretty nice. One of the nice things about storing the state in a global as opposed to returning the value is that you don't need to pass the return value through nested function calls. Exceptions also solve that problem, but of course they are no-go for GPUs. Note that we already have a global openmc_err_msg
variable that is used by openmc.lib
to get the error message.
For simplicity, we might want to just have the error state be an int
rather than an enum
so that it is clearly visible through a C API.
One of OpenMC's "bad" behaviors right now is that when it encounters an error, it most circumstances it will call
std::exit
(orMPI_Abort
when using MPI). This is not a desirable behavior if OpenMC is used as a library (e.g., as part of a multiphysics application). It also sucks when you're using theopenmc.lib
Python bindings to the C/C++ API and an error takes down the whole executable rather than bubbling up as an exception. We should be following community guidelines with respect to error handling. For example, one of the recommendations in the xSDK community package policies (R3) is:What we have is currently a bit of a mess:
std::exit
orMPI_Abort
openmc.lib
Python bindings)