Essentially, if we set the inner workbench process to produce core dump files on crash, we could get the C++ thread traces out of the file using a tool like `gdb`. These traces could then be sent to the error reporter site (which will need a small change to support this).
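As a rough sketch of the "switch on" step, assuming a POSIX platform and that the launcher starts the inner workbench process via Python's `subprocess` module: the soft core-file size limit (`RLIMIT_CORE`) is frequently 0 by default, so the child needs it raised before the OS will write a core dump. `enable_core_dumps` and `launch_inner_process` are hypothetical names, not existing workbench functions.

```python
from __future__ import annotations

import resource
import subprocess


def enable_core_dumps() -> None:
    """Raise this process's soft core-file size limit to its hard limit.

    An unprivileged process may always raise its soft limit up to the hard
    limit. Resource limits are inherited across fork/exec, so calling this in
    the child just before exec affects only the inner process.
    """
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))


def launch_inner_process(argv: list[str]) -> subprocess.Popen:
    """Start a (hypothetical) inner workbench command with core dumps enabled."""
    # preexec_fn runs in the child after fork() and before exec().
    return subprocess.Popen(argv, preexec_fn=enable_core_dumps)
```

The Python docs warn that `preexec_fn` is not safe when the parent has threads, so calling `enable_core_dumps()` early in the inner process's own startup may be the safer option. Where the dump is written is still decided by the OS (on Linux, by `/proc/sys/kernel/core_pattern`).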
Previously I used `gdb`, but it came with some drawbacks, mainly that it was very slow. It was also rather heavy-handed to add `gdb` as a runtime dependency of workbench.
This time round, after discussing with @martyngigg, `pystack` looks like a good option (https://bloomberg.github.io/pystack/corefile.html). It produces much more output than we can currently send to the error reports site (it gets the trace from every thread), so this data will be compressed and sent to a new field that will be created in the error reports database model.
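The analyse and compress steps could look roughly like this. It is a sketch assuming `pystack` is installed and on `PATH`, that its `--native` option is what we want for including the C/C++ frames (worth confirming with `pystack core --help` for the installed version), and that the new database field stores base64-encoded zlib data. All function names are hypothetical.

```python
from __future__ import annotations

import base64
import subprocess
import zlib
from pathlib import Path
from typing import Optional


def build_pystack_command(core_file: Path, executable: Optional[Path] = None) -> list[str]:
    """Build a `pystack core` command line for one core file."""
    # --native asks pystack to include native (C/C++) frames as well as Python ones.
    cmd = ["pystack", "core", "--native", str(core_file)]
    if executable is not None:
        # Optionally pass the executable that produced the core file.
        cmd.append(str(executable))
    return cmd


def analyse_core_file(core_file: Path, executable: Optional[Path] = None) -> str:
    """Run pystack on a core file and return its text report.

    Raises subprocess.CalledProcessError if pystack exits with an error.
    """
    result = subprocess.run(
        build_pystack_command(core_file, executable),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def compress_report(report: str) -> str:
    """Compress a report and encode it as ASCII, suitable for a text field."""
    return base64.b64encode(zlib.compress(report.encode("utf-8"), level=9)).decode("ascii")


def decompress_report(payload: str) -> str:
    """Inverse of compress_report, e.g. for displaying a report on the site."""
    return zlib.decompress(base64.b64decode(payload)).decode("utf-8")
```

zlib is in the standard library on both the workbench and server side, and stack traces from many threads are highly repetitive, so they tend to compress well; base64 adds roughly a third in size back but keeps the payload safe in a JSON request and a text column.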
This issue is to create the system which will switch on, locate, and analyse the core dump files, having them ready to send to the error reports site once that work is done.
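Locating the dump depends on how the OS is configured. A Linux-only sketch: the kernel's `core_pattern` either names a file (relative to the crashed process's working directory, or absolute) or starts with `|`, meaning the dump is piped to a handler such as systemd-coredump and would have to be fetched with that handler's own tooling (e.g. `coredumpctl`). The helper names are hypothetical, and `%` specifiers other than `%p` are simply treated as wildcards.

```python
from __future__ import annotations

import glob
import os
import re
from pathlib import Path
from typing import Optional

CORE_PATTERN_FILE = Path("/proc/sys/kernel/core_pattern")


def read_core_pattern(pattern_file: Path = CORE_PATTERN_FILE) -> Optional[str]:
    """Return the kernel's core_pattern, or None if it can't be read (e.g. not Linux)."""
    try:
        return pattern_file.read_text().strip()
    except OSError:
        return None


def core_pattern_to_glob(core_pattern: str, pid: int) -> str:
    """Replace %p with the crashed PID and every other %-specifier with '*'."""
    # Approximation: '%%' (a literal '%') also becomes '*', which only widens the match.
    return re.sub(r"%(.)", lambda m: str(pid) if m.group(1) == "p" else "*", core_pattern)


def find_core_files(core_pattern: str, pid: int, cwd: Path) -> list[Path]:
    """Return core files the crashed process `pid` may have left, newest first.

    `cwd` is the crashed process's working directory, where a relative
    core_pattern is resolved.
    """
    if core_pattern.startswith("|"):
        # Piped to a handler (systemd-coredump, apport, ...): nothing to glob for here.
        return []
    base = core_pattern_to_glob(core_pattern, pid)
    candidates = [base]
    if "%p" not in core_pattern:
        # With kernel.core_uses_pid=1 the kernel appends ".<pid>" to such patterns.
        candidates.append(f"{base}.{pid}")
    matches: set[Path] = set()
    for pattern in candidates:
        if os.path.isabs(pattern):
            full = pattern
        else:
            full = os.path.join(glob.escape(str(cwd)), pattern)
        matches.update(Path(p) for p in glob.glob(full))
    files = [p for p in matches if p.is_file()]
    return sorted(files, key=lambda p: p.stat().st_mtime, reverse=True)
```

In use, the launcher would call `find_core_files(read_core_pattern(), child.pid, workbench_cwd)` after the inner process exits abnormally, skipping the search when `read_core_pattern()` returns `None`.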
I started some exploratory work on this a while back on this branch: https://github.com/mantidproject/mantid/compare/main...core_dump_analysis. I had some success with it, but there were also some problems with the implementation.