mantidproject / mantid

Main repository for Mantid code
https://www.mantidproject.org
GNU General Public License v3.0

Analyse core dump files to capture c++ stack traces #38405

Open jhaigh0 opened 1 week ago

jhaigh0 commented 1 week ago

I started some exploratory work on this a while back on the branch https://github.com/mantidproject/mantid/compare/main...core_dump_analysis. It was partly successful, but the implementation also had some problems.

Essentially, if we configure the inner workbench process to produce a core dump file when it crashes, we can extract the C++ stack trace of each thread from that file using a tool like gdb. The traces could then be sent to the error reporter site (which will need some changes to support this).
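As a rough illustration of switching core dumps on for a child process only, here is a minimal sketch using Python's standard `resource` and `subprocess` modules. The function names `enable_core_dumps` and `launch_with_core_dumps` are hypothetical and not taken from the branch above. It assumes a POSIX system (`resource` is not available on Windows), and whether and where a core file is actually written still depends on the operating system configuration, for example `kernel.core_pattern` on Linux.

```python
import resource
import subprocess


def enable_core_dumps():
    """Raise the soft core-file size limit to the hard limit for this process."""
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))


def launch_with_core_dumps(command, **popen_kwargs):
    """Start `command` with core dumps enabled in the child only.

    preexec_fn runs in the child between fork and exec, so the parent's
    limits are left unchanged. It is not safe to use when the parent
    process has other threads running.
    """
    return subprocess.Popen(command, preexec_fn=enable_core_dumps, **popen_kwargs)
```

Setting the limit in the child keeps the launcher process itself from producing core files, which matters if only crashes of the inner process are of interest.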

Previously I used gdb, but it came with some drawbacks, mainly that it was very slow. Adding gdb as a runtime dependency of workbench also felt a bit heavy-handed.

This time round, after discussing with @martyngigg, pystack looks like a good option (https://bloomberg.github.io/pystack/corefile.html). It will produce much more output than we can currently send to the error reports site (it gets the trace from every thread), so this data will be compressed and sent to a new field that will be created in the error reports database model.
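One possible way to compress the trace text before sending it is sketched below. This is an assumption for illustration only: the issue does not say which compression or encoding will be used, and the names `compress_trace` and `decompress_trace` are hypothetical. It uses zlib plus base64 so that the result can be stored in a text field.

```python
import base64
import zlib


def compress_trace(trace_text):
    """Compress trace text and return it as an ASCII-safe base64 string."""
    compressed = zlib.compress(trace_text.encode("utf-8"), level=9)
    return base64.b64encode(compressed).decode("ascii")


def decompress_trace(payload):
    """Reverse compress_trace, e.g. on the error reports server."""
    return zlib.decompress(base64.b64decode(payload)).decode("utf-8")
```

Stack traces from many threads tend to be highly repetitive, so general-purpose compression like this should shrink them considerably, though the actual ratio would need measuring on real output.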

This issue is to create the system that will switch on, locate, and analyse the core dump files, so that the output is ready to send to the error reports site once that work is done.
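The locate and analyse steps might look roughly like the sketch below. The helper names, the `core*` file name pattern, and the choice to pick the most recently modified file are all assumptions, since the real core file location and naming depend on the operating system configuration. The command follows the `pystack core` usage described in the pystack documentation linked above, with `--native` requesting native (C/C++) frames as well as Python ones; the exact arguments should be checked against that documentation.

```python
from pathlib import Path


def find_latest_core_file(core_dir, pattern="core*"):
    """Return the most recently modified core file in core_dir, or None."""
    candidates = [p for p in Path(core_dir).glob(pattern) if p.is_file()]
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)


def build_pystack_command(core_file, executable):
    """Build the argument list for analysing a core file with pystack.

    The result can be passed to subprocess.run(..., capture_output=True,
    text=True) to collect the trace text.
    """
    return ["pystack", "core", "--native", str(core_file), str(executable)]
```

Keeping the command construction separate from its execution makes it straightforward to unit test without pystack installed or a real core file to hand.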