lattice / quda

QUDA is a library for performing calculations in lattice QCD on GPUs.
https://lattice.github.io/quda

Add robust signal captures to QUDA #1449

Open weinbe2 opened 3 months ago

weinbe2 commented 3 months ago

Per a suggestion from @paboyle, we should add support for proper signal capturing in QUDA. (BSD references are welcome :) )

Scidac5usqcd commented 3 months ago

https://github.com/paboyle/Grid/blob/develop/Grid/util/Init.cc

It has some of the gunk you need for trapping SIGFPE, enabling FPE exceptions, and printing a backtrace on SIGSEGV and SIGBUS, plus a register dump on x86.

It compiles everywhere (macOS and Linux at least). From what I saw of PETSc, they've done a bit more with sigaction, and perhaps use a little library.

https://petsc.org/release/src/sys/error/signal.c.html


```cpp
void * Grid_backtrace_buffer[_NBACKTRACE];

void Grid_sa_signal_handler(int sig,siginfo_t *si,void * ptr)
{
  fprintf(stderr,"Caught signal %d\n",si->si_signo);
  fprintf(stderr,"  mem address %llx\n",(unsigned long long)si->si_addr);
  fprintf(stderr,"         code %d\n",si->si_code);
  // Linux/Posix
#ifdef __linux__
  // And x86 64bit
#ifdef __x86_64__
  ucontext_t * uc= (ucontext_t *)ptr;
  struct sigcontext *sc = (struct sigcontext *)&uc->uc_mcontext;
  fprintf(stderr,"  instruction %llx\n",(unsigned long long)sc->rip);
#define REG(A)  printf("  %s %lx\n",#A,sc-> A);
  REG(rdi);
  REG(rsi);
  REG(rbp);
  REG(rbx);
  REG(rdx);
  REG(rax);
  REG(rcx);
  REG(rsp);
  REG(rip);

  REG(r8);
  REG(r9);
  REG(r10);
  REG(r11);
  REG(r12);
  REG(r13);
  REG(r14);
  REG(r15);
#endif
#endif
  fflush(stderr);
  BACKTRACEFP(stderr);
  fprintf(stderr,"Called backtrace\n");
  fflush(stdout);
  fflush(stderr);
  exit(0);
  return;
};

void Grid_exit_handler(void)
{
  BACKTRACEFP(stdout);
  fflush(stdout);
}
void Grid_debug_handler_init(void)
{
  struct sigaction sa;
  sigemptyset (&sa.sa_mask);
  sa.sa_sigaction= Grid_sa_signal_handler;
  sa.sa_flags    = SA_SIGINFO;
  sigaction(SIGSEGV,&sa,NULL);
  sigaction(SIGTRAP,&sa,NULL);
  sigaction(SIGBUS,&sa,NULL);
  sigaction(SIGUSR2,&sa,NULL);

  feenableexcept( FE_INVALID|FE_OVERFLOW|FE_DIVBYZERO);

  sigaction(SIGFPE,&sa,NULL);
  sigaction(SIGKILL,&sa,NULL);
  sigaction(SIGILL,&sa,NULL);

  atexit(Grid_exit_handler);
}
```
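For comparison, here is a minimal sketch of the same idea written from scratch against plain POSIX `sigaction` plus the `backtrace` / `backtrace_symbols_fd` calls from `<execinfo.h>` (available in glibc and on macOS, but not in every libc). It is not taken from Grid, PETSc, or QUDA, and the `sketch` namespace and all function names in it are made up for illustration. It assumes you want the handler to stick to `write(2)` rather than `fprintf`, to leave out SIGKILL (which POSIX does not allow a process to catch), and to re-raise the signal so the process still terminates with the original signal instead of exit code 0.

```cpp
#include <csignal>
#include <cstddef>
#include <cstring>
#include <execinfo.h>  // backtrace, backtrace_symbols_fd (glibc, macOS)
#include <unistd.h>    // write, STDERR_FILENO

namespace sketch {  // hypothetical namespace, not part of QUDA

// Format an unsigned value as decimal into the end of buf (n >= 2 assumed)
// without printf, and return a pointer to the first digit.
inline const char *format_uint(unsigned v, char *buf, std::size_t n) {
  char *p = buf + n - 1;
  *p = '\0';
  do {
    *--p = static_cast<char>('0' + v % 10);
    v /= 10;
  } while (v != 0 && p != buf);
  return p;
}

// write(2) is async-signal-safe; stdio functions such as fprintf are not.
inline void safe_write(const char *s) {
  ssize_t r = write(STDERR_FILENO, s, std::strlen(s));
  (void)r;  // nothing sensible to do on failure inside a handler
}

inline void fatal_handler(int sig, siginfo_t *, void *) {
  char num[16];
  safe_write("Caught signal ");
  safe_write(format_uint(static_cast<unsigned>(sig), num, sizeof num));
  safe_write("\n");

  // backtrace() is not formally async-signal-safe (it may allocate on first
  // use), so treat this as best effort in a process that is already dying.
  void *frames[64];
  int n = backtrace(frames, 64);
  backtrace_symbols_fd(frames, n, STDERR_FILENO);

  // SA_RESETHAND already restored the default action on entry; re-raising
  // makes the process terminate with the original signal.
  raise(sig);
}

// Install the handler for the usual fatal signals. Returns false if any
// sigaction call fails.
inline bool install_fatal_handlers() {
  struct sigaction sa;
  std::memset(&sa, 0, sizeof sa);
  sigemptyset(&sa.sa_mask);
  sa.sa_sigaction = fatal_handler;
  sa.sa_flags = SA_SIGINFO | SA_RESETHAND;
  const int sigs[] = {SIGSEGV, SIGBUS, SIGFPE, SIGILL, SIGTRAP};
  bool ok = true;
  for (int s : sigs) ok = (sigaction(s, &sa, nullptr) == 0) && ok;
  return ok;
}

}  // namespace sketch
```

Calling `sketch::install_fatal_handlers()` once early in initialization would be enough. Trapping floating-point exceptions in the first place (e.g. via glibc's `feenableexcept`) is a separate, platform-specific step and is left out here.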
Scidac5usqcd commented 3 months ago

Oops, signed in on the wrong account... @paboyle

weinbe2 commented 3 months ago

Thanks @paboyle, I appreciate the references!

mathiaswagner commented 2 months ago

We already support https://github.com/bombela/backward-cpp; see https://github.com/lattice/quda/blob/9963aec17fc87385fce4717bb74872151b786419/CMakeLists.txt#L252

Also, @paboyle, please don't suggest any Grid samples, as Grid is GPL.