judith-ipac opened 2 years ago
FWIW, when I run "make check" in the slurm-drmaa-1.1.3 repo, it stalls after the first test suite:
============================================================================
make[4]: Leaving directory 'ROOTDIR/slurm-drmaa-1.1.3/drmaa_utils/test'
make[3]: Leaving directory 'ROOTDIR/slurm-drmaa-1.1.3/drmaa_utils/test'
make[2]: Leaving directory 'ROOTDIR/slurm-drmaa-1.1.3/drmaa_utils/test'
make[2]: Entering directory 'ROOTDIR/slurm-drmaa-1.1.3/drmaa_utils'
make[2]: Leaving directory 'ROOTDIR/slurm-drmaa-1.1.3/drmaa_utils'
make[1]: Leaving directory 'ROOTDIR/slurm-drmaa-1.1.3/drmaa_utils'
Making check in slurm_drmaa
make[1]: Entering directory 'ROOTDIR/slurm-drmaa-1.1.3/slurm_drmaa'
make[1]: Nothing to be done for 'check'.
make[1]: Leaving directory 'ROOTDIR/slurm-drmaa-1.1.3/slurm_drmaa'
Making check in test
make[1]: Entering directory 'ROOTDIR/slurm-drmaa-1.1.3/test'
make slurm_ping
make[2]: Entering directory 'ROOTDIR/slurm-drmaa-1.1.3/test'
make[2]: 'slurm_ping' is up to date.
make[2]: Leaving directory 'ROOTDIR/slurm-drmaa-1.1.3/test'
make check-TESTS
make[2]: Entering directory 'ROOTDIR/slurm-drmaa-1.1.3/test'
make[3]: Entering directory 'ROOTDIR/slurm-drmaa-1.1.3/test'
Thanks.
Hello, and apologies if this question is in the wrong place. We are upgrading from Debian 8 to Debian 11. I am a developer with no particular background in system administration or configuration. Several weeks into a cycle of install/google-error-message/install-something-else, I have installed munge, slurm, slurm-drmaa, and bats(!). slurmctld and slurmd are now running, but calls to drmaa_run_job() segfault. (The surrounding C++ code is copied from our Debian 8 host, where drmaa_run_job() runs successfully.) I'll print some debug output below, but what I'm really looking for is start-to-finish, step-by-step instructions for configuring, installing, and running whatever it takes to make SLURM usable on Debian 11. Thanks in advance.
Last few steps of debug output from drmaa_run_job:
d #597f9 [ 40.42] finalizing job constraints
d #597f9 [ 40.42] set min_cpus to ntasks: 1
t #597f9 [ 40.42] <- slurmdrmaa_parse_native
ORA-24550: signal received: [si_signo=11] [si_errno=0] [si_code=1] [si_int=0] [si_ptr=(nil)] [si_addr=0x1656]
kpedbg_dmp_stack()+394<-kpeDbgCrash()+204<-kpeDbgSignalHandler()+113<-skgesig_sigactionHandler()+258<-sighandler()<-0x00007F06CFEC9B71<-slurm_pack_selected_step()+1286<-slurm_send_node_msg()+505<-slurm_send_recv_msg()+66<-slurm_send_recv_controller_msg()+315<-slurm_submit_batch_job()+119<-slurmdrmaa_session_run_bulk()+518<-slurmdrmaa_session_run_job()+179<-drmaa_run_job()+374<-_ZN19custom_code::submit_jobERKN5boost10filesystem4pathES4_RKNSt7cxx1112basic_stringIcSt11char_traitsIcESaIcEEESC_bb()+4407<-0x0000000000000009<-0x7453705F6D00626F
runscript.sh: line 62: 366577 Segmentation fault
Stack trace from gdb:
Any advice would be greatly appreciated.