sslab-gatech / mosaic

MIT License

Crash in Release mode #3

Open seanses opened 3 years ago

seanses commented 3 years ago

I was experimenting with mosaic on the twitter-small graph and found that both "generate_graph.py" and "run_mosaic.py" crash in release mode. Below is the error message from "./generate_graph.py --dataset twitter-small --binary --in-memory":

[SG-LOG] Partitioning time: 4.042466
[SG-LOG] Shutdown PartitionManager 0
[SG-ERR:threadMain:121] Fail to pin a thread: Invalid argument
XXX [threadMain:122] error exit with 1
stack trace for /mnt/D/di/mosaic/build/Release-x86_64/tools/grc/grc-in-memory pid=7292
Could not attach to process. If your uid matches the uid of the target process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try again as the root user. For more details, see /etc/sysctl.d/10-ptrace.conf
ptrace: Operation not permitted.
/mnt/D/di/mosaic/src/tools/scripts/7292: No such file or directory.
No thread selected
No stack.
grc-in-memory: /mnt/D/di/mosaic/src/lib/util/util.cc:25: void scalable_graphs::util::__die(int, const char*, int): Assertion `0' failed.
Command terminated by signal 6

Attaching gdb to it revealed that it was crashing in:

#5 0x00005555555915c9 in scalable_graphs::util::Runnable::threadMain (arg=0x5555557b5250) at /mnt/D/di/mosaic/src/lib/util/runnable.cc:122
122 die(1);
(gdb) list
117 cpu_set_t* cpuset = &runnable->cpuset;
118 if (CPU_COUNT(cpuset)) {
119 int rc = sched_setaffinity(0, sizeof(*cpuset), cpuset);
120 if (rc) {
121 sg_err("Fail to pin a thread: %s\n", strerror(errno));
122 die(1);
123 }
124 }
125 runnable->run();
126 }

Below is the error message from "./run_mosaic.py --dataset twitter-small --algorithm pagerank --max-iteration 20 --gdb":

Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[SG-LOG] Running with 1334392 vertices, 94 tiles, 32-bit-index: true
[SG-LOG] selective scheduling false
[SG-LOG] Started edge processor 0.
[SG-LOG] selective scheduling false
[SG-LOG] Started edge processor 1.
[SG-LOG] Started all edge processors.
[SG-LOG] Intialization done

Program received signal SIGSEGV, Segmentation fault.
scalable_graphs::core::getGlobalToOrigIDFileName[abi:cxx11](config_t const&) () at /mnt/D/di/mosaic/src/lib/core/util.cc:93
93 return getGlobalToOrigIDFileName(config.path_to_globals);

Interestingly, both scripts finished without crashing in debug mode.

steffen-maass commented 3 years ago

Regarding the crash when executing generate_graph.py, the error message points to sched_setaffinity() failing with errno set to EINVAL. The internet suggests this might happen if some of the CPUs of your system have been disabled (see https://stackoverflow.com/questions/53881781/why-does-sched-setaffinity-work-on-one-system-fails-on-another). Enabling all CPUs is likely to fix that.

The crash in run_mosaic.py is likely a result of the previous graph generation not completing correctly.
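
If enabling all CPUs is not an option, one possible workaround is to intersect the requested mask with the CPUs the thread is currently allowed to use before pinning. The snippet below is only a rough sketch of that idea, not mosaic's code: `pin_to_allowed_cpus` is a hypothetical helper, and it assumes Linux with glibc, where sched_setaffinity() reports EINVAL when the mask contains no CPU that is both online and permitted for the thread.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // needed for the CPU_* macros and sched_*affinity on glibc
#endif
#include <sched.h>

#include <cerrno>
#include <cstdio>
#include <cstring>

// Hypothetical helper (not part of mosaic): pin the calling thread to the
// subset of `requested` that it is currently allowed to run on.
// Returns the number of CPUs pinned to, 0 if pinning was skipped because no
// requested CPU is usable, or -1 if a system call failed.
int pin_to_allowed_cpus(cpu_set_t requested) {
  cpu_set_t allowed;
  CPU_ZERO(&allowed);
  // Ask the kernel which CPUs this thread may run on right now.
  if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0) {
    std::fprintf(stderr, "sched_getaffinity failed: %s\n", std::strerror(errno));
    return -1;
  }

  // Keep only the requested CPUs that are actually usable.
  cpu_set_t usable;
  CPU_ZERO(&usable);
  CPU_AND(&usable, &requested, &allowed);

  int count = CPU_COUNT(&usable);
  if (count == 0) {
    // Calling sched_setaffinity() with this mask would fail with EINVAL,
    // so leave the thread unpinned instead of aborting.
    std::fprintf(stderr, "no requested CPU is usable, skipping pinning\n");
    return 0;
  }

  if (sched_setaffinity(0, sizeof(usable), &usable) != 0) {
    std::fprintf(stderr, "sched_setaffinity failed: %s\n", std::strerror(errno));
    return -1;
  }
  return count;
}
```

With something like this, a caller could log a warning and continue unpinned when the return value is 0 rather than exiting. Whether running unpinned is acceptable for mosaic's performance is a separate question.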