jessesung opened this issue 5 years ago
The daemon is run as:
# /usr/sbin/iceccd -m `getconf _NPROCESSORS_ONLN` --no-remote -d -n <netname> -N <hostname>
With the above commit, production builds should no longer abort on assertions. However, I don't know how to reproduce your actual error. Can you please provide more specific steps to reproduce it?
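Background on why the build type matters: the standard C/C++ `assert` macro expands to nothing when `NDEBUG` is defined, so a failed check such as `msg->job_id == cl->job_id` only prints "Assertion ... failed." and calls `abort()` (SIGABRT) in builds compiled without `NDEBUG`. The sketch below assumes, without confirmation from the commit itself (which is not shown here), that production builds define `NDEBUG`. Both function names are hypothetical and are not part of icecream.

```cpp
#include <cassert>

// Reports whether assert() is active in this translation unit. When the
// file is compiled with -DNDEBUG (common for release/production builds),
// <cassert> defines assert(expr) as ((void)0), so failed checks never abort.
constexpr bool asserts_enabled() {
#ifdef NDEBUG
    return false;
#else
    return true;
#endif
}

// Hypothetical check modelled on the failing assertion. Without NDEBUG, a
// mismatch prints a diagnostic and calls abort(), raising SIGABRT as in the
// Valgrind trace further down. With NDEBUG it simply returns false.
bool job_ids_match(int msg_job_id, int client_job_id) {
    assert(msg_job_id == client_job_id);
    return msg_job_id == client_job_id;
}
```

Compiling the same file once with and once without `-DNDEBUG` and calling `job_ids_match(1, 2)` shows the difference: the first build returns `false`, the second aborts.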
@llunak Thanks for your reply!
The steps to reproduce are:
/usr/sbin/iceccd -m 0 --no-remote -d -n test -N hostname -l /tmp/iceccd.log -vv
Then start sshd:
/usr/sbin/sshd -D
I can reproduce this somewhat reliably. I'm compiling the Linux kernel with -j 30 using a few Raspberry Pis. Here is a dump from Valgrind (using the icecream package from Ubuntu 18).
iceccd: main.cpp:1266: bool Daemon::handle_job_done(Client*, JobDoneMsg*): Assertion `msg->job_id == cl->job_id' failed.
[11290] 15:00:28: !wait_for_msg()
[11281] 15:00:28: timeout <= 0
==13614==
==13614== Process terminating with default action of signal 6 (SIGABRT): dumping core
[11290] 15:00:28: timeout <= 0
==13614== at 0x5DE4E97: raise (raise.c:51)
==13614== by 0x5DE6800: abort (abort.c:79)
==13614== by 0x5DD6399: __assert_fail_base (assert.c:92)
==13614== by 0x5DD6411: __assert_fail (assert.c:101)
==13614== by 0x113CCC: Daemon::handle_job_done(Client*, JobDoneMsg*) (main.cpp:1266)
==13614== by 0x119895: Daemon::handle_activity(Client*) (main.cpp:1685)
==13614== by 0x11A365: Daemon::answer_client_requests() (main.cpp:1935)
==13614== by 0x11AC89: Daemon::working_loop() (main.cpp:2027)
==13614== by 0x111D33: main (main.cpp:2360)
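The trace shows the invariant being checked: a job-done message must carry the same job ID as the job the client connection is currently tracking. As an illustration only, here is one hypothetical way to handle a mismatch without aborting: log it and reject the message. This is not the fix the maintainers applied, and whether dropping the message is safe depends on daemon state the thread does not describe. The `Client` and `JobDoneMsg` structs are simplified stand-ins (only the `job_id` field named in the assertion), and `accept_job_done` is a made-up name.

```cpp
#include <cassert>
#include <cstdio>

// Simplified stand-ins for the daemon's types; the real icecream classes
// have many more members. Only job_id, named in the assertion, is modelled.
struct Client { int job_id; };
struct JobDoneMsg { int job_id; };

// Hypothetical tolerant variant of the check in Daemon::handle_job_done():
// instead of asserting, report the mismatch on stderr and reject the message
// so the process keeps running. Returns true if the message was accepted.
bool accept_job_done(const Client* cl, const JobDoneMsg* msg) {
    if (msg->job_id != cl->job_id) {
        std::fprintf(stderr,
                     "job done for job %d, but client is tracking job %d; ignoring\n",
                     msg->job_id, cl->job_id);
        return false;
    }
    return true;
}
```

For example, with a client tracking job 5, a message for job 5 is accepted and a message for job 6 is logged and rejected.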
Sorry, but this is just too much work to reproduce locally. Please attach logs (with -vvv) from both the daemon and scheduler.
With -vvvv:
main.cpp:1276: bool Daemon::handle_job_done(Client*, JobDoneMsg*): Assertion `msg->job_id == cl->job_id' failed.
It seems this doesn't happen when running the daemon with "-m 0".
Tested with the icecc 1.2-1 package in Ubuntu.