Closed mich181189 closed 7 years ago
I'm seeing the same here. On top of that, the Gantt view shows a lot of jobs that start but never end. These jobs are shown in white (not allocated to a host, I think) and appear to be duplicates of the actual jobs being compiled. It looks like an issue with the job bookkeeping.
The list view shows the same thing: duplicate jobs with the server listed as "Unknown".
The strange thing is that the scheduler does not show the incorrect information in its telnet interface:
telnet 0 8766
Trying 0.0.0.0...
Connected to 0.
Escape character is '^]'.
200-ICECC 1.1rc2: 3845s uptime, 2 hosts, 0 jobs in queue (216 total).
200 Use 'help' for help and 'quit' to quit.
listcs
cave.local (192.168.178.220:10245) [x86_64] speed=382.00 jobs=0/2 load=199
danny-test.localdomain (192.168.178.91:10245) [x86_64] speed=365.86 jobs=0/2 load=2
200 done
listcs
cave.local (192.168.178.220:10245) [x86_64] speed=383.02 jobs=3/2 load=199
245 COMP sub:danny-test.localdomain on:cave.local icecream/client/remote.cpp
248 COMP sub:danny-test.localdomain on:cave.local icecream/client/util.cpp
250 WAIT sub:danny-test.localdomain on:cave.local icecream/client/md5.c
danny-test.localdomain (192.168.178.91:10245) [x86_64] speed=379.34 jobs=1/2 load=2
244 COMP sub:danny-test.localdomain on:danny-test.localdomain icecream/client/arg.cpp
200 done
listjobs
283 COMP sub:danny-test.localdomain on:cave.local icecream/daemon/main.cpp
284 COMP sub:danny-test.localdomain on:cave.local icecream/daemon/environment.cpp
286 WAIT sub:danny-test.localdomain on:cave.local icecream/daemon/load.cpp
288 WAIT sub:danny-test.localdomain on:danny-test.localdomain icecream/daemon/file_util.cpp
200 done
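For anyone who wants to compare the scheduler's view with what icemon displays, here is a rough sketch of a parser for the `listcs` / `listjobs` lines shown above. It is based only on the output pasted in this comment, not on any documented format. The class and field names (`HostLine`, `JobLine`, `jobs_running`, `jobs_max`) are made up for illustration, and reading `jobs=3/2` as "current/maximum" is an assumption.

```python
import re
from typing import NamedTuple, Optional, Union


class HostLine(NamedTuple):
    """One compile-server line from `listcs` (field names are hypothetical)."""
    name: str
    address: str
    port: int
    platform: str
    speed: float
    jobs_running: int  # assumed meaning of the first number in jobs=N/M
    jobs_max: int      # assumed meaning of the second number in jobs=N/M
    load: int


class JobLine(NamedTuple):
    """One job line from `listcs` or `listjobs` (field names are hypothetical)."""
    job_id: int
    state: str
    submitter: str
    server: str
    filename: str


# Patterns derived from the sample output only; other versions may differ.
HOST_RE = re.compile(
    r"^(?P<name>\S+) \((?P<addr>[\d.]+):(?P<port>\d+)\) \[(?P<platform>[^\]]+)\] "
    r"speed=(?P<speed>[\d.]+) jobs=(?P<running>\d+)/(?P<max>\d+) load=(?P<load>\d+)$"
)
JOB_RE = re.compile(
    r"^(?P<id>\d+) (?P<state>[A-Z]+) sub:(?P<sub>\S+) on:(?P<on>\S+) (?P<file>.+)$"
)


def parse_line(line: str) -> Optional[Union[HostLine, JobLine]]:
    """Return a HostLine or JobLine, or None for status lines such as '200 done'."""
    line = line.strip()
    m = HOST_RE.match(line)
    if m:
        return HostLine(
            m["name"], m["addr"], int(m["port"]), m["platform"],
            float(m["speed"]), int(m["running"]), int(m["max"]), int(m["load"]),
        )
    m = JOB_RE.match(line)
    if m:
        return JobLine(int(m["id"]), m["state"], m["sub"], m["on"], m["file"])
    return None
```

Feeding it the session above yields only the handful of jobs that are really running or waiting, which makes the mismatch with the thousands of "active" jobs in icemon easy to see.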
Right. This isn't an icemon bug, it's an icecc bug. https://github.com/icecc/icecream/commit/1c15f6b9c6ddd329e4fe1e2a03c7420a0407ae25 adds JobLocalBeginMsg calls for preprocessing sources, but does not add any JobLocalDoneMsg calls to match, so the JobLocalBeginMsg calls stack up as jobs that never complete.
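To illustrate why the missing messages produce this symptom, here is a toy model of a monitor that tracks jobs by begin/done messages. It is not icemon's or icecream's actual code. `JobTracker`, `on_local_begin`, `on_local_done` and `simulate` are hypothetical names; only the idea of a JobLocalBeginMsg that needs a matching JobLocalDoneMsg comes from the commit discussed above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class JobTracker:
    """Toy stand-in for a monitor's job table (hypothetical, for illustration)."""
    active: Dict[int, str] = field(default_factory=dict)
    finished: List[str] = field(default_factory=list)

    def on_local_begin(self, job_id: int, filename: str) -> None:
        # Models receiving a JobLocalBeginMsg: the job becomes active.
        self.active[job_id] = filename

    def on_local_done(self, job_id: int) -> None:
        # Models receiving a JobLocalDoneMsg: the job moves to finished.
        filename = self.active.pop(job_id, None)
        if filename is not None:
            self.finished.append(filename)


def simulate(n_jobs: int, send_done: bool) -> int:
    """Return how many jobs are still 'active' after n_jobs begin messages."""
    tracker = JobTracker()
    for job_id in range(n_jobs):
        tracker.on_local_begin(job_id, f"file{job_id}.cpp")
        if send_done:
            tracker.on_local_done(job_id)
    return len(tracker.active)
```

With `send_done=False` every begin message leaves an entry behind, so the active count grows with the number of files preprocessed. That would match the ever-increasing "active jobs" counter and the never-ending white bars in the Gantt view.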
I'll raise this as a bug on icecream.
My best guess is that it is not clearing up "finished" jobs.
Also note that the "active jobs" count in the corner is very high (> 4000).
If I get a chance I'll take a deeper look, but I thought it was worth logging this first. It might have something to do with the fact that I'm running icecc in a Docker container (mounting the daemon socket in the container as a volume), so perhaps it isn't getting the messages correctly. Everything else seems to work, though, and a lot of jobs do end up listed as "finished" in the list view, which also gets very full.