icecc / icemon

Icecream GUI Monitor
http://kfunk.org/tag/icemon/
GNU General Public License v2.0

Star view doesn't seem to be cleaning up properly #33

Closed mich181189 closed 7 years ago

mich181189 commented 7 years ago

My best guess is that it is not clearing up "finished" jobs.

[Screenshot: icemon-massive]

Also note that the "active jobs" count in the corner is very high (> 4000).

If I get a chance I'll take a deeper look, but I thought it was worth logging this first. It might be related to the fact that I'm running icecc in a Docker container (mounting the daemon's socket in the container as a volume), so perhaps it isn't receiving the messages correctly. Everything else seems to work, though, and a lot of jobs end up listed as "finished" in the list view, which also gets very full.

danny-smit commented 7 years ago

I'm seeing the same here. On top of that, the Gantt view shows a lot of jobs that start but never end. These jobs are shown in white (not allocated to a host, I think) and appear to be duplicates of the actual jobs that are compiled. It looks like an issue with the job administration.

The same shows in the list view, duplicate jobs with server "Unknown":

[Screenshot: screenshot_20170729_172154]

The strange thing is that the scheduler does not show the incorrect information in its telnet interface:

telnet 0 8766  
Trying 0.0.0.0...
Connected to 0.
Escape character is '^]'.
200-ICECC 1.1rc2: 3845s uptime, 2 hosts, 0 jobs in queue (216 total).
200 Use 'help' for help and 'quit' to quit.
listcs
 cave.local (192.168.178.220:10245) [x86_64] speed=382.00 jobs=0/2 load=199
 danny-test.localdomain (192.168.178.91:10245) [x86_64] speed=365.86 jobs=0/2 load=2
200 done
listcs
 cave.local (192.168.178.220:10245) [x86_64] speed=383.02 jobs=3/2 load=199
   245 COMP sub:danny-test.localdomain on:cave.local icecream/client/remote.cpp
   248 COMP sub:danny-test.localdomain on:cave.local icecream/client/util.cpp
   250 WAIT sub:danny-test.localdomain on:cave.local icecream/client/md5.c
 danny-test.localdomain (192.168.178.91:10245) [x86_64] speed=379.34 jobs=1/2 load=2
   244 COMP sub:danny-test.localdomain on:danny-test.localdomain icecream/client/arg.cpp
200 done
listjobs
 283 COMP sub:danny-test.localdomain on:cave.local icecream/daemon/main.cpp
 284 COMP sub:danny-test.localdomain on:cave.local icecream/daemon/environment.cpp
 286 WAIT sub:danny-test.localdomain on:cave.local icecream/daemon/load.cpp
 288 WAIT sub:danny-test.localdomain on:danny-test.localdomain icecream/daemon/file_util.cpp
200 done
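
To compare the scheduler's view with the monitor's "active jobs" count, the job lines in the session above can be counted mechanically. The following is only a rough sketch: the regular expression is inferred from the few lines shown here rather than from any documented format, and `parse_jobs` and `SAMPLE` are hypothetical names introduced for illustration.

```python
import re
from collections import Counter

# Pattern inferred from the lines shown above (an assumption, not a spec):
#   <id> <STATE> sub:<submitter> on:<server> <source file>
JOB_LINE = re.compile(r"^\s*(\d+)\s+([A-Z]+)\s+sub:(\S+)\s+on:(\S+)\s+(\S+)\s*$")


def parse_jobs(text):
    """Return one dict per job line found in the telnet output."""
    jobs = []
    for line in text.splitlines():
        match = JOB_LINE.match(line)
        if match:
            job_id, state, submitter, server, path = match.groups()
            jobs.append({
                "id": int(job_id),
                "state": state,
                "sub": submitter,
                "on": server,
                "file": path,
            })
    return jobs


# The `listjobs` output copied from the session above.
SAMPLE = """\
listjobs
 283 COMP sub:danny-test.localdomain on:cave.local icecream/daemon/main.cpp
 284 COMP sub:danny-test.localdomain on:cave.local icecream/daemon/environment.cpp
 286 WAIT sub:danny-test.localdomain on:cave.local icecream/daemon/load.cpp
 288 WAIT sub:danny-test.localdomain on:danny-test.localdomain icecream/daemon/file_util.cpp
200 done
"""

jobs = parse_jobs(SAMPLE)
states = Counter(job["state"] for job in jobs)
print(len(jobs), dict(states))
```

For this snapshot the scheduler reports only a handful of jobs, all with a known server, which is far from the thousands the monitor displays.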

mich181189 commented 7 years ago

Right. This isn't an icemon bug, it's an icecc bug. https://github.com/icecc/icecream/commit/1c15f6b9c6ddd329e4fe1e2a03c7420a0407ae25 adds JobLocalBeginMsg calls for preprocessing sources, but does not add any JobLocalDoneMsg calls to match, so the JobLocalBeginMsg calls stack up as jobs that never complete.
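
To illustrate why unmatched begin messages pile up, here is a toy model of a monitor's job bookkeeping. It is not icemon's actual code: the `JobTable` class, its `on_begin`/`on_done` handlers, and the `simulate` helper are hypothetical names, and the only assumption is that a job stays "active" until a matching done message arrives.

```python
class JobTable:
    """Toy job bookkeeping: a job is active from its begin message
    until the matching done message arrives (assumed behaviour)."""

    def __init__(self):
        self.active = {}    # job id -> file name
        self.finished = []  # (job id, file name) pairs

    def on_begin(self, job_id, filename):
        # Stands in for handling a JobLocalBeginMsg.
        self.active[job_id] = filename

    def on_done(self, job_id):
        # Stands in for handling a JobLocalDoneMsg.
        filename = self.active.pop(job_id, None)
        if filename is not None:
            self.finished.append((job_id, filename))


def simulate(num_jobs, send_done):
    """Feed num_jobs begin messages, optionally followed by done
    messages, and return how many jobs are still counted as active."""
    table = JobTable()
    for job_id in range(num_jobs):
        table.on_begin(job_id, f"file{job_id}.cpp")
        if send_done:
            table.on_done(job_id)
    return len(table.active)


print(simulate(4000, send_done=True))   # every begin has a matching done
print(simulate(4000, send_done=False))  # begins are never matched
```

With matching done messages the active count returns to zero; without them every begin stays in the table, which is consistent with the ever-growing "active jobs" count reported above.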

I'll raise this as a bug on icecream.