kunsjef opened this issue 8 years ago
Sorry, I need better alerts for issues on my repositories. I was thinking about using ts again (and merging this repository with the upstream) and only just noticed your issue.
Have you already checked with the upstream version 1.0?
> The first 8 (the size of my queue) have PIDs, while the rest have separate JOBIDs

So the first 8 are running jobs and the rest are just queued... ~800, not bad!
The `New_notifies` and `New_conns` entries seem to be connections to ts's socket, one for each job. Maybe there's a limit on the number of connections, or on the number of open files the system allows the process (ulimit?).
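If that's the suspicion, one quick way to check on Linux would be to inspect the server process directly (a sketch; it assumes the ts server runs as the current user and that the process is simply named ts):

```sh
# Find the ts server's PID; assumes the server runs as this user
# and the process is named "ts" (both assumptions).
TS_PID=$(pgrep -u "$USER" -x ts | head -n 1)

# The system-imposed open-file limit for that process (the ulimit in question).
grep 'open files' "/proc/$TS_PID/limits"

# How many file descriptors the server holds right now; each client
# connected to the socket consumes one.
ls "/proc/$TS_PID/fd" | wc -l
```

If the descriptor count sits near the limit right when the errors start, that would point to ulimit as the culprit.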
If ~800 turns out to be a per-socket limit, a workaround could eventually be to use more FIFOs (up to 8 queues with 1 job per queue), as mentioned on ts's home page:
> Have any amount of queues identified by name, writing a simple wrapper script for each (I use ts2, tsio, tsprint, etc)
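A minimal sketch of such a wrapper, relying on the documented TS_SOCKET environment variable (the name ts2 and the socket path are illustrative):

```sh
#!/bin/sh
# ts2: a second, independent ts queue on its own socket.
# Each socket gets its own server process and its own connection pool.
export TS_SOCKET="/tmp/socket-ts2.$(id -u)"
exec ts "$@"
```

Repeating this for tsio, tsprint, and so on gives up to 8 independent queues; limiting each to one slot (ts2 -S 1) keeps total concurrency at 8 while spreading the connections across sockets.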
I run icinga2 with checker servers in a cluster that all run task-spooler to keep the load down during reloads and restarts of icinga2 (there is an open bug that makes the load skyrocket). Most of the time this runs without problems, but every now and then task-spooler starts logging errors to /tmp/socket-ts.108.error. They look like this:
What follows is a huge list (800+) of new jobs. The first 8 (the size of my queue) have PIDs, while the rest have separate JOBIDs but no PIDs. After this long list of new jobs, this appears:
This is also a long list, and then the whole thing repeats. The last time this happened, it repeated 8183 times in about 20 minutes and the log file grew to 2.3 GB. I only noticed when free disk space started running low on one of the checkers.
When this happens, task-spooler also fails to limit the number of jobs it runs simultaneously: my limit is 8 jobs, but I can see hundreds of jobs running and hundreds more in the queue. I can reproduce the error by restarting icinga2, which generates a huge number of jobs for ts to handle.
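For reference, the setup amounts to something like this (a sketch; only the 8-slot limit is from my actual configuration, the check command is illustrative):

```sh
# Cap simultaneous jobs at 8; anything beyond that should wait in the queue.
ts -S 8

# Every icinga2 check is funnelled through ts, e.g.:
ts /usr/lib/nagios/plugins/check_ping -H 10.0.0.1
```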
Can these errors be prevented, or is it possible to disable the error logging?