xtuml / munin

Apache License 2.0

Enhancing reporting of job timeout conditions #165

Open ColinCarterUK opened 7 months ago

ColinCarterUK commented 7 months ago

If a job times out, the PV reports only that the job has timed out. A timeout, however, means the job was still waiting for one or more things to happen, and the job may have been in any of several conditions at that point. If these conditions were enumerated (or at least the most likely ones), the PV could report more useful information alongside a timeout. Candidate conditions include:

  1. A previous event ID has been seen, but not its corresponding event
  2. The branch extent is non-zero
  3. An AND fork that has been touched is incomplete
  4. Expected end events have not been seen

There may be other conditions, and a single scenario may trigger more than one of the above.
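The candidate conditions lend themselves to a single diagnostic pass over a job's state at the moment it times out. The following is a minimal Python sketch, not the PV's actual implementation: `JobState`, `AndFork`, `TimeoutCondition` and `diagnose_timeout` are hypothetical names, and the sketch assumes the job state tracks seen event IDs, referenced previous event IDs, a branch extent counter, per-fork branch bookkeeping and a set of expected end events. It collects every condition that holds instead of stopping at the first, because one scenario may trigger several.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class TimeoutCondition(Enum):
    # Hypothetical labels for the candidate conditions listed in the issue.
    MISSING_PREVIOUS_EVENT = "previous event ID seen but corresponding event not seen"
    NON_ZERO_BRANCH_EXTENT = "branch extent is non-zero"
    INCOMPLETE_AND_FORK = "touched AND fork is incomplete"
    MISSING_END_EVENTS = "expected end events not seen"


@dataclass
class AndFork:
    # Assumed bookkeeping: branches the fork expects vs. branches that have produced an event.
    expected_branches: set[str]
    visited_branches: set[str] = field(default_factory=set)

    @property
    def touched(self) -> bool:
        return bool(self.visited_branches)

    @property
    def complete(self) -> bool:
        return self.visited_branches >= self.expected_branches


@dataclass
class JobState:
    # Hypothetical snapshot of a job's state when its timeout fires.
    seen_event_ids: set[str] = field(default_factory=set)
    previous_event_ids: set[str] = field(default_factory=set)
    branch_extent: int = 0
    and_forks: dict[str, AndFork] = field(default_factory=dict)
    expected_end_events: set[str] = field(default_factory=set)
    seen_end_events: set[str] = field(default_factory=set)


def diagnose_timeout(job: JobState) -> list[tuple[TimeoutCondition, str]]:
    """Return every candidate condition that holds, each with a detail string."""
    findings: list[tuple[TimeoutCondition, str]] = []

    # 1. Previous event IDs referenced by received events, but never received themselves.
    missing = job.previous_event_ids - job.seen_event_ids
    if missing:
        findings.append((TimeoutCondition.MISSING_PREVIOUS_EVENT,
                         f"missing event ids: {sorted(missing)}"))

    # 2. Branch extent has not returned to zero.
    if job.branch_extent != 0:
        findings.append((TimeoutCondition.NON_ZERO_BRANCH_EXTENT,
                         f"branch extent = {job.branch_extent}"))

    # 3. AND forks that have started but still have outstanding branches.
    for fork_id, fork in sorted(job.and_forks.items()):
        if fork.touched and not fork.complete:
            pending = sorted(fork.expected_branches - fork.visited_branches)
            findings.append((TimeoutCondition.INCOMPLETE_AND_FORK,
                             f"fork {fork_id} waiting on branches {pending}"))

    # 4. Expected end events (assumed to be a set that must all be seen) not yet seen.
    missing_ends = job.expected_end_events - job.seen_end_events
    if missing_ends:
        findings.append((TimeoutCondition.MISSING_END_EVENTS,
                         f"end events not seen: {sorted(missing_ends)}"))

    return findings


if __name__ == "__main__":
    job = JobState(
        seen_event_ids={"e1", "e2"},
        previous_event_ids={"e1", "e3"},
        branch_extent=1,
        and_forks={"f1": AndFork(expected_branches={"A", "B"}, visited_branches={"A"})},
        expected_end_events={"END"},
    )
    for condition, detail in diagnose_timeout(job):
        print(f"{condition.name}: {detail}")
```

Returning a list of (condition, detail) pairs keeps the timeout report extensible: further conditions can be added as extra checks without changing how the report is consumed, and an empty list means none of the known conditions explains the timeout.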