Open ceeaspb opened 8 years ago
There's 2 problems here:
@nitsanw 1) it looks like SIGPROF will only ever fire when a thread is on cpu. It counts down as user and system cpu time elapses. So if the process has no threads on cpu then the timer will not expire and a signal not sent when off cpu. So I don't think we can "miss" a SIGPROF interrupt so to speak (would be good to check). They may just never occur if the whole jvm is off cpu forever.
Other timers: SIGALRM - counts down real time SIGVTALRM - counts down user cpu time and not system cpu time. Which makes me think of something else but I think my ticket quota today is exceeded.
2) agree this ticket isn't about failed samples.
probably needs more thought. the risk is you are led to optimise for cpu when you should be looking at off cpu time (in which case you need a different tool).
From what you say I take that the time will expire, but the SIGPROF will get swallowed. This matches my observations looking at HP logs. But I would need to experiment more to validate. Discovering the number of actual samples vs. wanted samples is easy by logging the timer thread counter of signals emitted and comparing to number of log entries.
This implies profiler stop sequence should be terminate timer, drain logger, log number of timer emitted signals.
Rephrasing title to reflect discussion
@ceeaspb reading the code more closely it looks like I misunderstood how the notification timer hangs together. In particular we rely on: http://man7.org/linux/man-pages/man2/setitimer.2.html
ITIMER_VIRTUAL decrements only when the process is executing, and
delivers SIGVTALRM upon expiration.
ITIMER_PROF decrements both when the process executes and when the
system is executing on behalf of the process. Coupled
with ITIMER_VIRTUAL, this timer is usually used to
profile the time spent by the application in user and
kernel space. SIGPROF is delivered upon expiration.
So if no CPU is used (user or system) no signal will be delivered. This does not mean we cannot follow through on this feature. An estimate of intended sample rate should be sufficient as an indicator.
Low number of samples likely reflects low cpu usage or a short profiling duration, etc.
But by reporting the cpu cost as a percentage we could lose sight of this (low sample number).
This warning is presented in other sampling profilers (ie. typical / covers off a profiling risk).
How many samples is low? how long is a piece of string? TBC