Closed: stevenkaras closed this 6 years ago
Apologies for the delay in looking at this. In general I am somewhat hesitant about these types of changes because they can lead to a situation where you just fail in a tight loop. Is this really just a rare failure, or does it keep happening?
I am able to reproduce the issue on a regular basis. It typically happens every few minutes when sampling over a long period of time. I agree that if the first cycle fails it probably indicates a real issue, so perhaps add some logic to fail loudly in that case?
A better approach would be to include these as "failed" samples in the profiling data.
In that case your suggestion of reporting them as failed makes sense; I would accept that. It should be pretty easy to do, see how idle is reported. I also updated Travis last night, so the tests should pass if you pull the latest changes from master.
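To illustrate the idea of counting failed samples the same way idle is counted (this is only a generic Python sketch, not this project's code; record_sample, read_stack, and the RuntimeError are placeholder names):

```python
import collections

# Hypothetical sampler state: maps a call stack (tuple of frame names) to a hit count.
counts = collections.Counter()

def record_sample(read_stack):
    """Record one sample, counting unreadable stacks under a synthetic frame."""
    try:
        stack = read_stack()          # may raise if the target's stack can't be read
    except RuntimeError:
        counts[("(failed)",)] += 1    # the failure stays visible in the profile
        return
    if not stack:
        counts[("(idle)",)] += 1      # no frames: the thread was idle
    else:
        counts[tuple(stack)] += 1
```

The point of the synthetic "(failed)" frame is that lost samples show up in the totals instead of silently disappearing, which keeps the sample counts honest.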
I was thinking about doing a small release (v1.6.4) to include #144, as I've gotten a number of complaints about the default sample rate setting. I will hold off on that for a bit so we can get this change in as well.
I set it up to log these both to the summary as (failed), and when dumping with timestamps as (failed). I don't use Chrome's CPU profiler, so I'd prefer to leave that integration to someone else.
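For a rough idea of what "(failed)" entries could look like in the two kinds of output, here is a hedged sketch in the same illustrative style as above; dump_summary, dump_with_timestamps, and the line formats are invented for the example, not the project's real dump code:

```python
def dump_summary(counts, out):
    # Collapsed "frame;frame;... count" lines; unreadable samples appear as "(failed)".
    for stack, n in sorted(counts.items()):
        out.write(";".join(stack) + " " + str(n) + "\n")

def dump_with_timestamps(samples, out):
    # One "timestamp frame;frame;..." line per sample, so failed samples are also
    # visible in time-ordered output.
    for ts, stack in samples:
        out.write("%d %s\n" % (ts, ";".join(stack)))
```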
Thanks for this, I will create a new release with this change.
When profiling a long-lived multithreaded Python process, sometimes the stack can't be read correctly.
This change simply handles such errors by logging the failure to stderr and continuing to sample.
It doesn't solve the underlying issue from #129, but it works around such failures (under the assumption that it's OK to lose a handful of samples out of a few thousand). A sketch of the approach follows below.
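To make the workaround concrete, a generic sampling loop along these lines might look like the following. This is an illustration under stated assumptions (sample_loop, read_stack, record, and RuntimeError are all hypothetical names), combining the stderr logging from this change with the earlier suggestion to fail loudly if the very first cycle fails:

```python
import sys
import time

def sample_loop(read_stack, record, interval=0.01, duration=60.0):
    """Sample until the deadline, tolerating occasional failed stack reads.

    A failure on the very first sample likely means the target can't be
    profiled at all, so that case raises instead of being swallowed.
    """
    deadline = time.time() + duration
    first = True
    while time.time() < deadline:
        try:
            record(read_stack())
        except RuntimeError as exc:
            if first:
                raise                                   # fail loudly on the first cycle
            sys.stderr.write("failed to read stack: %s\n" % exc)
            record(["(failed)"])                        # count the lost sample anyway
        first = False
        time.sleep(interval)
```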