Closed: stevenkaras closed this 6 years ago
Apologies for the delay in looking at this. In general I am somewhat hesitant about these types of changes because they can lead to a situation where you just fail in a tight loop. Is this really just a rare failure, or does it keep happening?
I am able to reproduce the issue on a regular basis. It typically happens every few minutes when sampling over a long period of time. I agree that if the first cycle fails it probably indicates a real issue, so perhaps add some logic to fail loudly in that case?
A better approach would be to include these as "failed" samples in the profiling data.
In that case your suggestion of reporting them as failed makes sense; I would accept that. It should be pretty easy to do, see how idle is reported. I also updated Travis last night, so the tests should pass if you pull the latest changes from master.
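To illustrate the idea of counting failed samples the same way idle is counted (this is only a generic Python sketch, not this project's code; record_sample, read_stack, and the RuntimeError are placeholder names):

```python
import collections

# Hypothetical sampler state: maps a call stack (tuple of frame names) to a hit count.
counts = collections.Counter()

def record_sample(read_stack):
    """Record one sample, counting unreadable stacks under a synthetic frame."""
    try:
        stack = read_stack()          # may raise if the target's stack can't be read
    except RuntimeError:
        counts[("(failed)",)] += 1    # the failure stays visible in the profile
        return
    if not stack:
        counts[("(idle)",)] += 1      # no frames: the thread was idle
    else:
        counts[tuple(stack)] += 1
```

The point of the synthetic "(failed)" frame is that lost samples show up in the totals instead of silently disappearing, which keeps the sample counts honest.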
I was thinking about doing a small release (v1.6.4) to include #144, as I've gotten a number of complaints about the default sample rate setting. I will hold off on that for a bit so we can get this change in as well.
I set it up to log these both to the summary as (failed), and when dumping with timestamps as (failed). I don't use Chrome's CPU profiler, so I'd prefer to leave that integration to someone else.
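For a rough idea of what "(failed)" entries could look like in the two kinds of output, here is a hedged sketch in the same illustrative style as above; dump_summary, dump_with_timestamps, and the line formats are invented for the example, not the project's real dump code:

```python
def dump_summary(counts, out):
    # Collapsed "frame;frame;... count" lines; unreadable samples appear as "(failed)".
    for stack, n in sorted(counts.items()):
        out.write(";".join(stack) + " " + str(n) + "\n")

def dump_with_timestamps(samples, out):
    # One "timestamp frame;frame;..." line per sample, so failed samples are also
    # visible in time-ordered output.
    for ts, stack in samples:
        out.write("%d %s\n" % (ts, ";".join(stack)))
```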
Thanks for this, I will create a new release with this change.
When profiling a long-lived multithreaded Python process, sometimes the stack can't be read correctly.
This change simply handles such errors by logging the failure to stderr and continuing to sample.
It doesn't solve the underlying issue from #129, but it works around such failures (under the assumption that it's OK to lose a handful of samples out of a few thousand). A sketch of the approach follows below.
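To make the workaround concrete, a generic sampling loop along these lines might look like the following. This is an illustration under stated assumptions (sample_loop, read_stack, record, and RuntimeError are all hypothetical names), combining the stderr logging from this change with the earlier suggestion to fail loudly if the very first cycle fails:

```python
import sys
import time

def sample_loop(read_stack, record, interval=0.01, duration=60.0):
    """Sample until the deadline, tolerating occasional failed stack reads.

    A failure on the very first sample likely means the target can't be
    profiled at all, so that case raises instead of being swallowed.
    """
    deadline = time.time() + duration
    first = True
    while time.time() < deadline:
        try:
            record(read_stack())
        except RuntimeError as exc:
            if first:
                raise                                   # fail loudly on the first cycle
            sys.stderr.write("failed to read stack: %s\n" % exc)
            record(["(failed)"])                        # count the lost sample anyway
        first = False
        time.sleep(interval)
```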