suggestion: add a column showing max execution time for a single call

robertsdotpm commented 2 years ago

This tool is really useful, but it would be great if it also showed the highest recorded execution time for a measured call when that call was made multiple times. This would help point to bottlenecks. It has the average and total at the moment. (I already tried to modify the code to add this feature before posting, but I don't understand the code well enough yet.)

sumerc commented 2 years ago

Ok. This could be done at the code level, but I don't think it would make sense to add such a column to the output we generate (get_func_stats().print_all()).
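
As a rough illustration of what "at the code level" could look like from the user's side, the sketch below tracks the slowest single call with a plain Python wrapper. It does not use or modify yappi; `track_max` and `max_times` are hypothetical names introduced only for this example, and the timing is wall-clock time from `time.perf_counter`.

```python
import functools
import time

# Hypothetical helper, not part of yappi: slowest observed call per function name.
max_times = {}


def track_max(func):
    """Record the longest wall-clock duration of any single call to func."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs even if func raises, so failed calls are measured too.
            elapsed = time.perf_counter() - start
            name = func.__name__
            if elapsed > max_times.get(name, 0.0):
                max_times[name] = elapsed
    return wrapper


@track_max
def work(delay):
    time.sleep(delay)


for delay in (0.01, 0.05, 0.02):
    work(delay)

print(f"slowest call to work: {max_times['work']:.4f}s")
```

A wrapper like this only covers functions that are explicitly decorated, which is the kind of manual code change a profiler normally avoids.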

robertsdotpm commented 2 years ago

@sumerc It would also be cool to provide a special return value that tells the profiler not to count a particular call. Say you detect an error condition and you don't want that run to count towards the collected average run time: you return the constant and the profiler ignores that run of the function. Going to have another crack at extending this tomorrow.

sumerc commented 2 years ago

> you return the constant and the profiler ignores that run of the function

I could not understand the proposal here. A concrete example might be better?

robertsdotpm commented 2 years ago

> > you return the constant and the profiler ignores that run of the function
>
> I could not understand the proposal here. A concrete example might be better?

You would do something like:

    def code_to_profile():
        if error_occured:
            # yappi.SKIP_PROFILING is the proposed constant; it does not exist yet
            return yappi.SKIP_PROFILING

Then, in the C code hooks for leave (I'm still studying the code), you would check whether a function returned that value and, if it did, discard the timing for that function run. The idea is that you could use this to time operations that may have inconsistent results, where you're only interested in the run time of the most typical case.

For example: if you were benchmarking UDP code, you might want to know how 'fast' your code can do send and recv, and whether there are any bottlenecks. So you would write code that does that. However, if a UDP packet gets lost, the function will time out and artificially inflate the average perceived cost of the call. It would be very useful to then say:

if no reply ... return yappi.SKIP_PROFILING to ignore that run.
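
The proposal above targets yappi's C leave hooks, which this sketch does not touch. As an approximation of the same idea in plain Python, a timing wrapper can drop any run that returns a sentinel. Everything here is hypothetical and introduced for illustration: `SKIP_PROFILING` stands in for the proposed `yappi.SKIP_PROFILING` (which yappi does not provide), and `CallStats`, `timed`, and `send_and_recv` are made-up names, with `time.sleep` simulating the network.

```python
import functools
import time

# Hypothetical sentinel standing in for the proposed yappi.SKIP_PROFILING.
SKIP_PROFILING = object()


class CallStats:
    """Call count, total, and max duration for runs that were not skipped."""

    def __init__(self):
        self.ncall = 0
        self.ttot = 0.0
        self.tmax = 0.0

    @property
    def tavg(self):
        return self.ttot / self.ncall if self.ncall else 0.0


def timed(stats):
    """Time each call of the wrapped function, ignoring runs that return the sentinel."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            result = func(*args, **kwargs)
            elapsed = time.perf_counter() - start
            if result is SKIP_PROFILING:
                return None  # discard this run; the caller sees None
            stats.ncall += 1
            stats.ttot += elapsed
            stats.tmax = max(stats.tmax, elapsed)
            return result
        return wrapper
    return decorator


stats = CallStats()


@timed(stats)
def send_and_recv(reply_arrives):
    if not reply_arrives:
        time.sleep(0.05)       # simulated timeout after a lost packet
        return SKIP_PROFILING  # do not let the timeout skew the average
    time.sleep(0.001)          # simulated round trip
    return b"pong"


for reply_arrives in (True, False, True):
    send_and_recv(reply_arrives)

print(f"counted={stats.ncall} avg={stats.tavg:.4f}s max={stats.tmax:.4f}s")
```

Only the two successful round trips are counted, so the simulated timeout does not affect the average or the maximum.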

robertsdotpm commented 1 year ago

I still think this is a beautiful program, btw, @sumerc. Respect to all the devs who have worked on this.

sumerc commented 1 year ago

> I still think this is a beautiful program, btw @sumerc Respect to all the devs who have worked on this.

Thank you. I think I overlooked your previous comment.

> For example: if you were benchmarking UDP code you might be interested to know how 'fast' your code can do send and recv / if there's any bottlenecks. So you would write code that does that -- however if a UDP packet gets lost then the function will timeout and artificially increase the average perceived cost of the call. It would be very useful to then say:

I see a valid use case here, but I'm not sure yappi is the right tool for the scenario described. I would suggest using a line-level profiler for this; there are even sampling line profilers available (Scalene) if overhead is a problem. Moreover, I am usually reluctant to make changes that require manual user intervention. Here, the user needs to explicitly change code to enable or disable certain profiling features. IMHO, a profiler should work with the fewest possible code changes.

sumerc / yappi

suggestion: add a column showing max execution time for a single call #105