Hi - thanks for your note!
I guess I am not sure what you would like to appear here - I do think it’s reasonable to use the default settings for tools, given that that is the expected common use case. That said, I can include both results. I observe in passing that Scalene uses CPU time. I do wonder why reading the high performance timer (which I assume is nothing but a shim over something like RDTSC) is expensive: it should in principle be far cheaper than a system call to get elapsed virtual time. FWIW Scalene uses virtual (scheduled) elapsed CPU time. Wall clock time is pretty noisy.
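For illustration, the difference between wall-clock time and scheduled CPU time is easy to see with the standard library clocks (a minimal sketch, not Scalene's actual implementation):

import time

# Wall-clock vs. virtual (scheduled) CPU time: sleeping advances the wall
# clock by about a second while the process accumulates almost no CPU time.
w0, c0 = time.perf_counter(), time.process_time()
time.sleep(1)
w1, c1 = time.perf_counter(), time.process_time()
print(f"wall: {w1 - w0:.2f}s  cpu: {c1 - c0:.2f}s")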
Hi,
> I guess I am not sure what you would like to appear here - I do think it’s reasonable to use the default settings for tools, given that that is the expected common use case. That said, I can include both results.
Yes, I agree it is reasonable to use default settings, but in this case it gives the false impression that Yappi has 20x(!) overhead compared to cProfile while measuring the same thing (when in fact it is not measuring the same thing) :) That is why I kindly asked whether it would be possible to add wall-time results. It would be great if you could include both results, too. :)
> I do wonder why reading the high performance timer (which I assume is nothing but a shim over something like RDTSC) is expensive: it should in principle be far cheaper than a system call to get elapsed virtual time. FWIW Scalene uses virtual (scheduled) elapsed CPU time.
I remember looking into this problem years ago, digging through the Linux kernel for clues, but I do not remember the details very well anymore. I will also be looking over your solution. Thanks!
UPDATE: I can see that you use process_time(), which is (AFAIR) equivalent to clock_gettime(CLOCK_PROCESS_CPUTIME_ID). Yappi uses clock_gettime(CLOCK_THREAD_CPUTIME_ID), as it is trying to get per-thread CPU time. Both of these APIs seem to add some overhead.
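For reference, the mapping can be checked directly from Python (a rough sketch, assuming Python 3.7+ on Linux, where both clock constants are exposed):

import time

# process_time() should track CLOCK_PROCESS_CPUTIME_ID and thread_time()
# should track CLOCK_THREAD_CPUTIME_ID, so each pair prints nearly equal values.
print(time.process_time(), time.clock_gettime(time.CLOCK_PROCESS_CPUTIME_ID))
print(time.thread_time(), time.clock_gettime(time.CLOCK_THREAD_CPUTIME_ID))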
I also have written a simple application. See:
#include <time.h>
#include <sys/time.h>
#include <stdio.h>
#include <stdlib.h>

struct timeval curtime;
struct timespec tp;

/* Read the selected clock `iterations` times to measure per-call cost. */
int main(int argc, char **argv)
{
    int iterations = atoi(argv[1]);
    int clk_type = atoi(argv[2]);
    for (int i = 0; i < iterations; i++)
    {
        switch (clk_type) {
        case 1:
            gettimeofday(&curtime, NULL);
            break;
        case 2:
            clock_gettime(CLOCK_MONOTONIC, &tp);
            break;
        case 3:
            clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &tp);
            break;
        case 4:
            clock_gettime(CLOCK_THREAD_CPUTIME_ID, &tp);
            break;
        }
    }
    return 0;
}
/*
playground » time ./c1 10000000 1
./c1 10000000 1  0,16s user 0,00s system 99% cpu 0,156 total
playground » time ./c1 10000000 2
./c1 10000000 2  0,19s user 0,00s system 99% cpu 0,189 total
playground » time ./c1 10000000 3
./c1 10000000 3  2,46s user 1,79s system 99% cpu 4,255 total
playground » time ./c1 10000000 4
./c1 10000000 4  2,50s user 1,76s system 99% cpu 4,260 total
*/
You can see that the CPU-time clocks add 15-20x overhead compared to gettimeofday and CLOCK_MONOTONIC. I am assuming the answer lies in the kernel. I will be looking into that.
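One plausible explanation is that gettimeofday and CLOCK_MONOTONIC are served from the vDSO on Linux without entering the kernel, while the per-process and per-thread CPU-time clocks fall back to a real clock_gettime() syscall. The same gap can be reproduced from Python (a rough sketch, assuming Linux):

import time
import timeit

# Measure the per-call cost of each clock; the CPU-time clocks are expected
# to be noticeably slower than CLOCK_MONOTONIC.
for name in ("CLOCK_MONOTONIC", "CLOCK_PROCESS_CPUTIME_ID", "CLOCK_THREAD_CPUTIME_ID"):
    clk = getattr(time, name)
    cost = timeit.timeit(lambda: time.clock_gettime(clk), number=1_000_000)
    print(f"{name:<28} {cost:.3f}s per 1M calls")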
Updated results now up via https://github.com/emeryberger/scalene/commit/a7afa197e5e3183e2ff9ba1e8ed36cd68bb8dce1.
Hi Emery,
I am the author of Yappi and I came across your work by chance. It seems like a nice project.
I have a simple request, if possible: the main page shows Yappi with an 18x slowdown whereas cProfile has 2x, but the real issue is that Yappi profiles CPU time by default while cProfile profiles wall time by default. This alone makes roughly a 10x difference, since reading hardware clocks is an expensive operation. To make the comparison fair, I would kindly request that you profile the same application (julia_nogil.py) with the clock_type set to wall. Here is the command I used to do that, and the results indicate only a 2x-2.5x slowdown, roughly similar to cProfile (which should be the case). Currently it looks like Yappi is slower even than line_profiler :) Thanks!
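For reference, switching Yappi to wall-clock measurement from code looks roughly like this (a minimal sketch using Yappi's public API; illustrative only, not necessarily the exact invocation used for the numbers above):

import yappi

yappi.set_clock_type("wall")  # the default is "cpu"
yappi.start()

# ... run the workload being profiled, e.g. the julia_nogil.py benchmark ...

yappi.stop()
yappi.get_func_stats().print_all()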