Open fl4p opened 8 years ago
On 12 Feb 2016 at 21:26, Fabian notifications@github.com wrote:
Why are you moving away from the cycle counters? clock_gettime is damn slow, as this CPU profile shows (ARM embedded system):
I noticed a significant decrease in JACK's CPU load when using hardware-provided ARM high precision counters.
Any patch to show?
Also I think CalcCPULoad() adds some extra overhead if no client is querying the CPU load value. This should be on-demand, or at least behind a configure switch.
What do you think?
I don't think this makes real sense.
I think this is a better question for the JACK mailing list. Note that both jack1 and jack2 removed cycle counters as a timing option, not just jack2.
How big is this "significant decrease"?
This is the High Precision counter code:
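#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Points at the free-running timer counter once the registers are mapped. */
static volatile uint64_t *arm_hpet_ptr;

static int arm_hpet_init (void)
{
    int fd;
    void *st_base;

    if (-1 == (fd = open("/dev/mem", O_RDONLY))) {
        printf ("Cannot access /dev/mem (%s)\n", strerror (errno));
        return -1;
    }
    st_base = mmap(NULL, 4096, PROT_READ, MAP_SHARED, fd, ARM_HPET_ST_BASE);
    close (fd); /* the mapping stays valid after the descriptor is closed */
    if (MAP_FAILED == st_base) {
        printf ("mmap() failed.\n");
        return -1;
    }
    arm_hpet_ptr = (volatile uint64_t *)((char *)st_base + ARM_HPET_TIMER_OFFSET);
    return 0;
}

static uint64_t cycles_arm_hpet (void)
{
    static int init = 1;
    if (init) { arm_hpet_init (); init = 0; }
    /* Return 0 instead of dereferencing NULL if the mapping failed. */
    return arm_hpet_ptr ? *arm_hpet_ptr : 0; // 1 MHz counter => 1 µs per tick
}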
Addresses for the Raspberry Pi 1 and 2:
#define BCM2708_PERI_BASE 0x20000000 // rpi1
#define BCM2709_PERI_BASE 0x3F000000 // rpi2
#define ARM_PERI_BASE BCM2709_PERI_BASE // choose rpi2
#define ARM_HPET_ST_BASE (ARM_PERI_BASE + 0x3000)
#define ARM_HPET_TIMER_OFFSET (4)
#define ARM_HPET_TIMER_RATE 1000000
The cycles_arm_hpet function can just be placed as _jack_get_microseconds.
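The counter read above dereferences a single 64-bit pointer at offset 4. As far as I read the BCM2835 peripheral documentation, the system timer exposes the counter as two 32-bit registers (CLO at 0x4, CHI at 0x8), so a more defensive variant reads the halves separately and retries if the upper word changed in between. This is only a sketch: st_read32 and st_read_counter are hypothetical helper names, not part of JACK or the code posted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Register offsets relative to the mapped system timer base
 * (assumed from the BCM2835 peripheral documentation). */
#define ST_CLO 0x4 /* lower 32 bits of the free-running counter */
#define ST_CHI 0x8 /* upper 32 bits of the free-running counter */

/* Hypothetical helper: read one 32-bit register at a byte offset. */
static uint32_t st_read32(const volatile void *base, size_t offset)
{
    return *(const volatile uint32_t *)((const volatile char *)base + offset);
}

/* Read the 64-bit counter as two 32-bit halves. If the upper word changed
 * while the lower word was being read, the lower word wrapped, so retry. */
static uint64_t st_read_counter(const volatile void *st_base)
{
    uint32_t hi, lo;

    assert(st_base != NULL);
    do {
        hi = st_read32(st_base, ST_CHI);
        lo = st_read32(st_base, ST_CLO);
    } while (hi != st_read32(st_base, ST_CHI));

    return ((uint64_t)hi << 32) | lo;
}
```

With the mapping from arm_hpet_init, st_base would be passed in directly instead of storing st_base + ARM_HPET_TIMER_OFFSET. The retry loop costs one extra register read per call, so it would be worth re-running the timer benchmark before adopting it.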
I was benchmarking timers a bit and noticed that gettimeofday is about twice as fast as clock_gettime:
gettimeofday 10000000x took 2002.2 ms, 200.2 ns/call
clock_gettime 10000000x took 4274.5 ms, 427.5 ns/call
cycles_arm_hpet 10000000x took 1943.5 ms, 194.4 ns/call
cycles_arm7 10000000x took 327.3 ms, 32.7 ns/call
gettimeofday is as accurate as clock_gettime on the Raspberry Pi 2. I will get back with some hopefully meaningful benchmarks on CPU load in jack2.
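For anyone who wants to reproduce per-call numbers like the ones above, a timer micro-benchmark could look roughly like this. It is a sketch, not the program that produced those figures: the function names are made up for illustration, and CLOCK_MONOTONIC is an assumption, since the post does not say which clock ID was measured.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/time.h>
#include <time.h>

/* Common signature for every timer source under test. */
typedef uint64_t (*usec_fn)(void);

static uint64_t usec_gettimeofday(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (uint64_t)tv.tv_sec * 1000000u + (uint64_t)tv.tv_usec;
}

static uint64_t usec_clock_gettime(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts); /* clock ID is an assumption */
    return (uint64_t)ts.tv_sec * 1000000u + (uint64_t)ts.tv_nsec / 1000u;
}

/* Average cost of one call to fn, in nanoseconds, over `iterations` calls. */
static double bench_ns_per_call(usec_fn fn, unsigned long iterations)
{
    struct timespec t0, t1;
    volatile uint64_t sink = 0; /* keeps the calls from being optimised away */

    assert(fn != NULL && iterations > 0);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (unsigned long i = 0; i < iterations; i++)
        sink += fn();
    clock_gettime(CLOCK_MONOTONIC, &t1);
    (void)sink;

    double elapsed_ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
                      + (double)(t1.tv_nsec - t0.tv_nsec);
    return elapsed_ns / (double)iterations;
}
```

Calling bench_ns_per_call(usec_gettimeofday, 10000000UL) and the same for usec_clock_gettime gives figures comparable in kind to the table above. Results depend heavily on whether the kernel serves the call through the vDSO or a real system call on the given platform.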
forgot about this...
I did some testing of my own. The cycle counter is only useful if you run JACK on a single CPU. The counters of the individual CPUs are not in sync, and because jack2 uses SMP, the CPU on which the thread calling get_cycles runs might change at any time.
I agree the cycle counter uses fewer resources, but it cannot be used on SMP systems.
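To illustrate the problem with unsynchronised per-CPU counters, here is a toy model in C. Nothing in it touches real hardware: fake_cpu, fake_get_cycles and measured_interval are hypothetical names, and each "CPU" is just a counter that started from a different value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one core: its counter ticks at the same rate as
 * the others but started from a different value. */
typedef struct {
    uint64_t offset; /* counter value of this core at true time 0 */
} fake_cpu;

/* What a get_cycles-style read would return on this core at a given time. */
static uint64_t fake_get_cycles(const fake_cpu *cpu, uint64_t true_ticks)
{
    return true_ticks + cpu->offset;
}

/* Interval as the caller would compute it: end reading minus start reading.
 * The two readings may come from different cores if the thread migrated. */
static int64_t measured_interval(const fake_cpu *start_cpu,
                                 const fake_cpu *end_cpu,
                                 uint64_t t_start, uint64_t t_end)
{
    assert(t_end >= t_start);
    uint64_t begin = fake_get_cycles(start_cpu, t_start);
    uint64_t end = fake_get_cycles(end_cpu, t_end);
    return (int64_t)(end - begin);
}
```

With both readings on the same core the offset cancels and the true interval comes out. If the thread starts on a core whose counter is 5000 ticks ahead and finishes on one with offset 0, a true interval of 100 ticks is measured as -4900.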
Why are you moving away from the cycle counters? clock_gettime is damn slow, as this CPU profile shows (ARM embedded system):
I noticed a significant decrease in JACK's CPU load when using hardware-provided ARM high precision counters.
Also I think the CalcCPULoad() adds some extra overhead (in the profile too), if no client is querying the CPU load value. This should be on-demand, or at least with a configure-switch.
What do you think?
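To make the "on-demand" idea concrete, here is one possible shape for it: the per-cycle path only stores how long the cycle took, and the averaging happens when somebody actually asks for the load. This is a sketch under my own assumptions, not how jack2's CalcCPULoad() is implemented. The names load_tracker, load_record_cycle and load_query_percent and the window size are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LOAD_WINDOW 32 /* hypothetical number of cycles to average over */

typedef struct {
    uint64_t period_usecs;             /* length of one process cycle */
    uint64_t spent_usecs[LOAD_WINDOW]; /* processing time of recent cycles */
    size_t next;                       /* slot the next cycle overwrites */
    size_t count;                      /* number of valid slots */
} load_tracker;

/* Hot path, once per cycle: one subtraction and one store, no averaging. */
static void load_record_cycle(load_tracker *t,
                              uint64_t begin_usecs, uint64_t end_usecs)
{
    t->spent_usecs[t->next] = end_usecs - begin_usecs;
    t->next = (t->next + 1) % LOAD_WINDOW;
    if (t->count < LOAD_WINDOW)
        t->count++;
}

/* Cold path, only when a client queries the load: average the window and
 * express it as a percentage of the cycle period. */
static float load_query_percent(const load_tracker *t)
{
    uint64_t total = 0;

    assert(t->period_usecs > 0);
    if (t->count == 0)
        return 0.0f;
    for (size_t i = 0; i < t->count; i++)
        total += t->spent_usecs[i];
    return 100.0f * (float)total / (float)(t->count * t->period_usecs);
}
```

The trade-off is that a query does a little more work than reading a precomputed value, and the two paths would need some synchronisation if they run on different threads, which the sketch leaves out.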