panda-official / TimeSwipe

PANDA Timeswipe driver and firmware
GNU General Public License v3.0

create real nanosecond sleep #11

Closed IngoKaiser closed 4 years ago

IngoKaiser commented 4 years ago

As a workaround, we created sleep functions for 8ns and 55ns that run operations lasting about that long, because no sleep function I found was able to do a nanosecond sleep.

Please integrate a proper function for this. Let's discuss it if there is any problem.

iluxa commented 4 years ago

The RPi timer is not that precise. We can consider busy waiting or, if that is not acceptable, some external device.

IngoKaiser commented 4 years ago

Hmm, what's the precision of the RPi? Otherwise, busy waiting for those elements (it's meant for the pauses between GPIO interactions) might be an option.

iluxa commented 4 years ago

It has 1-microsecond precision. Busy waiting is possible, but it is not a 100% guarantee because of operating system schedulers, timers and other interruptions. If there is no external signaling, busy waiting can be the better solution.
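To make the busy-waiting option concrete, here is a minimal sketch that spins on the monotonic clock until a deadline passes. The name busy_wait_for is hypothetical and not part of the driver. Each clock read has its own cost, so this is not expected to resolve delays as short as 8ns, and the returned time can be longer than requested whenever the scheduler preempts the thread.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper (not from the TimeSwipe driver): spin until at least
// `duration` has elapsed on the monotonic clock. Returns the time actually
// spent, which may exceed the request if the thread gets preempted.
inline std::chrono::nanoseconds busy_wait_for(std::chrono::nanoseconds duration) {
    assert(duration.count() >= 0);
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    const auto deadline = start + duration;
    auto now = start;
    while (now < deadline) {
        now = clock::now();  // no sleep call, the core stays busy
    }
    return std::chrono::duration_cast<std::chrono::nanoseconds>(now - start);
}
```

The loop never waits less than requested; how much longer it waits depends on the system load.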

IngoKaiser commented 4 years ago

OK, can you run some tests on the workload?

--> How long does reading take, and what does it mean for CPU usage?

IngoKaiser commented 4 years ago

For the old board with a sample rate of 16fps (Port 22) and the new board with a sample rate of 48fps (Port 24).

IngoKaiser commented 4 years ago

Pins for new board

static const unsigned char DATA0 = 24;  //BCM 24 - PIN 18
static const unsigned char DATA1 = 25;  //BCM 25 - PIN 22
static const unsigned char DATA2 = 7;   //BCM 7 - PIN 26
static const unsigned char DATA3 = 5;   //BCM 5 - PIN 29
static const unsigned char DATA4 = 6;   //BCM 6 - PIN 31
static const unsigned char DATA5 = 12;  //BCM 12 - PIN 32
static const unsigned char DATA6 = 13;  //BCM 13 - PIN 33
static const unsigned char DATA7 = 16;  //BCM 16 - PIN 36

static const unsigned char CLOCK = 4;   //BCM 4 - PIN 7
static const unsigned char TCO = 14;    //BCM 14 - PIN 8
static const unsigned char PI_OK = 15;  //BCM 15 - PIN 10
static const unsigned char FAIL = 18;   //BCM 18 - PIN 12
static const unsigned char RESET = 17;  //BCM 17 - PIN 11

You might bring the new configuration to a new branch until we have completely switched to the new boards.

iluxa commented 4 years ago

Old board with 100ms delay: CPU usage 25%, 15740 fps

As fast as possible: CPU usage 62%, 38119 fps

iluxa commented 4 years ago

New board with 100ms delay: CPU usage 15%, 16734 fps

As fast as possible: CPU usage 25%, 27641 fps

iluxa commented 4 years ago

The issue with overfps is resolved by tweaking the sleeps for the different quartz in the waitForPiOk function in driver.cpp. I am going to try a busy-wait sleep implementation.

IngoKaiser commented 4 years ago

We’re mixing two things here. This task was about replacing the sleep55ns function (which is used between clocking) with a better solution, since we’re not sure the GPIO reading will also take 55ns in the future (other OS, driver, CPU...). That’s why we want a clean solution at that place.

On the other hand, we are missing data on read (#15), but that might be more of an issue with piOK or the waiting interval there.

iluxa commented 4 years ago

Here is the investigation report on how to sleep 8ns and 55ns.

Tests were performed on a Raspberry Pi 3 with 32-bit Raspbian on an idle system.

The idea is to perform some busy-loop:

volatile uint64_t n = N;
while(n > 0) {
    n--;
}

and estimate how much time it takes.
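As a rough cross-check that needs no special setup, the same loop can be timed with a wall clock and averaged over many runs. This is only a sketch: busy_loop and average_loop_ns are hypothetical names, and the average includes the loop-call overhead and any preemption, so it is coarser than a per-run measurement.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// The busy loop from this comment, with `iterations` playing the role of N.
// `volatile` keeps the compiler from optimizing the loop away.
inline void busy_loop(std::uint64_t iterations) {
    volatile std::uint64_t n = iterations;
    while (n > 0) {
        n = n - 1;
    }
}

// Hypothetical helper: average wall-clock nanoseconds per busy_loop call,
// measured over `repeats` back-to-back runs.
inline double average_loop_ns(std::uint64_t iterations, int repeats) {
    assert(repeats > 0);
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (int i = 0; i < repeats; ++i) {
        busy_loop(iterations);
    }
    const auto stop = clock::now();
    const std::chrono::duration<double, std::nano> total = stop - start;
    return total.count() / repeats;
}
```

For example, average_loop_ns(7, 100000) gives an estimate for the N = 7 case; the result depends on the current CPU frequency.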

Time can be expressed in CPU core cycles, which should be the most precise method.

To count CPU core cycles, a kernel module was used: https://matthewarcus.wordpress.com/2018/01/27/using-the-cycle-counter-registers-on-the-raspberry-pi-3/ The kernel module enables a user application to access the counter register.

Each CPU core has a default base frequency of 600 MHz:

cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq
600000

But this is the frequency when the CPU core is idle.

The max frequency is 1.4 GHz:

cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
1400000

To get clean tests, the current frequency can be set to the max with the command:

sudo cpupower frequency-set -g performance

During tests it was found that the following piece of code:

volatile uint64_t n = 7;
while(n > 0) {
    n--;
}

takes from 64 to 90 CPU core cycles, which at 1.4 GHz equals 45ns to 64ns.

And the next piece of code:

volatile uint64_t n = 1;
while(n > 0) {
    n--;
}

takes from 12 to 35 CPU core cycles, which at 1.4 GHz equals 8ns to 25ns.
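The cycle-to-time conversion behind these figures can be written down as a small helper. This is a sketch with a hypothetical name, cycles_to_ns; it assumes the core actually runs at the scaling_max_freq value of 1400000 kHz while the loop executes.

```cpp
#include <cassert>

// Hypothetical helper: convert a cycle count to nanoseconds for a core
// running at freq_khz (kHz is the unit used by scaling_cur_freq and
// scaling_max_freq). ns = cycles / (freq_khz * 1e3 Hz) * 1e9.
inline double cycles_to_ns(double cycles, double freq_khz) {
    assert(freq_khz > 0.0);
    return cycles * 1.0e6 / freq_khz;
}
```

With freq_khz = 1400000, 64 and 90 cycles come out at roughly 45.7ns and 64.3ns, and 12 and 35 cycles at roughly 8.6ns and 25ns.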

So, the outcome of the investigation: the sleep55ns function can be replaced with a busy loop that executes in 45ns-64ns, and the sleep8ns function can be replaced with a busy loop that executes in 8ns-25ns.

IngoKaiser commented 4 years ago

ok, please

iluxa commented 4 years ago

PR: https://github.com/panda-official/TimeSwipe/pull/25

During tests I found that DataLogging is still getting 48K fps even with zero busy-wait (with uint64_t waitNs = 0;)

iluxa commented 4 years ago

During tests it was found that only one delay is required to get a clean diagram: between setGPIOHigh(CLOCK) and setGPIOLow(CLOCK). Actually, putting a std::this_thread::yield(); call there is enough to receive clean data, so only a context switch between threads is required. With a busy loop like:

volatile uint64_t cycles = N;
while(cycles > 0) {
    cycles--;
}

and a big N value, there is a very high probability that the execution will be interrupted by the OS scheduler many times, while a small value is not enough of a wait. readAllGPIO does the same context switching because it needs to read mmap memory from the kernel.
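The yield placement described above can be sketched as follows. The setGPIOHigh and setGPIOLow bodies here are stand-ins with assumed signatures that only record the requested level, and pulseClock is a hypothetical wrapper; the real driver functions write to the GPIO registers. Note that std::this_thread::yield() is only a hint to the scheduler, so the length of the pause is not guaranteed.

```cpp
#include <cassert>
#include <thread>
#include <vector>

// Stand-ins for the driver's GPIO helpers (signatures are assumptions).
// They record the requested level instead of touching hardware.
enum class Level { Low, High };
static std::vector<Level> g_clock_trace;

static const unsigned char CLOCK = 4;  // BCM 4 - PIN 7 (new board)

inline void setGPIOHigh(unsigned char /*pin*/) { g_clock_trace.push_back(Level::High); }
inline void setGPIOLow(unsigned char /*pin*/) { g_clock_trace.push_back(Level::Low); }

// Hypothetical wrapper: one clock pulse with a yield between the two edges
// instead of a fixed-length busy loop.
inline void pulseClock() {
    setGPIOHigh(CLOCK);
    std::this_thread::yield();  // give the scheduler a chance to switch threads
    setGPIOLow(CLOCK);
}
```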

iluxa commented 4 years ago

The latest tested commit is pushed to the branch and ready to merge.

iluxa commented 4 years ago

According to the topic https://www.raspberrypi.org/forums/viewtopic.php?t=228727 after next improvements:

According to the tests, it is possible to keep the sleep within 60ns in 99.994% of cases; in the remaining 0.006% of cases the maximum sleep is not more than 45us.

IngoKaiser commented 4 years ago

Since it requires access to the OS, we should stick to the current implementation. Busy waiting would increase the CPU workload, if I got it right. We might come back to this later - @iluxa, please undo the PR.