IngoKaiser closed this issue 4 years ago.
The RPi timer is not very precise. We could consider busy waiting or, if that is not acceptable, some external device.
Hmm, what's the precision of the RPi timer? Otherwise, busy waiting for those elements (it's meant for the GPIO interaction breaks) might be an option.
It has 1-microsecond precision. Busy waiting is possible, but it is not a 100% guarantee because of the operating system scheduler, timers and other interrupts. If there is no external signaling, busy waiting can be the better solution.
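The precision question can be checked on the Pi itself. A minimal sketch, assuming a Linux system with POSIX clocks: `clock_getres` reports the nominal resolution of `CLOCK_MONOTONIC`, and timing a requested 55 ns `std::this_thread::sleep_for` shows how far the scheduler actually overshoots. Both helper names are hypothetical, not part of the TimeSwipe driver, and the nominal resolution may be much finer than what a sleep call can really achieve.

```cpp
#include <chrono>
#include <cstdint>
#include <ctime>
#include <thread>

// Nominal resolution of CLOCK_MONOTONIC in nanoseconds, or -1 on error.
int64_t monotonicResolutionNs() {
    timespec res{};
    if (clock_getres(CLOCK_MONOTONIC, &res) != 0) {
        return -1;
    }
    return static_cast<int64_t>(res.tv_sec) * 1000000000LL + res.tv_nsec;
}

// Wall-clock time actually spent in a sleep_for() of requestedNs nanoseconds.
// sleep_for blocks for at least the requested time; on a stock kernel the
// real duration is usually many microseconds longer because the call goes
// through the scheduler and the timer subsystem.
int64_t measuredSleepNs(int64_t requestedNs) {
    using Clock = std::chrono::steady_clock;
    const auto start = Clock::now();
    std::this_thread::sleep_for(std::chrono::nanoseconds(requestedNs));
    const auto end = Clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count();
}
```

Calling `measuredSleepNs(55)` in a loop and printing the results gives a quick picture of the jitter a plain sleep would introduce between clock edges.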
OK, can you run some tests on the workload?
--> How long does reading take, and what does it mean for CPU usage?
For the old board with a sample rate of 16fps (Port 22) and the new board with a sample rate of 48fps (Port 24).
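One way to answer "how long does reading take" is to run the read in a loop for a fixed window and compare the process CPU time with the wall-clock time. A minimal sketch, assuming Linux (`CLOCK_PROCESS_CPUTIME_ID`); `readSample` stands in for the real GPIO read routine and `measureWorkload` is a hypothetical helper.

```cpp
#include <chrono>
#include <ctime>
#include <functional>

// Result of one workload measurement.
struct Workload {
    double fps;         // samples read per second of wall-clock time
    double cpuPercent;  // process CPU time / wall-clock time * 100
};

// Calls readSample() repeatedly for `window` and reports throughput and
// CPU usage. readSample is a placeholder for the real read function.
Workload measureWorkload(const std::function<void()>& readSample,
                         std::chrono::milliseconds window) {
    using Clock = std::chrono::steady_clock;
    timespec cpuStart{}, cpuEnd{};
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &cpuStart);
    const auto wallStart = Clock::now();
    long long samples = 0;
    while (Clock::now() - wallStart < window) {
        readSample();
        ++samples;
    }
    const auto wallEnd = Clock::now();
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &cpuEnd);

    const double wallSec = std::chrono::duration<double>(wallEnd - wallStart).count();
    const double cpuSec = static_cast<double>(cpuEnd.tv_sec - cpuStart.tv_sec) +
                          static_cast<double>(cpuEnd.tv_nsec - cpuStart.tv_nsec) / 1e9;
    return Workload{samples / wallSec, 100.0 * cpuSec / wallSec};
}
```

For a run with a delay between reads, the sleep would go inside the callback, e.g. a lambda that performs the read and then calls `std::this_thread::sleep_for`.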
Pins for the new board:
```cpp
static const unsigned char DATA0 = 24; //BCM 24 - PIN 18
static const unsigned char DATA1 = 25; //BCM 25 - PIN 22
static const unsigned char DATA2 = 7;  //BCM 7  - PIN 26
static const unsigned char DATA3 = 5;  //BCM 5  - PIN 29
static const unsigned char DATA4 = 6;  //BCM 6  - PIN 31
static const unsigned char DATA5 = 12; //BCM 12 - PIN 32
static const unsigned char DATA6 = 13; //BCM 13 - PIN 33
static const unsigned char DATA7 = 16; //BCM 16 - PIN 36
static const unsigned char CLOCK = 4;  //BCM 4  - PIN 7
static const unsigned char TCO   = 14; //BCM 14 - PIN 8
static const unsigned char PI_OK = 15; //BCM 15 - PIN 10
static const unsigned char FAIL  = 18; //BCM 18 - PIN 12
static const unsigned char RESET = 17; //BCM 17 - PIN 11
```
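Because the eight data lines of the new board are not contiguous BCM GPIOs, a data byte has to be assembled bit by bit after the pin levels are read. A sketch, assuming all data pins are sampled at once from the 32-bit GPLEV0 level register of the BCM283x (which holds the levels of GPIO 0-31); `assembleDataByte` is a hypothetical helper, not the driver's readAllGPIO.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// BCM numbers of DATA0..DATA7 on the new board, in bit order.
constexpr std::array<unsigned char, 8> kDataPins = {24, 25, 7, 5, 6, 12, 13, 16};

// Builds a byte whose bit i is the level of DATAi, given a snapshot of GPLEV0.
uint8_t assembleDataByte(uint32_t gplev0) {
    uint8_t value = 0;
    for (std::size_t bit = 0; bit < kDataPins.size(); ++bit) {
        if (gplev0 & (1u << kDataPins[bit])) {
            value |= static_cast<uint8_t>(1u << bit);
        }
    }
    return value;
}
```

Reading one register and gathering the bits keeps the number of memory accesses per sample at one, regardless of how the data lines are scattered.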
You might put the new configuration on a new branch until we have completely switched to the new boards.
| Board | Mode | CPU usage | fps |
|---|---|---|---|
| Old | 100ms delay | 25% | 15740 |
| Old | as fast as possible | 62% | 38119 |
| New | 100ms delay | 15% | 16734 |
| New | as fast as possible | 25% | 27641 |
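Dividing CPU usage by the sample rate gives the CPU time spent per sample, which makes the two boards easier to compare. Assuming the percentages are relative to one core, the old board running as fast as possible spends 0.62 / 38119 ≈ 16.3 µs per sample, while the new board spends 0.25 / 27641 ≈ 9.0 µs. The same arithmetic for the 100ms delay runs gives about 15.9 µs (old) and 9.0 µs (new). A small helper for the conversion (the name is hypothetical):

```cpp
// CPU time in microseconds spent per sample, given CPU usage in percent
// of one core and the measured sample rate in samples per second.
constexpr double cpuMicrosPerSample(double cpuPercent, double fps) {
    return (cpuPercent / 100.0) / fps * 1e6;
}
```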
The over-fps issue is resolved by tweaking the sleeps for the different quartz crystals in the waitForPiOk function in driver.cpp.
I am going to try a busy-wait sleep implementation.
We’re mixing two things here. This task was about replacing the sleep55ns function (which is used between clock toggles) with a better solution, since we’re not sure the GPIO read will still take 55ns in the future (other OS, driver, CPU...). That’s why we want a clean solution at that place.
On the other hand, we are missing data on read (#15), but that might be more of an issue with piOK or the waiting interval there.
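For the sleep55ns replacement, a time-based busy wait is one way to make the delay independent of how long the surrounding GPIO reads take: instead of spinning a fixed number of iterations, it spins until a deadline on `std::chrono::steady_clock` has passed. A minimal sketch; `busyWaitNs` is a hypothetical name. On a Pi 3 a single `steady_clock::now()` call may itself cost tens of nanoseconds, so for a 55ns target the real wait is the requested time plus at least one clock read, and it can still be stretched by preemption.

```cpp
#include <chrono>
#include <cstdint>

// Spins until at least `ns` nanoseconds have elapsed on the monotonic clock.
// Returns the elapsed time actually observed, which is always >= ns.
int64_t busyWaitNs(int64_t ns) {
    using Clock = std::chrono::steady_clock;
    const auto start = Clock::now();
    const auto deadline = start + std::chrono::nanoseconds(ns);
    auto now = start;
    while (now < deadline) {
        now = Clock::now();
    }
    return std::chrono::duration_cast<std::chrono::nanoseconds>(now - start).count();
}
```

The trade-off compared with a fixed iteration count is that the wait no longer depends on CPU speed or compiler output, at the cost of reading the clock in every loop iteration.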
Here is an investigation report on how to sleep 8ns and 55ns.
Tests were performed on a Raspberry Pi 3 running 32-bit Raspbian on an idle system.
The idea is to run a busy loop:
```cpp
volatile uint64_t n = N;
while(n > 0) {
    n--;
}
```
and estimate how long it takes.
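Without the cycle-counter kernel module, a rougher estimate can be obtained with `std::chrono::steady_clock` by timing many back-to-back repetitions of the loop and dividing, which amortizes the cost of reading the clock. A sketch; `spin` and `averageLoopNs` are hypothetical names, and the result includes the function-call overhead around each loop.

```cpp
#include <chrono>
#include <cstdint>

// The busy loop under test. `volatile` forces every decrement through
// memory, so the compiler cannot remove or collapse the loop.
// (n = n - 1 is used instead of n-- because C++20 deprecates -- on volatile.)
inline void spin(uint64_t iterations) {
    volatile uint64_t n = iterations;
    while (n > 0) {
        n = n - 1;
    }
}

// Average wall-clock duration of one spin(iterations) call, in nanoseconds,
// measured over `repeats` consecutive calls.
double averageLoopNs(uint64_t iterations, int repeats) {
    using Clock = std::chrono::steady_clock;
    const auto start = Clock::now();
    for (int i = 0; i < repeats; ++i) {
        spin(iterations);
    }
    const auto end = Clock::now();
    return std::chrono::duration<double, std::nano>(end - start).count() / repeats;
}
```

Averaging hides outliers, so this gives the typical cost of the loop rather than its worst case.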
Time can be expressed in CPU core cycles; this should be the most precise method.
To count CPU core cycles, a kernel module was used: https://matthewarcus.wordpress.com/2018/01/27/using-the-cycle-counter-registers-on-the-raspberry-pi-3/ The kernel module enables access to the cycle counter register from a user application.
Each CPU core has a default base frequency of 600MHz (sysfs reports the value in kHz):
```
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq
600000
```
But this is the frequency when the CPU core is idle.
The max frequency is 1.4GHz:
```
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
1400000
```
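The same values can be read from a program, e.g. to log the core frequency next to each measurement (600000 kHz is 600MHz, 1400000 kHz is 1.4GHz). A sketch; `readCpuFreqKHz` is a hypothetical helper that returns -1 if the file cannot be read, for example on a system without cpufreq.

```cpp
#include <fstream>
#include <string>

// Reads a cpufreq sysfs value such as
// /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq (in kHz).
// Returns -1 if the file is missing or does not start with a number.
long readCpuFreqKHz(const std::string& path) {
    std::ifstream in(path);
    long khz = -1;
    if (!(in >> khz)) {
        return -1;
    }
    return khz;
}
```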
To get clean tests, the CPU frequency can be fixed at the maximum by switching to the performance governor:
```
sudo cpupower frequency-set -g performance
```
During tests it was found that the following piece of code:
```cpp
volatile uint64_t n = 7;
while(n > 0) {
    n--;
}
```
takes from 64 to 90 CPU core cycles, which is 45ns to 64ns at 1.4GHz.
And the following piece of code:
```cpp
volatile uint64_t n = 1;
while(n > 0) {
    n--;
}
```
takes from 12 to 35 CPU core cycles, which is 8ns to 25ns at 1.4GHz.
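The nanosecond figures follow from dividing the cycle counts by the 1.4GHz maximum core frequency (1.4 cycles per ns): 64 / 1.4 ≈ 45.7ns, 90 / 1.4 ≈ 64.3ns, 12 / 1.4 ≈ 8.6ns and 35 / 1.4 = 25ns. At the 600MHz idle frequency, the same 64 cycles would take about 107ns. A helper for the conversion (the name is hypothetical):

```cpp
// Converts a cycle count to nanoseconds for a core running at freqMHz.
// At 1400 MHz one nanosecond is 1.4 cycles.
constexpr double cyclesToNs(double cycles, double freqMHz) {
    return cycles * 1000.0 / freqMHz;
}
```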
So, the outcome of the investigation:
- sleep55ns can be replaced with a busy loop that runs for 45ns-64ns
- sleep8ns can be replaced with a busy loop that runs for 8ns-25ns
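Based on those measurements, the two delays could be written as fixed busy loops. A sketch, assuming the iteration counts measured above (7 and 1) on a Pi 3 at 1.4GHz; the names mirror the existing sleep55ns/sleep8ns functions, but these bodies are illustrative, and the counts would need re-measuring on other hardware, OS versions or compiler settings.

```cpp
#include <cstdint>

// Busy loop of `iterations` volatile decrements; volatile keeps the
// compiler from optimizing the loop away.
inline void spinIterations(uint64_t iterations) {
    volatile uint64_t n = iterations;
    while (n > 0) {
        n = n - 1;
    }
}

// About 45-64 ns on a Raspberry Pi 3 at 1.4 GHz (7 iterations, measured).
inline void sleep55ns() { spinIterations(7); }

// About 8-25 ns on a Raspberry Pi 3 at 1.4 GHz (1 iteration, measured).
inline void sleep8ns() { spinIterations(1); }
```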
ok, please
PR: https://github.com/panda-official/TimeSwipe/pull/25
During tests I found that DataLogging still gets 48K fps even with zero busy wait (with `uint64_t waitNs = 0;`).
During tests it was found that only one delay is required to get a clean diagram: between setGPIOHigh(CLOCK) and setGPIOLow(CLOCK).
Actually, putting a std::this_thread::yield() call there is enough to receive clean data, so only a context switch between threads is required.
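The clock pulse with a yield between the two edges can be sketched as follows. The GPIO setters are passed in as callables because the driver's setGPIOHigh/setGPIOLow signatures are not shown here; `clockPulse` is a hypothetical wrapper. Note that `std::this_thread::yield()` is only a hint: if no other thread is ready to run on that core, it may return after little more than the cost of the system call.

```cpp
#include <functional>
#include <thread>

// One clock cycle: raise CLOCK, give the scheduler a chance to run another
// thread (which in practice provided the needed delay), then lower CLOCK.
void clockPulse(const std::function<void()>& setClockHigh,
                const std::function<void()>& setClockLow) {
    setClockHigh();
    std::this_thread::yield();
    setClockLow();
}
```

In the driver this would be called as something like `clockPulse([] { setGPIOHigh(CLOCK); }, [] { setGPIOLow(CLOCK); })`, assuming those functions take the pin number.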
With a busy loop like:
```cpp
volatile uint64_t cycles = N;
while(cycles > 0) {
    cycles--;
}
```
and a big N value, there is a very high probability that execution will be interrupted by the OS scheduler many times, while a small N is not enough to wait.
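The effect of interruptions shows up as outliers rather than in the average. A sketch that times many individual runs of the loop and reports the median and the maximum; `spinStats` is a hypothetical helper, and each sample also includes the cost of two clock reads.

```cpp
#include <algorithm>
#include <chrono>
#include <cstdint>
#include <vector>

struct SpinStats {
    double medianNs;
    double maxNs;
};

// Times `samples` individual runs of an N-iteration volatile busy loop.
// A max far above the median indicates runs that were interrupted
// (scheduler preemption, interrupts).
SpinStats spinStats(uint64_t iterations, int samples) {
    if (samples <= 0) {
        return SpinStats{0.0, 0.0};
    }
    using Clock = std::chrono::steady_clock;
    std::vector<double> ns;
    ns.reserve(static_cast<std::size_t>(samples));
    for (int i = 0; i < samples; ++i) {
        const auto start = Clock::now();
        volatile uint64_t n = iterations;
        while (n > 0) {
            n = n - 1;
        }
        const auto end = Clock::now();
        ns.push_back(std::chrono::duration<double, std::nano>(end - start).count());
    }
    std::sort(ns.begin(), ns.end());
    return SpinStats{ns[ns.size() / 2], ns.back()};
}
```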
readAllGPIO does the same context switching because it needs to read mmap'ed memory from the kernel.
The latest tested commit is pushed to the branch and ready to merge.
According to the topic https://www.raspberrypi.org/forums/viewtopic.php?t=228727, after the following improvements:
- adding the isolcpus=3 boot argument to /boot/cmdline.txt

the tests show that it is possible to keep the sleep at 60ns in 99.994% of cases; in the remaining 0.006% of cases the maximum sleep is not more than 45us.
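With `isolcpus=3`, the kernel keeps ordinary tasks off core 3, but the timing thread still has to be moved there explicitly. A sketch, assuming Linux: `pinToCore` uses `sched_setaffinity` on the calling thread, and `fractionWithin` computes the kind of statistic quoted above (share of measured delays at or below a threshold such as 60ns). Both names are hypothetical.

```cpp
#include <sched.h>

#include <cstddef>
#include <vector>

// Restricts the calling thread to a single CPU core (Linux-specific).
// Returns false if the kernel rejects the request, e.g. the core does not exist.
bool pinToCore(int core) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    return sched_setaffinity(0, sizeof(set), &set) == 0;
}

// Share of measured delays (in ns) that are at or below thresholdNs.
double fractionWithin(const std::vector<double>& delaysNs, double thresholdNs) {
    if (delaysNs.empty()) {
        return 0.0;
    }
    std::size_t within = 0;
    for (double d : delaysNs) {
        if (d <= thresholdNs) {
            ++within;
        }
    }
    return static_cast<double>(within) / static_cast<double>(delaysNs.size());
}
```

On a Pi booted with `isolcpus=3`, the acquisition thread would call `pinToCore(3)` once at startup before entering the read loop.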
Since that requires changes at the OS level, we should stick to the current implementation. If I got it right, busy waiting would also increase the CPU workload. We might come back to this later. @iluxa, please undo the PR.
As a workaround, we created sleep functions for 8ns and 55ns that use operations lasting about that long, because none of the sleep functions I found could do a nanosecond sleep.
Please integrate the corresponding function. Let's discuss it if there is any problem.