Closed. rkompass closed this 7 months ago.
Just for fun: adding a us timestamp to Pin.irq() and feeding the device with an external 10 ms pulse, I get the following result with a modified timertest on a W600. That's what could be expected:
i_now: 403
tdif[0:3]: [-10045, -3566, 1]
Deviations (us) lowest: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Deviations (us) highest: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
Using ticks_us() in the handler I get:
i_now: 403
tdif[0:3]: [-9927, 2104073, -10]
Deviations (us) lowest: [-11, -11, -11, -11, -11, -11, -11, -11, -11, -11]
Deviations (us) highest: [10, 10, 10, 10, 10, 11, 11, 11, 12, 13]
This is nice! Do you have the MP code?
Just updated.
I mean, how to read this timestamp from the MP interpreter.
A perfect solution for a non-blocking HC-SR04 reading, for example!
Thinking it over: could the timestamp be guarded against pin value bounce, so that a flag is set and the timestamp cannot be updated by later interrupts of the same kind until the ISR reads it?
Of course: give me a little finger -> I want the whole hand :-)
The timestamp can be read with pin.timestamp(), where pin is the pin object, which is also supplied to the ISR as argument.
Blocking: possible, but maybe not as useful as it might look at first glance. To make it robust you must then read the value; otherwise the timestamp is that of the last recognized interrupt. The interrupt is cleared at the end of the ISR, so any retrigger happening while the ISR handler is active will be lost anyhow.
B.t.w., as part of the last push I added the tick_hz argument to Timer, to make it compatible. Not that I consider this argument necessary.
I see (our messages overlapped): there is a Pin.timestamp() method.
Did you measure how many bytes that cost? I was so eager that I have already deleted the old build.
Did you measure how many bytes that cost?
64 bytes.
I think I'll change that and add timestamp to the irq object, like flags and trigger.
Blocking: possible, but maybe not as useful as it might look at first glance. To make it robust you must then read the value; otherwise the timestamp is that of the last recognized interrupt. The interrupt is cleared at the end of the ISR, so any retrigger happening while the ISR handler is active will be lost anyhow.
I see (I read the ISR handler code): it works with hard interrupts, and there is no chance that the timestamp is overwritten before the MP callback has read it. If the callback doesn't read it, of course, it might be overwritten, as the interrupts are restored after the callback is finished. Is that correct?
Now a question would be what the true latency of the hard interrupt timestamp is. It can be measured with continuous polling against ticks_ns().
This is great in general, as it is easily extended to the other ports. As for the perfect naming (as this Pin method will be available everywhere): how about: ?
I think I'll change that and add timestamp to the irq object, like flags and trigger.
Now I'm curious what you mean (the general idea is clear, but the details..?). Certainly a way to get a safe first timestamp even with soft interrupts, which is all we have on some architectures, like the esp8266, for example.
Now I'm curious what you mean
Just a style change, but not possible until the mainline code is changed. The mp_irq class itself is defined in shared/mpirq.*. So for now the call to retrieve the timestamp has to stay at the UART object.
I considered names like these, but I do not like long names, and there are no better synonyms. The best of the alternatives seemed time_tag. Maybe irq_tick_us.
p.irq_tick_us() seems fine. p.timestamp() would be best if it were not missing the unit, which I consider important. So if p.timestamp_us() is too long, p.irq_tick_us() is second best (it is just 1 character shorter). p.irq_t_us()?
It would be nice, of course, if the timestamp were available for all interrupts.
We can go for timestamp_us.
A constant value may be subtracted from the timestamp, as the time measurement in the ISR takes a deterministic time (dependent only on CPU frequency).
# version to test the new timestamp method
import micropython
from micropython import const
from machine import Pin, Timer
from time import ticks_us, ticks_diff
from array import array
from sys import platform

if platform == 'rp2':
    led = Pin("LED", Pin.OUT, value=1)
elif platform == 'w600':
    led = Pin(0, Pin.OUT, value=1)  # LED on Winner W600

tim = Timer(-1)  # software timer

N_Runs = const(1003)  # ~5 seconds total
Period_ms = 5  # no flicker at 200 Hz (not recognizable)

i_int, i_now = 0, 0
t_start, t_stop = 0, 0
tdif = array('I', (0 for _ in range(N_Runs)))

@micropython.viper
def pin_start(t):
    global t_start
    ledp = ptr32(0x40010C00)
    t_start = ticks_us()
    ledp[0] = 0  # all pin values of bank A = low, was: led(0)

@micropython.viper
def pin_isr(p):
    global t_stop, i_int
    # t_stop = ticks_us()
    t_stop = p.timestamp()
    led(1)
    i = int(i_int)
    if i < N_Runs:
        tdif[i] = ticks_diff(t_stop, t_start)
        i += 1
    i_int = i

led.irq(trigger=Pin.IRQ_FALLING, handler=pin_isr, hard=True)
print('Pin {}: Set up hard interrupt.'.format(led))
tim.init(period=Period_ms, mode=Timer.PERIODIC, callback=pin_start)
try:
    while i_now < N_Runs:
        if i_int > i_now:
            if i_now % 50 == 49:  # prevent longer USB inactivity (problematic on mimxrt)
                print('.', end='')
            i_now += 1
finally:
    tim.deinit()

print('\nRuns:', i_now, 'tdif[0:3]:', tdif[0:3])
tdif = list(tdif)
tdif[0:2] = []
tdif.sort()
print('Deviations (us) lowest:', tdif[0:10])
print('Deviations (us) highest:', tdif[-11:-1])
gives
Runs: 1003 tdif[0:3]: array('I', [32, 20, 16])
Deviations (us) lowest: [16, 16, 16, 16, 16, 16, 16, 16, 16, 16]
Deviations (us) highest: [24, 24, 24, 24, 24, 24, 24, 24, 24, 24]
and the
t_start = ticks_us()
ledp[0] = 0
sequence certainly takes a few us.
And in the handler:
parent->mp_irq_timestamp = mp_hal_ticks_us();
may be put before
parent->mp_irq_flags = tls_get_gpio_irq_status(parent->id);
?
optimizing ....
My optimization:
static void machine_pin_isr_handler(void *arg) {
    mp_uint_t timestamp = mp_hal_ticks_us();
    mp_irq_obj_t *self = arg;
    machine_pin_obj_t *parent = MP_OBJ_TO_PTR(self->parent);
    if (self->handler != mp_const_none) {
        parent->mp_irq_flags = tls_get_gpio_irq_status(parent->id);
        parent->mp_irq_timestamp = timestamp;
The above test now gives
Runs: 1003 tdif[0:3]: array('I', [26, 19, 16])
Deviations (us) lowest: [15, 15, 15, 15, 15, 15, 15, 15, 15, 15]
Deviations (us) highest: [20, 20, 20, 20, 20, 20, 20, 20, 20, 20]
Jitter is down to 5 us. What is the constant latency from interrupt to ISR (including mp_hal_ticks_us())???
Taking the timestamp as first action is a good suggestion.
A constant value may be subtracted from the timestamp, as the time measurement in the ISR takes a deterministic time (only dependent on CPU frequency).
I do not know the lowest value of the interrupt response time. I could toggle a pin in the IRQ handler by direct port access and measure with an oscilloscope. Then we get a value close to it. I can do that. It may however be much longer if, e.g., a higher priority interrupt is active or the IRQ handler code is not in the code execution cache. Then it has to be loaded from the serial SPI flash, which takes a while.
I renamed pin.timestamp to pin.timestamp_us, as suggested.
I do not know how you triggered the Pin IRQ. I used an external signal generator, which by chance matched well. One can as well use the chip's PWM. Then there is no clock difference.
Good morning,
my understanding of C is that
static void machine_pin_isr_handler(void *arg) {
..
mp_irq_obj_t *self = arg;
can be changed to
static void machine_pin_isr_handler(void *self) {
In Platform/Drivers/gpio/wm_gpio.c
void GPIOA_IRQHandler(void)
{
    u8 i = 0;
    u8 found = 0;
    u32 reg = 0;
    reg = tls_reg_read32(HR_GPIO_MIS);
    for (i = 0; i <= WM_IO_PA_15; i++)
    {
        if (reg & BIT(i))
        {
            found = 1;
            break;
        }
    }
    if (found)
    {
        if (NULL != gpio_context[i].callback)
            gpio_context[i].callback(gpio_context[i].arg);
    }
    return;
}
is the first place where the timestamp could be taken.
Before the loop, which takes time depending on the pin's position within the bank....
Using:
static void machine_pin_isr_handler(void *arg) {
..
mp_irq_obj_t *self = arg;
will not generate extra code, because the compiler is clever enough to just take the value of arg for self without making an assignment. BUT when self is defined as void *, something like self->parent or self->handler is not possible.
About moving the timestamp reading to the low-level IRQ handler: that does not look right. Taking the value is one thing, but it has to be forwarded to the MP IRQ handler. So either a) the arg argument has to be opened, given the machine_pin_obj_t type, and the timestamp assigned to the respective element, or b) the timestamp has to be added as a second argument to the callback.
Option a) is bad, because you have to mangle the SDK namespace with the MicroPython namespace. Option b) also breaks the API convention.
It's easier just to tell people that for the fastest response they should use PA0 as the IRQ pin.
Anyway, the same program with Pin(15) instead of Pin(0) for the led leads to
Deviations (us) lowest: [17, 17, 17, 17, 17, 17, 17, 17, 17, 17]
Deviations (us) highest: [22, 22, 22, 22, 22, 22, 22, 22, 22, 22]
so the looping in the interrupt handler takes up to 2 us.
Yes, things are not perfect. We have to accept that. Not making the port more complicated by fuzzing around in the SDK. Or should we at least optimize the loop?
I assume BIT(i) is 1<<i.
I made a few measurements by toggling a pin in the IRQ handler:
At PA0, the delay between the external trigger slope and the echo signal is between 1 and 2.2 µs, with 1.2 µs as the average. At PB18, the delay is between 6 and 18 µs, with 6.3 µs as the average. In both cases taking the timestamp takes between 2.3 and 11 µs.
I could not measure the time it takes to toggle the pin; the code is too fast for the bus logic to follow, so it's far below 1 µs. The yellow trace below is the trace of the toggled pin with persistence enabled.
Using PA0
Using PB18
I assume BIT(i) is 1<<i.
This and the loop in GPIOA_IRQHandler(void) are the "nondeterministic" part of the timestamp latency, which now might easily be adjusted. Or so I thought.
Taking the timestamp takes between 2.3 and 11 µs.
Oh, that's puzzling. A much bigger source of jitter than one would expect. How can that be? There must be other interrupts with higher priority interfering (the tick??).
Do we agree that it makes sense to sort this out? That timestamp_us() should be there, as there is plenty of use for it, especially if it's quite precise?
if (reg & 0xffff0000) i+= 16;
if (reg & 0xff00ff00) i+= 8;
if (reg & 0xf0f0f0f0) i+= 4;
if (reg & 0xcccccccc) i+= 2;
if (reg & 0xaaaaaaaa) i+= 1;
or, perhaps more deterministic (constant number of operations):
i += (reg & 0xaaaaaaaa) ? 1 : 0;
i += (reg & 0xcccccccc) ? 2 : 0;
i += (reg & 0xf0f0f0f0) ? 4 : 0;
i += (reg & 0xff00ff00) ? 8 : 0;
i += (reg & 0xffff0000) ? 16 : 0;
comes to mind.
but that doesn't solve
Taking the timestamp takes between 2.3 and 11 µs.
Oh, that's puzzling. A much bigger source of jitter than one would expect. How can that be?
ticks_us() is not just reading a register. It shares the counter with the watchdog timer, which runs at 40 MHz; code below. The division by 40 creates the large jitter. It could be avoided if the WDT ran at 1 MHz; then taking the timestamp would need less than 600 ns.
uint64_t mp_hal_ticks_us64(void) {
    return (ticks_total + (ticks_reload_value - tls_reg_read32(HR_WDG_CUR_VALUE))) / ticks_per_us;
}
So the division is the culprit! How about running the WDT at a power of two and using a shift instead of a divide? How about 32 MHz?
Seems like the WDT can only run at the undivided ABP clock. No prescaler. As an alternative one of the hardware timers can be used for ticks_us(). That would reduce the number of available hard timers for machine.Timer to 4.
There must be something fast for the division by 5. Division by 8 is a shift.
Meanwhile I changed:
void GPIOA_IRQHandler(void)
{
    u32 reg = tls_reg_read32(HR_GPIO_MIS) & 0xffff;
    u8 i = (reg & 0xaaaa) ? 1 : 0;
    i += (reg & 0xcccc) ? 2 : 0;
    i += (reg & 0xf0f0) ? 4 : 0;
    i += (reg & 0xff00) ? 8 : 0;
    if (reg)
        if (NULL != gpio_context[i].callback)
            gpio_context[i].callback(gpio_context[i].arg);
    return;
}

void GPIOB_IRQHandler(void)
{
    u32 reg = tls_reg_read32(HR_GPIO_MIS + TLS_IO_AB_OFFSET);
    u8 i = WM_IO_PB_00;
    i += (reg & 0xaaaaaaaa) ? 1 : 0;
    i += (reg & 0xcccccccc) ? 2 : 0;
    i += (reg & 0xf0f0f0f0) ? 4 : 0;
    i += (reg & 0xff00ff00) ? 8 : 0;
    i += (reg & 0xffff0000) ? 16 : 0;
    if (reg)
        if (NULL != gpio_context[i].callback)
            gpio_context[i].callback(gpio_context[i].arg);
    return;
}
and will try that with the above last MP code.
Divide by 5: (((uint32_t)A * (uint32_t)0xCCCD) >> 16) >> 2 Recipe found here.
Now with Pin(15):
Runs: 1003 tdif[0:3]: array('I', [27, 18, 16])
Deviations (us) lowest: [15, 15, 15, 15, 15, 15, 15, 15, 15, 15]
Deviations (us) highest: [20, 20, 20, 20, 20, 21, 21, 21, 21, 21]
and with Pin(0):
Runs: 1003 tdif[0:3]: array('I', [27, 18, 16])
Deviations (us) lowest: [15, 15, 15, 15, 15, 15, 15, 15, 15, 15]
Deviations (us) highest: [20, 20, 20, 20, 20, 20, 20, 20, 20, 21]
as well.
That's better.
// Test of divide by 5, from:
// https://embeddedgurus.com/stack-overflow/2009/06/division-of-integers-by-constants/
#include <stdio.h>
#include <stdint.h>
int main() {
    int k = 0;
    for (int i = 0; i < 20000000; i++)
        if (i / 5 != (uint32_t)(((uint64_t)i * 0xcccdu) >> 16) >> 2) {
            printf("%3d: %3d %3d\n", i, i / 5, (uint32_t)(((uint64_t)i * 0xcccdu) >> 16) >> 2);
            k += 1;
            if (k > 10)
                break;
        }
    return 0;
}
does not work perfectly:
./divideby5
262144: 52428 52429
262149: 52429 52430
262154: 52430 52431
262159: 52431 52432
262164: 52432 52433
262169: 52433 52434
262174: 52434 52435
262179: 52435 52436
262184: 52436 52437
262189: 52437 52438
262194: 52438 52439
must think about it....
Divide by 5: (((uint32_t)A * (uint32_t)0xCCCD) >> 16) >> 2
With the changed division, getting ticks_us() takes between 599 ns and 925 ns, 682 ns mean. Only drawback: it's a fixed number now.
There are only two versions of CPU clock: 80Mhz and 40Mhz. Same for the WDT?
The above division formula is not correct yet! But I'll find a correct version later.
Back at the desk: I confirm that with the above changes to the GPIO_IRQx handler the latency does not vary any more with the port number. For the B port, I see 5.6 to 6.8 µs, with an average of 5.86. For the A port it's slightly less. The time for taking the ticks_us() value is now almost precisely 1 µs. So the total average latency is 6-7 µs, with rare events of 10 µs.
As a gross figure, the µs times look plausible. The error you see would probably result in a value sometimes being skipped or advanced by 2. The APB clock does not change when the CPU frequency is changed, so 40 can reliably be assumed.
Nice result!
Perhaps we should use volatile in GPIOA_IRQHandler (and same with B) so that the ?: blocks cannot be jumped over.
inline uint32_t div5(uint32_t x) {
    return (x * 0xcccccccduL) >> 34;
}
gives a nice division by 5, which is correct for all 32 bit values (up to input x == 0xffffffffuL).
I tested with:
#include <stdio.h>
#include <stdint.h>

static inline uint32_t div5(uint32_t x) {
    return (x * 0xcccccccduL) >> 34;
}

int main() {
    int k = 0;  // 4294967296 == 2^32
    for (uint32_t i = 0; i < 4294967295u; i++)  // note: 0xffffffff itself is not reached by this loop
        if (i / 5 != div5(i)) {
            printf("%3u: %3u %3u\n", i, i / 5, div5(i));
            k += 1;
            if (k > 10)
                break;
        }
    return 0;
}
that now can be composed to give the 64 bit division.
The time for taking the ticks_us() value is now almost precise 1µs.
What did you do to achieve that? Did you already solve the division by 40 problem?
Did you already solve the division by 40 problem?
I took the first suggestion for the test.
return (x*0xcccccccduL) >> 34;
That may be precise, but it is not acceptable. The shift at the end must not be larger than 22, otherwise the time range for ticks_ms() is limited. ticks_ms() is calculated by dividing ticks_us() by 1000; that way the us and ms numbers are synchronous.
The larger shift should be acceptable, because it shifts a 64-bit intermediate value back to 32 bits.
But the division in mp_hal_ticks_us64() is a division of a 64-bit value. So we still have to consider how to extend the above approach to 64 bits.
Meanwhile an almost cosmetic improvement:
void GPIOB_IRQHandler(void)
{
    u32 reg = tls_reg_read32(HR_GPIO_MIS + TLS_IO_AB_OFFSET);
    u8 i = (reg & 0xaaaaaaaa) ? WM_IO_PB_01 : WM_IO_PB_00;
    i += (reg & 0xcccccccc) ? 2 : 0;
    i += (reg & 0xf0f0f0f0) ? 4 : 0;
    i += (reg & 0xff00ff00) ? 8 : 0;
    i += (reg & 0xffff0000) ? 16 : 0;
    if (reg)
        if (NULL != gpio_context[i].callback)
            gpio_context[i].callback(gpio_context[i].arg);
    return;
}
return (x*0xcccccccduL) >> 34;
That is acceptable for ticks_us(), since it returns 30 significant bits, which is the range of ticks_ms(). But if that is divided by 1000 for ticks_ms(), only 20 bits remain. So either a straight division by 40_000 is used for ticks_ms(), or a similar multiply & shift method.
The call to ticks_us() now takes ~725 ns.
b.t.w.: mp_hal_ticks_us64 is only used inside mphalport.c
Edit: The output values of ticks_us() are indeed not reasonable when using the above formula for it.
that now can be composed to give the 64 bit division.
This was not yet the case. The above code only demonstrates a (hopefully faster) correct division by 5 in 32 bits. The division in 64 bits has yet to be done and then there will be no loss of precision in the end.
Meanwhile it occurred to me that the changed GPIOA/B_IRQHandler code gives a wrong result if two interrupts happen concurrently, which may sometimes be the case. The addition of the positions then gives a wrong i, and with NULL == gpio_context[i].callback the callback is not executed.
I have now something almost as good in mind that will give a correct result, even in case there are simultaneous interrupts.
We can still use a timer channel for ticks_us(). That will return a value in almost 0 time, leaving the WDT counter for the watchdog.
Perhaps really not a bad idea.
Meanwhile I have this:
uint32_t div5_old(uint32_t x) {
    return (x * 0xcccccccduL) >> 34;
}

uint32_t div5(uint32_t t) {
    uint32_t x = t & 0xffff;  // lower 16 bits
    uint32_t y = t >> 16;
    uint32_t xc = x * 0xccccu;
    uint32_t yc = y * 0x3333u;
    uint32_t r = (xc + x) >> 16;
    r += xc + y;
    r >>= 2;
    r += yc;
    r >>= 16;
    r += yc;
    return r;
}

uint64_t div5_new(uint64_t t) {
    uint64_t x = t & 0xffffffffull;  // lower 32 bits
    uint64_t y = t >> 32;
    uint64_t xc = x * 0xccccccccull;
    uint64_t yc = y * 0x33333333ull;
    uint64_t r = (xc + x) >> 32;
    r += xc + y;
    r >>= 2;
    r += yc;
    r >>= 32;
    r += yc;
    return r;
}
The first two functions give identical results; the third is an analogous extension to 64 bits. As the second avoids a 64-bit intermediate result, the third avoids a 128-bit intermediate result. The second takes ~3 times as long as the first, so the third is probably also not optimal. But you may try it for the division by 5. So
uint64_t mp_hal_ticks_us64(void) {
    uint64_t t = (ticks_total + (ticks_reload_value - tls_reg_read32(HR_WDG_CUR_VALUE))) >> 3;  // divide by 8
    uint64_t x = t & 0xffffffffull;  // lower 32 bits
    uint64_t y = t >> 32;
    uint64_t xc = x * 0xccccccccull;
    uint64_t yc = y * 0x33333333ull;
    uint64_t r = (xc + x) >> 32;
    r += xc + y;
    r >>= 2;
    r += yc;
    r >>= 32;
    r += yc;
    return r;
}
should work, but is it faster?
Do we have uint128_t on the platform?
Another idea: brute-force multiplying with 0xcccccccccccccccd (to get a possibly 128-bit intermediate) is probably not the right way. Perhaps a step-wise multiplication using the remainder is better.
No need to hurry.
Can you time/benchmark the latter function as you did with the original?
void GPIOA_IRQHandler(void)
{
    u32 reg = tls_reg_read32(HR_GPIO_MIS) & 0xffff;
    u8 i = 0;
    if (!(reg & 0x00ff)) { reg >>= 8; i += 8; }
    if (!(reg & 0x0f)) { reg >>= 4; i += 4; }
    if (!(reg & 0x3)) { reg >>= 2; i += 2; }
    if (!(reg & 0x1)) { reg >>= 1; i += 1; }
    if (reg && NULL != gpio_context[i].callback)
        gpio_context[i].callback(gpio_context[i].arg);
    return;
}

void GPIOB_IRQHandler(void)
{
    u32 reg = tls_reg_read32(HR_GPIO_MIS + TLS_IO_AB_OFFSET);
    u8 i = WM_IO_PB_00;
    if (!(reg & 0xffff)) { reg >>= 16; i += 16; }
    if (!(reg & 0x00ff)) { reg >>= 8; i += 8; }
    if (!(reg & 0x0f)) { reg >>= 4; i += 4; }
    if (!(reg & 0x3)) { reg >>= 2; i += 2; }
    if (!(reg & 0x1)) { reg >>= 1; i += 1; }
    if (reg && NULL != gpio_context[i].callback)
        gpio_context[i].callback(gpio_context[i].arg);
    return;
}
should also work with simultaneous interrupts.
Better: the above code simplified:

void GPIOA_IRQHandler(void)
{
    u32 reg = tls_reg_read32(HR_GPIO_MIS) & 0xffff;
    u8 i = 0;
    if (reg & 0xff00) { reg >>= 8; i += 8; }
    if (reg & 0x00f0) { reg >>= 4; i += 4; }
    if (reg & 0x000c) { reg >>= 2; i += 2; }
    if (reg & 0x0002) { reg >>= 1; i += 1; }
    if (reg && NULL != gpio_context[i].callback)
        gpio_context[i].callback(gpio_context[i].arg);
    return;
}

void GPIOB_IRQHandler(void)
{
    u32 reg = tls_reg_read32(HR_GPIO_MIS + TLS_IO_AB_OFFSET);
    u8 i = WM_IO_PB_00;
    if (reg & 0xffff0000) { reg >>= 16; i += 16; }
    if (reg & 0xff00) { reg >>= 8; i += 8; }
    if (reg & 0x00f0) { reg >>= 4; i += 4; }
    if (reg & 0x000c) { reg >>= 2; i += 2; }
    if (reg & 0x0002) { reg >>= 1; i += 1; }
    if (reg && NULL != gpio_context[i].callback)
        gpio_context[i].callback(gpio_context[i].arg);
    return;
}
This and the improved mp_hal_ticks_us64() (from above, 9 hrs ago) give a substantial improvement:
Now: the script yields:
Runs: 1003 tdif[0:3]: array('I', [18, 6, 6])
Deviations (us) lowest: [6, 6, 6, 6, 6, 6, 6, 6, 6, 6]
Deviations (us) highest: [7, 7, 7, 7, 7, 7, 7, 7, 7, 7]
which is substantially faster, and at the same time the variation is almost gone.
For the background: this was the MP program, with a timer callback periodically (5 ms) changing a pin, e.g. the LED (the pin_start callback), and that change eliciting a hard pin interrupt (the pin_isr):
@micropython.viper
def pin_start(t):
    global t_start
    ledp = ptr32(0x40010C00)
    t_start = ticks_us()
    ledp[0] = 0  # all pin values of bank A = low, was: led(0)

@micropython.viper
def pin_isr(p):
    global t_stop, i_int
    # t_stop = ticks_us()
    t_stop = p.timestamp()
    led(1)
    i = int(i_int)
    if i < N_Runs:
        tdif[i] = ticks_diff(t_stop, t_start)
        i += 1
    i_int = i
I'm looking forward to seeing confirmation from your measurements. The question is now: can we subtract a constant amount of us from Pin.timestamp_us()? I suppose a value like 3 would be quite correct?
The division by 40 in mp_hal_ticks_us64() might still seem a bit awkward, but I'm quite sure it is correct. It is tested for all 32-bit values, and the analogy to the 32-bit case should hold. I will perhaps review it later, simplify and speed it up a bit, and test more thoroughly.
should work, but is it faster?
The execution time of the third version is about 1.22 µs. Much better than expected. Maybe the overhead of the pin toggling is large there, which extends the measured time.
Do we have uint128_t on the platform?
I've never seen it.
I get this with active network connection:
Runs: 1003 tdif[0:3]: array('I', [18, 6, 8])
Deviations (us) lowest: [6, 6, 6, 6, 6, 6, 6, 6, 6, 6]
Deviations (us) highest: [10, 11, 11, 11, 11, 11, 11, 11, 12, 12]
That would still allow for ultrasonic distance measurement with millimeter precision.
The times I get: execution time of ticks_us(): 1.22 µs. Latency of echo pulse to trigger: about 5.5 µs, with rare 10 µs events, never less than 4.8 µs. Tested with PA0, PB15 and PB16.
So you subtract something like 5 µs. Or just ignore it; if it's about taking time differences, it does not matter.
So let's subtract 4. Then there is always a little positive latency, as expected, and someone naive can still mix the timestamp with ordinary ticks_us() values without disappointment.
How about other interrupts? Can the same be done for hard timer interrupts?
I'm running now a simple test for ticks_us() being monotonic:
import time

then = 0
while True:
    now = time.ticks_us()
    if now < then:
        print(time.time(), "roll over", then, now)
    then = now
It should roll over every 1074 seconds. So far it looks good. I'll let it run for the next hours, while I have family business. What do you expect for the hard timer interrupt? There is no external event that can be timed; you can rely on the hard timer to trigger at the set times. The variation you see is caused by IRQ handler response jitter. That will not go away, and unlike the time differences you have seen with the Pin IRQ it has both positive and negative values. These should add up to 0.
The update code is uploaded. There is no subtract of x in the C code. That should be left to the Python code. When just looking to time differences of signals and signal slopes, it does not matter anyhow.
There was an irregularity in the above test of ticks_us(). It happened about 4 AM.
time() time() diff then now ticks() diff
1075 1073741817 13 20
2150 1075 1073741813 10 21
3225 1075 1073741815 11 20
4300 1075 1073741808 5 21
5375 1075 1073741804 1 21
................
24721 1074 1073741808 5 21
25796 1075 1073741804 1 21
26870 1074 1073741803 0 21
27944 1074 1073741810 7 21
28052 108 214748358 107374196 966367662
29019 967 1073741818 15 21
30093 1074 1073741816 13 21
31167 1074 1073741812 9 21
32242 1075 1073741820 17 21
33316 1074 1073741794 3 33
34390 1074 1073741812 9 21
35465 1075 1073741810 7 21
36539 1074 1073741817 14 21
37613 1074 1073741804 1 21
The two short periods add up to 1075 again. Not sure what happened. I will continue the test.
We continue here the discussions "Compilation of w600 port" and "Compilation of w600 port II" (closed now) on development/debugging of the w60x port in the MP branch w60x.