Open mdevaev opened 3 years ago
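Another possible workaround is a busy loop (pseudo-C):
    // Deadline on the monotonic clock, which clock adjustments do not affect
    long double deadline_ts = get_now_monotonic() + timeout;
    VCOS_STATUS_T sem_status;
    while (true) {
        sem_status = vcos_semaphore_trywait(sem);
        // Stop on success, on a real error, or once the deadline has passed
        if (sem_status != VCOS_EAGAIN || get_now_monotonic() > deadline_ts) {
            return sem_status;
        }
        usleep(1000); // sleep 1 ms between polls
    }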
The obvious problems are the usleep() and the large number of unnecessary calls to vcos_semaphore_trywait() (i.e. sem_trywait()).
There is a consistent repro for this bug during boot. The Pi doesn't have a built-in clock, so when you shut it off, its clock will lag behind real-world time until the Pi syncs with NTP during boot.
If you run an application that calls vcos_semaphore_wait_timeout() during boot, there's a race condition that makes it easy to trigger this issue. For example, imagine the following sequence:
1. The application calls vcos_semaphore_wait_timeout() with a 30 second timeout.
2. The Pi syncs with NTP and its clock jumps forward.
3. The application receives VCOS_EAGAIN from vcos_semaphore_wait_timeout() even though 30 seconds have not yet elapsed.
This bug can also trigger at any other time that the Pi adjusts its clock forward or backwards, but it's easiest to trigger during boot, as there's generally a significant time jump forward for the time that the Pi has been powered off.
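The arithmetic behind that sequence can be sketched as a small model. This is only an illustration: the function names are hypothetical, times are whole seconds, and it assumes the wait converts the relative timeout into an absolute wall-clock deadline once, at the start of the call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Relative timeout -> absolute deadline on the wall clock (assumed model). */
static int64_t absolute_deadline(int64_t wall_now_s, int64_t timeout_s) {
    return wall_now_s + timeout_s;
}

/* A wait on an absolute deadline gives up once the wall clock reaches it. */
static bool deadline_expired(int64_t wall_now_s, int64_t deadline_s) {
    return wall_now_s >= deadline_s;
}

/* Real time spent waiting if the wall clock is stepped by step_s seconds
   right after the wait starts: positive = forward, negative = backwards. */
static int64_t real_wait_s(int64_t timeout_s, int64_t step_s) {
    assert(timeout_s >= 0);
    int64_t remaining = timeout_s - step_s;
    return remaining > 0 ? remaining : 0;
}
```

In this model, real_wait_s(30, 3600) is 0 (an hour-long forward step at boot makes the wait time out immediately), while real_wait_s(30, -60) is 90 (a one-minute backwards step triples the wait).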
Describe the bug
vcos_semaphore_wait_timeout() uses CLOCK_REALTIME and sem_timedwait(). If the time is adjusted during vcos_semaphore_wait_timeout() (via NTP, for example), then it will either wait longer than the specified timeout (if the clock is moved back) or return before the timeout has elapsed (if it is moved forward). This is a known problem when using sem_timedwait() (1, 2). For sem_timedwait() on Linux this is the expected behavior, since it accepts an absolute timestamp (although QNX has a sem_timedwait_monotonic() that uses CLOCK_MONOTONIC). For vcos_semaphore_wait_timeout(), I think this behavior is incorrect, because its timeout is specified as a relative value.
As a fix, I could use a check on my side: compare monotonic timestamps before and after calling vcos_semaphore_wait_timeout() and call it again if the timeout has not been reached. But this will not help if the clock has been moved back, in which case the wait may last longer than requested.
To reproduce
The problem is very rare and I don't have a case for reproducing it, but I think my analysis points to the cause fairly accurately. I encountered this bug when using ustreamer, which uses a semaphore when encoding via OMX.
Expected behaviour
The timeout should not be affected by clock adjustments.
Actual behaviour
Adjusting the clock affects the timeout.