Open nawbar23 opened 2 years ago
This is an issue I am also having if many devices (i.e 5 devices) are transmitting every 10 seconds.
I've been having a similar problem. I'm working on getting RFM69 working on a RPi Pico and have the pico communicating with a 'grown up RPi. Getting a hang up when an incoming message requires an ACK to be sent while an outgoing message is being sent.
2023-07-05 20:02:08,168 - RFM69.radio - DEBUG - 1984828480 - Interrupt (rx)Incoming Message 2023-07-05 20:02:08,169 - RFM69.radio - DEBUG - 1984828480 - Interrupt -have intLock (rx)Acquires intLock Sending (tx)Send Message start 2023-07-05 20:02:08,170 - RFM69.radio - DEBUG - 1995557248 - _Send entry (tx)in _send routine 2023-07-05 20:02:08,171 - RFM69.radio - DEBUG - 1984828480 - replying to ack request (rx)Msg read, req Ack 2023-07-05 20:02:08,171 - RFM69.radio - DEBUG - 1995557248 - _Send have sendLock (tx)Acquires sendLock 2023-07-05 20:02:08,171 - RFM69.radio - DEBUG - 1984828480 - Incoming data packet (rx)From interrupt routine 2023-07-05 20:02:08,172 - RFM69.radio - DEBUG - 1995557248 - _Send checking canSend (tx)calling _canSend 2023-07-05 20:02:08,173 - RFM69.radio - DEBUG - 1995557248 - _canSend entry caller:send mode:1 (tx)in _canSend 2023-07-05 20:02:08,173 - RFM69.radio - DEBUG - 1995557248 - begin_receive - entry (tx)call to begin_receive but stalls as can't acquire intLock 2023-07-05 20:02:08,183 - RFM69.radio - DEBUG - 1984828480 - Sending an ack (rx)Sending ack 2023-07-05 20:02:08,183 - RFM69.radio - DEBUG - 1984828480 - Interrupt (ack)- released intLock (rx)release intLock
line 637 of current build
if ack_requested and self.auto_acknowledge:
self._debug("Sending an ack")
**self._intLock.release()**
self.send_ack(sender_id)
self.begin_receive()
return
2023-07-05 20:02:08,184 - RFM69.radio - DEBUG - 1995557248 - begin_receive - have intLock (tx)gets released intLock
2023-07-05 20:02:08,184 - RFM69.radio - DEBUG - 1984828480 - Send_ack entry (rx)in Send_ack
2023-07-05 20:02:08,185 - RFM69.radio - DEBUG - 1984828480 - Send_ack have sendLock (rx)acquires sendLock
2023-07-05 20:02:08,186 - RFM69.radio - DEBUG - 1984828480 - _canSend entry caller:sendAck mode:1 (rx)in _canSend
2023-07-05 20:02:08,186 - RFM69.radio - DEBUG - 1984828480 - begin_receive - entry (rx)call to begin_receive
but stalls
as can't acquire intLock
So now we have a stalemate. The Rx process has sendLock but the Tx process has stolen the intLock when it got released so neither can proceed.
The solution, I think.
begin_receive has 'with self._intLock' but it does not need intLock. Its only reading the registers and intLock does not prevent these getting changed as this is happening in the RFM69 this lock can be removed. the sendLock prevents the interference between the two outgoing messages.
So 1) Remove this with clause and (2) move the intLock.release in the ACK clause to after the self.send_ack so the interrupt retains the intLock until Ack is sent resolves this mutex
I actually can't see why _canSend needs to call begin_receive anyway. Removing this from _canSend may be cleaner.
Seems to be robust after this but I'm still testing.
Not needed for this fix but the SendLock.NotifyAll at the start of the interrupt handler is not needed. SendLock never calls a wait or wait_for so it can be downgraded from a Condition to a simple Lock and this can be removed.
Thank you so much for the detailed analysis! Unfortunately I won't have the time to work on this myself in the near future (since this is just a hobby project and some other life priorities have arisen), but if you'd like to submit a pull request I'd be very happy to accept it.
When automatic ACK mode is turned on, _sendFrame is called from _interruptHandler. _sendFrame than contains:
with self._sendLock:
....self._setMode(RF69_MODE_TX)
....self._sendLock.wait(1.0)
self._setMode(RF69_MODE_RX)
which causes interrupt to hang for 1s and it is not possible to release lock as SEND_DONE interrupt is received only when wait expires.Note time passed between 'Set mode to: 4' - RF69_MODE_TX and 'Set mode to: 3' - RF69_MODE_RX; Note also that the second interrupt is handled just after previous is finished - after waiting 1s for sending and getting into reception mode (twice :-));