The main culprit is RFIC::readRSSI(), which requires a blocking SPI transaction that takes 80us, and there's nothing that can be done about that. Furthermore, since AIS is a "hard real time" application, this logic cannot move to a task.
The total time budget per bit clock IRQ is 1/9600 = 104us.
I'm going to change the way RSSI is measured and evaluated for Clear Channel Assessment.
Only the Transceiver IC will do RSSI measurements, and only when it has a transmission pending. The NoiseFloorDetector and its associated events will be removed. This eliminates the calls to FreeRTOS xQueueSendFromISR() (which I have clocked at over 50us) and should restore RX packet yield to normal levels.
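Roughly what I have in mind, as a sketch (the names here are illustrative, not the actual MAIANA identifiers):

```cpp
// Hypothetical sketch: CCA is performed only when a transmission is pending,
// so the RSSI SPI read never happens on the plain RX path.
void Transceiver::onSlotBoundary()
{
  if (!mTXPacketQueued)
    return;                       // no TX pending -> no RSSI measurement at all

  int rssi = readRSSI();          // single blocking SPI read, only on this path
  if (rssi < mCCAThreshold)
    startTransmission();          // channel is clear, go now
  else
    deferToNextSlot();            // channel busy, try again later
}
```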
Another approach worth considering is characterizing noise floor RSSI for the unit and setting a threshold in firmware. Since this design is a self-contained unit that will ship with its own antennas, this is doable.
Noise floor probably varies a lot with the local environment, so I'm not sure calibrating each unit is a feasible approach.
You can configure FRR_A to hold the current RSSI value; see FRR_CTL_A_MODE. The FRR_A_READ command takes a lot less time than GET_MODEM_STATUS: besides transferring fewer bytes, it responds immediately without waiting for CTS.
The EZRadioPRO also has a feature to indicate the RSSI level on a GPIO pin automatically. You set the RSSI threshold at which the CCA output goes high in MODEM_RSSI_THRESH. You'd still have to measure the noise floor periodically to determine a sensible threshold.
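Something along these lines, assuming a `sendCommand(buf, len)` helper that handles the SPI framing and CTS polling (command and property numbers are from the Si446x API docs; verify them against your radio's revision, and `readCCAPin()` is an assumed MCU GPIO read):

```cpp
#include <cstdint>

void configureHardwareCCA(uint8_t rssiThreshold)
{
  // SET_PROPERTY (0x11) of MODEM_RSSI_THRESH (group 0x20, index 0x4A):
  // the CCA output asserts when RSSI rises above this threshold.
  const uint8_t setThresh[] = { 0x11, 0x20, 0x01, 0x4A, rssiThreshold };
  sendCommand(setThresh, sizeof setThresh);

  // GPIO_PIN_CFG (0x13): route the CCA indication (function 0x1B) to GPIO0,
  // leave the remaining pins unmodified (0x00 = do nothing).
  const uint8_t gpioCfg[] = { 0x13, 0x1B, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 };
  sendCommand(gpioCfg, sizeof gpioCfg);
}

// Fast-path check before START_TX: a single MCU pin read instead of an SPI
// RSSI query.
bool channelIsClear()
{
  return readCCAPin() == 0;   // pin high means RSSI is above the threshold
}
```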
What I did with my receivers is move all SPI commands out of the bit clock ISR. When I need to read the RSSI or switch channels, I set a flag in the bit clock ISR and do the SPI transaction in the main loop. I didn't see any issues with missed bits while an SPI transaction is running, though there is no RTOS involved in my case.
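In outline it looks something like this (names are made up for illustration):

```cpp
// Defer SPI work out of the bit clock ISR: the ISR only raises a flag,
// the main loop performs the blocking transaction.
volatile bool gRSSIReadRequested = false;

void onBitClockISR()
{
  // ... sample the RX data pin, run the bit-level state machine ...
  if (needRSSISample())
    gRSSIReadRequested = true;     // just a flag, no SPI inside the ISR
}

void mainLoop()
{
  for (;;)
  {
    if (gRSSIReadRequested)
    {
      gRSSIReadRequested = false;
      int rssi = radio.readRSSI(); // blocking SPI transaction, outside the ISR
      updateNoiseFloor(rssi);
    }
    // ... other housekeeping ...
  }
}
```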
Never mind about FRR_A, that's only for latched RSSI, i.e. not useful outside of receiving a message.
Yes, that's the issue. There is no faster path to read RSSI than GET_MODEM_STATUS. With the receiver I can see why you don't have many real-time constraints, but with the transmitter I do: transmission has to begin within a few microseconds of CCA or it will run over the slot boundary.
I agree that noise floor should be dynamically inferred, I just have to change how it's done so it does not involve pushing events from an ISR to the main event queue, because for some reason this takes 80us. I'm going to debug the RTOS itself to see how much of that is xQueueSendFromISR() vs my code. I suspect there's too much memcpy() going on ...
The culprit is FreeRTOS. I am going to go back to my original non-RTOS code with a slightly improved ring buffer and give it a try.
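The kind of ring buffer I mean is a fixed-size, single-producer/single-consumer queue of pointers, so nothing gets copied between the ISR and thread mode. A minimal sketch (sizes and names are illustrative):

```cpp
#include <cstdint>
#include <cstddef>
#include <atomic>

// Lock-free SPSC ring buffer of pointers: the ISR pushes, thread mode pops.
// No payload memcpy(), just pointer moves.
template <typename T, size_t N>
class PointerRing
{
public:
  bool push(T* item)                 // producer (ISR) only
  {
    size_t head = mHead.load(std::memory_order_relaxed);
    size_t next = (head + 1) % N;
    if (next == mTail.load(std::memory_order_acquire))
      return false;                  // full: drop or count the overflow
    mItems[head] = item;
    mHead.store(next, std::memory_order_release);
    return true;
  }

  T* pop()                           // consumer (thread mode) only
  {
    size_t tail = mTail.load(std::memory_order_relaxed);
    if (tail == mHead.load(std::memory_order_acquire))
      return nullptr;                // empty
    T* item = mItems[tail];
    mTail.store((tail + 1) % N, std::memory_order_release);
    return item;
  }

private:
  T* mItems[N];
  std::atomic<size_t> mHead{0};
  std::atomic<size_t> mTail{0};
};
```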
The bottleneck proved to be memcpy(). I transitioned back to object pools and queues of pointers and now things are much better. Still, restarting RX (which is necessary to take advantage of sync word detection) takes over 180us, which is longer than a bit period. I'm going to revise the logic to perform this operation at the very end of the slot, where I can afford to drop 2 bits with few consequences. So if the receiver is not in the middle of a packet at bit 253, restart RX. Same thing for the transceiver at bit 0 of the next slot. We might miss the first bits of a preamble, but we can't detect those anyway; only the last 8 bits matter.
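The slot-boundary check amounts to something like this (a sketch, with made-up member names):

```cpp
// Defer the ~180us RX restart (almost 2 bit periods) to the end of the slot,
// where dropping a couple of bits has essentially no cost.
void Receiver::onBitClock()
{
  // ... clock in the current bit, run the packet state machine ...

  if (mSlotBitNumber == 253 && !insidePacket())
  {
    // Worst case we lose the tail of this slot and the first bits of the
    // next preamble, which we can't detect anyway; only the last 8
    // preamble bits matter for sync.
    restartRX();
  }
}
```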
Looking further into SPI transactions, it seems that the bulk of the time is spent waiting for CTS. After initialization, the longest command transmit time I've measured is 52us. The longest response time was 212us. This can absolutely be taken off the "fast path" and will reduce time spent in the ISR.
Ideally, though, we should never spend more than 52us in an ISR, so we can service the other IC's clock interrupt as well. This will require careful management but it's very doable.
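For reference, these ISR timings can be taken with the Cortex-M DWT cycle counter. A sketch, assuming a Cortex-M3/M4 with CMSIS headers (the device header and `gMaxISRTime` are placeholders):

```cpp
#include "stm32l4xx.h"   // substitute the CMSIS device header for your MCU

static volatile uint32_t gMaxISRTime = 0;   // worst case seen, in microseconds

// Enable the DWT cycle counter once at startup.
static inline void enableCycleCounter()
{
  CoreDebug->DEMCR |= CoreDebug_DEMCR_TRCENA_Msk;
  DWT->CYCCNT = 0;
  DWT->CTRL  |= DWT_CTRL_CYCCNTENA_Msk;
}

void someISR()
{
  uint32_t start = DWT->CYCCNT;

  // ... actual ISR work ...

  uint32_t elapsedUs = (DWT->CYCCNT - start) / (SystemCoreClock / 1000000U);
  if (elapsedUs > gMaxISRTime)
    gMaxISRTime = elapsedUs;   // inspect with a debugger or dump periodically
}
```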
Alright, the bare_metal branch seems very happy now.
There is a very rare case of an ISR clocking at 112us which I haven't been able to explain yet and probably won't bother. I'm going to live with this code for about a week and then merge to master.
I have also confirmed that TX is exactly inside the SOTDMA slot boundary (barely). To buy a little extra margin, the plan is to use a GPIO and the Si4463 hardware-based CCA mechanism in the next board design. This will eliminate 70us of RSSI interrogation in the fast path. There is nothing I can do about the fact that START_TX takes 400us. I have already dealt with this by reducing the number of ramp-up and ramp-down bits in the packet.
FWIW, I have seen all kinds of misaligned transmissions from other boats during the past few years while developing this. Not to mention several malfunctioning class A devices ...
Done. New firmware in master greatly improves RX yield.
There is too much going on in the Receiver::onBitClock() method, which can cause significant delay and loss of bits from the other IC. I confirmed this by removing RSSI measurement and observing an instant boost in message count.
Will need to re-architect the firmware to perform a single GPIO read in the ISR and then queue the bit value for processing in thread mode (or an RTOS task).
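Conceptually, the split would look something like this (a sketch; `readRXDataPin()` and `decoder.processBit()` are placeholders, not the actual MAIANA code):

```cpp
#include <cstdint>

// The bit clock ISR does nothing but sample the RX data pin and buffer the
// bit; all SPI traffic and NRZI/HDLC processing happens in thread mode.
volatile uint8_t  gBitBuffer[64];
volatile uint16_t gBitHead = 0;
volatile uint16_t gBitTail = 0;

void onBitClockISR()
{
  uint8_t bit = readRXDataPin();                   // single GPIO read
  uint16_t next = (gBitHead + 1) % sizeof gBitBuffer;
  if (next != gBitTail)                            // drop the bit if full
  {
    gBitBuffer[gBitHead] = bit;
    gBitHead = next;
  }
}

void processBits()                                 // main loop or RTOS task
{
  while (gBitTail != gBitHead)
  {
    uint8_t bit = gBitBuffer[gBitTail];
    gBitTail = (gBitTail + 1) % sizeof gBitBuffer;
    decoder.processBit(bit);                       // NRZI decode, HDLC framing, CRC
  }
}
```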