meshtastic / firmware

Meshtastic device firmware
https://meshtastic.org
GNU General Public License v3.0

[Bug]: Too much traffic in an area causes hardware failure error #3869

Closed Vanguard4893 closed 4 months ago

Vanguard4893 commented 4 months ago

Category

Other

Hardware

Raspberry Pi Pico (W)

Firmware Version

2.3.6

Description

On the Pico platform, I am encountering the following error:

WARN  | ??:??:?? 83 [RadioIf] Can not send yet, busyTx
ERROR | ??:??:?? 83 [RadioIf] Hardware Failure! busyTx for more than 60s
ERROR | ??:??:?? 83 [RadioIf] NOTE! Recording critical error 8 at src/mesh/RadioLibInterface.cpp:94
INFO  | ??:??:?? 83 Rebooting

This has occurred under the condition where there are more than 6 nodes active in the same room (I encountered this while building and testing hardware).

Removing some active nodes from the vicinity permitted every node board that had failed in this manner to then operate perfectly fine.

It seems like the code assumes that if the channel is busy for >60 seconds upon boot (perhaps before it can get its first packet out), then the radio has failed, and the node goes into a reboot loop.

I hope someone can assist with this issue.

Relevant log output

WARN  | ??:??:?? 83 [RadioIf] Can not send yet, busyTx
ERROR | ??:??:?? 83 [RadioIf] Hardware Failure! busyTx for more than 60s
ERROR | ??:??:?? 83 [RadioIf] NOTE! Recording critical error 8 at src/mesh/RadioLibInterface.cpp:94
INFO  | ??:??:?? 83 Rebooting

GUVWAF commented 4 months ago

Is this a Pico with Waveshare SX1262 hat or a DIY target?

Which modem preset (e.g. LongFast, LongSlow) are you using?

Vanguard4893 commented 4 months ago

It's a DIY target - but electrically the same as the Waveshare (SX1262 radio, and the same firmware).

Channel setting is LongFast

caveman99 commented 4 months ago

Busy TX means it's waiting for the interrupt to return. Do you have DIO1 wired up correctly? I only ever saw that when incorrect pin definitions were used on a radio.

No way in hell can a LongFast TX last more than 60 seconds, the assumption by the firmware that something is fishy is 100% spot on in this case.
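For a rough sense of scale, the airtime of a single packet can be estimated with the classic Semtech LoRa time-on-air formula. This is only a sketch under stated assumptions: it takes LongFast to mean SF11, 250 kHz bandwidth and coding rate 4/5, with a 16-symbol preamble and the largest possible 255-byte LoRa payload. None of those values are stated in this thread, and the SX126x datasheet uses slightly different constants, so treat the result as approximate.

```python
import math


def lora_time_on_air_s(payload_bytes, sf, bw_hz, cr=1, preamble_symbols=16,
                       explicit_header=True, crc_on=True):
    """Approximate LoRa packet airtime in seconds (classic Semtech formula).

    cr is the coding-rate index: 1 means 4/5, 4 means 4/8.
    """
    t_sym = (2 ** sf) / bw_hz                    # duration of one symbol
    low_dr_opt = 1 if t_sym > 0.016 else 0       # low data rate optimisation
    ih = 0 if explicit_header else 1
    crc = 1 if crc_on else 0

    numerator = 8 * payload_bytes - 4 * sf + 28 + 16 * crc - 20 * ih
    denominator = 4 * (sf - 2 * low_dr_opt)
    payload_symbols = 8 + max(math.ceil(numerator / denominator) * (cr + 4), 0)

    t_preamble = (preamble_symbols + 4.25) * t_sym
    return t_preamble + payload_symbols * t_sym


# Assumed LongFast parameters: SF11, 250 kHz, CR 4/5, worst-case 255-byte payload.
longfast_worst_case = lora_time_on_air_s(255, sf=11, bw_hz=250_000)
print(f"{longfast_worst_case:.3f} s")
```

Under these assumptions the worst case comes out at a little over 2 seconds, which is well over an order of magnitude short of the 60 s threshold in the log.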

Vanguard4893 commented 4 months ago

It's correct - As I mentioned in the opening comment, I was able to produce this with multiple nodes in the same room, but as soon as they were tested individually, the problem went away with no hardware changed.

I initially did think the same though, that a radio module was faulty, until more nodes started doing it right after power up.

I agree that the channel actually being busy that long is suspicious though!

GUVWAF commented 4 months ago

It's not the channel, the device really thinks the radio itself is transmitting continuously for more than 60 seconds. This really suggests a wiring issue.

Not sure why it happens as soon as you add more devices, but this might just be a coincidence.

Vanguard4893 commented 4 months ago

The wiring is exactly as per the Waveshare pinning to the SX1262, with DIO1 to GPIO20 on the PicoW.

Spectrum analyser shows the radio itself is not in transmit - no 869.25MHz peak is visible on a direct connection to the antenna port.

I can't see how it's a coincidence, considering I was able to replicate it reliably more than once. If I hadn't been able to, I would fully agree with your logic.

10 other DIY nodes of the exact same layout in other separate locations aren't showing the problem and they've been in service for 2 months at this point.

Cheers

GUVWAF commented 4 months ago

No, it's not actually transmitting, but the device missed the Tx done interrupt. The failure has nothing to do with how busy the channel is.
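To illustrate the point, here is a minimal Python model of that kind of check. It is a hypothetical sketch, not the actual code in src/mesh/RadioLibInterface.cpp: the class and method names are invented, and only the 60 s threshold comes from the log message. The "sending" state is set when a transmit starts and is cleared only by the Tx done interrupt, so channel activity never enters into it.

```python
BUSY_TX_TIMEOUT_S = 60.0  # threshold quoted in the "busyTx for more than 60s" log line


class TxTracker:
    """Hypothetical model of a transmit-in-progress flag with a watchdog."""

    def __init__(self):
        self.sending_since = None  # None means no transmit is in progress

    def start_send(self, now):
        # The radio is told to transmit; remember when that happened.
        self.sending_since = now

    def on_tx_done_interrupt(self):
        # Only the Tx done interrupt (DIO1 on an SX1262) clears the flag.
        self.sending_since = None

    def can_send(self, now):
        if self.sending_since is None:
            return "ok"
        if now - self.sending_since > BUSY_TX_TIMEOUT_S:
            return "hardware_failure"
        return "busy_tx"


# Interrupt delivered: the flag clears and the next send is allowed.
healthy = TxTracker()
healthy.start_send(0.0)
healthy.on_tx_done_interrupt()
healthy_status = healthy.can_send(61.0)

# Interrupt missed: the flag stays set, however quiet the channel is.
stuck = TxTracker()
stuck.start_send(0.0)
stuck_early = stuck.can_send(30.0)
stuck_late = stuck.can_send(61.0)

print(healthy_status, stuck_early, stuck_late)
```

In this model the failure path is reached only when the interrupt never arrives, which is why the DIO1 line is the first thing to check.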

Did you design a PCB or is it just wired up with dupont wires? Can it be there is crosstalk?

Vanguard4893 commented 4 months ago

I see. I designed a PCB to electronically match the layout of the Waveshare module exactly, so the firmware would be directly compatible.

Crosstalk on the PCB is unlikely: my day job is as an EMC Test Engineer & EMC Consultant, so I have extensive experience in preventing interference on PCB designs.

Breadboard wiring with jumpers would be a nightmare for noise pickup on a fast SPI bus!

caveman99 commented 4 months ago

Can you check the activity on the I/O lines apart from the SPI bus? There's something going on in your hardware, I'm afraid.