peterantypas / maiana

MAIANA™ is the first Open Source AIS transponder. It proudly raises an extra long middle finger to the marine electronics industry, government overregulation and everything else that gets in the way of innovation in this space.
GNU General Public License v3.0

Maiana transponder FW hardfaults due to GPS turning on, stackoverflow? #144

Open TimVosch opened 2 months ago

TimVosch commented 2 months ago

I merged several of my comments back into the OP

I've got three assembled PCBs which I built from the main branch at JLCPCB. Unfortunately, all three show the same issue, presumably in firmware: a hardfault occurs or the MCU locks up.

The issue appears to be some kind of overflow or illegal memory execution, because the PC register inconsistently points to nonsensical addresses. This sometimes causes a lock-up and sometimes a hardfault. See the details below for a GDB screenshot and the rare occasion where I get an ERROR output on the serial port. This issue happens only if the GPS unit is switched on. I suspected some issue with UART buffering, but I've had no luck pinpointing it exactly.

To clarify: it's not the turning on of the GPS that fails; it is something that happens after it is turned on. I'm quite sure it is interrupt-based. However, there are no consistent breakpoint hits in the BSP interrupts, except for SysTick (makes sense) and RX CLK.

Update: I've created a bare-bones project in STM32CubeIDE as well as in RIOT-OS. Both projects only initialize UART 1 and 2 and the GNSS_EN pin. After enabling the GNSS module, both projects still hardfault with a suspected PC corruption / stack overflow. This is somewhat surprising, since the CubeIDE project uses blocking UART and nothing stack-related except for the HAL... I tried doubling the stack to 2k, but no luck. Sometimes there is no hardfault and the GNSS NMEA strings come through, but after a power cycle it's back to faulting.
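For anyone chasing a similar "PC points to nonsense" symptom, here is a minimal sketch of the sanity checks one might run on the register frame a Cortex-M4 stacks on exception entry, plus a coarse decode of the CFSR register (0xE000ED28). It is not taken from the MAIANA firmware. The names `stacked_frame_t`, `pc_in_flash`, `frame_looks_corrupt` and `classify_cfsr` are hypothetical, and the flash range assumes the 128 KB STM32L432KB mapped at 0x08000000.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Basic frame pushed by a Cortex-M core on exception entry (no FPU state). */
typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
} stacked_frame_t;

/* Assumption: STM32L432KB, 128 KB of flash mapped at 0x08000000. */
#define FLASH_BASE_ADDR 0x08000000u
#define FLASH_SIZE      (128u * 1024u)
#define XPSR_T_BIT      (1u << 24)   /* Thumb state bit, always set on Cortex-M */

typedef enum { FAULT_NONE, FAULT_MEMMANAGE, FAULT_BUS, FAULT_USAGE } fault_class_t;

/* True if the address lies inside the assumed flash range. */
bool pc_in_flash(uint32_t pc)
{
    return pc >= FLASH_BASE_ADDR && pc < FLASH_BASE_ADDR + FLASH_SIZE;
}

/* A stacked PC outside flash or a cleared T bit suggests the frame
 * (or the code that was running) is corrupt. */
bool frame_looks_corrupt(const stacked_frame_t *frame)
{
    assert(frame != NULL);
    return !pc_in_flash(frame->pc) || (frame->xpsr & XPSR_T_BIT) == 0u;
}

/* CFSR layout: MMFSR in bits 7:0, BFSR in bits 15:8, UFSR in bits 31:16.
 * Returns the first non-empty group, checked in that order. */
fault_class_t classify_cfsr(uint32_t cfsr)
{
    if ((cfsr & 0xFFu) != 0u)        return FAULT_MEMMANAGE;
    if (((cfsr >> 8) & 0xFFu) != 0u) return FAULT_BUS;
    if ((cfsr >> 16) != 0u)          return FAULT_USAGE;
    return FAULT_NONE;
}
```

On target, the frame pointer would come from MSP or PSP inside the HardFault handler. If the stacked PC fails these checks on every run but with a different value each time, that points away from a single bad code path and toward something disturbing the core itself, such as its supply or clock.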

image

Details

The only component that I swapped is the MOSFET at Q2 and Q7, being the MOSFET to switch power to the GPS unit and the TX unit. I've validated that both of these work. This version of the board is set at rev 11.9.1, with the STM32L432KBU6.

```
[ERROR 2] # this is a hardfault
[ERROR 5] # this is a memmanag(?)
```

![image](https://github.com/user-attachments/assets/0a013f73-05ba-43af-8c9a-9f85359926b8)

![image](https://github.com/user-attachments/assets/6585e02e-a47d-4a32-8cfc-3bb5aa4f793a)

TimVosch commented 2 months ago

@peterantypas been working on this for a few days now. Would you happen to have an idea what could be the cause?

Have you tested the latest cad board with the L432?

peterantypas commented 2 months ago

I just pushed firmware to an L432 board. No issues. For reference my ARM toolchain is version 12.2.rel1

TimVosch commented 2 months ago

Thanks for the response!

Then it MUST be either my toolchain version or, more likely, my way of flashing the firmware, even though it's just OpenOCD.

I'll be back in the workshop Monday to test this!

Sincerely, Tim

On Sat, Sep 21, 2024, 17:48 Peter Antypas @.***> wrote:

I just pushed firmware to an L432 board. No issues. For reference my ARM toolchain is version 12.2.rel1

TimVosch commented 2 months ago

Tried with 12.2.rel1. No luck. It's down to either OpenOCD, the swapped MOSFET at Q2 and Q7 (DMP2045UQ-7), or some stupid mistake I'm making... Probably the latter!

Would you mind uploading the latest elf/bin for L432? If possible without bootloader, just to eliminate an extra possible factor.


Now let's order an ST-Link. I still had an STM dev board around with an integrated ST-Link V2. Unfortunately the issue persists! Must it be the MOSFET?

It appears the hardfault occurs even before the GPS sends its first message over UART. Just before it does, the RX line is glitching a bit.

Probably not relevant

![image](https://github.com/user-attachments/assets/2beb5f52-da3d-4adc-badd-efe6e2dfe4b3) ![image](https://github.com/user-attachments/assets/1e41fe01-c586-4ba0-a89d-c0354c89eaa7)


Just noticed it even happens when the UART2 isn't initialized :/


Step by step getting closer to the root cause. It appears to be related to using the OSC_IN and OSC_OUT pins as GPIO.

Looks very related to this issue: https://community.st.com/t5/stm32-mcus-products/stm32l4-hard-fault-during-osc32-out-init-as-gpo/td-p/174725


When using HSI as the system clock, everything works fine! The aforementioned post might solve the issue. Then the question remains: why do I experience this, but peterantypas doesn't? Could it be the chip revision? I am working with rev Z. @peterantypas could you perhaps check your chip revision?
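As a rough aid for comparing chip revisions between boards, the sketch below splits the `DBGMCU_IDCODE` register into its device and revision fields. It assumes the usual STM32L4 layout (DEV_ID in bits 11:0, REV_ID in bits 31:16, register at 0xE0042000). The helpers `read_idcode` and `decode_idcode` are made-up names, and the mapping from REV_ID values to revision letters is deliberately left to the reference manual.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed on-target address of DBGMCU_IDCODE on STM32L4 parts. */
#define DBGMCU_IDCODE_ADDR 0xE0042000u

typedef struct {
    uint16_t dev_id;  /* bits 11:0, 0x435 is expected for STM32L43x/L44x */
    uint16_t rev_id;  /* bits 31:16, look up the letter in the reference manual */
} chip_id_t;

/* Reads the raw register. On target, pass
 * (const volatile uint32_t *)DBGMCU_IDCODE_ADDR; on a host, pass a
 * pointer to a captured value. */
uint32_t read_idcode(const volatile uint32_t *reg)
{
    assert(reg != NULL);
    return *reg;
}

/* Splits the raw value into its two documented fields. */
chip_id_t decode_idcode(uint32_t idcode)
{
    chip_id_t id;
    id.dev_id = (uint16_t)(idcode & 0x0FFFu);
    id.rev_id = (uint16_t)(idcode >> 16);
    return id;
}
```

The same value can also be read over SWD from a debugger by examining the word at that address, which avoids flashing anything just to compare two boards.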


So the issue might be the MOSFET!

image

TimVosch commented 2 months ago

Solved

It was the MOSFET, which I had swapped for a nearly identical one. However, this one probably draws barely one milliampere more, which causes issues on PC14, since PC14 is connected to the MCU's internal power circuitry. (See screenshot above.)
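To make the PC14 constraint concrete, here is a small design-rule check, written as a sketch rather than anything from the MAIANA code. It encodes the restriction as I recall it from the STM32L4 datasheet pin notes: PC13 to PC15 are supplied through an internal power switch that handles only a few milliamperes, so as outputs they should stay at or below 2 MHz and must not be used as current sources. Please verify the exact figures against the datasheet for your part. The types `pin_use_t` and `pin_mode_t` and both functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { PIN_INPUT, PIN_OUTPUT } pin_mode_t;

/* How a board design intends to use one GPIO (hypothetical description). */
typedef struct {
    char       port;             /* 'A', 'B', 'C', ... */
    uint8_t    pin;              /* 0..15 */
    pin_mode_t mode;
    uint32_t   toggle_hz;        /* highest expected switching rate */
    bool       sources_current;  /* true if the load draws current from the pin */
} pin_use_t;

/* Assumed limit from the datasheet note, to be verified. */
#define POWER_SWITCH_MAX_HZ 2000000u

/* PC13..PC15 sit behind the internal power switch on STM32L4 parts. */
bool on_power_switch_domain(char port, uint8_t pin)
{
    return port == 'C' && pin >= 13u && pin <= 15u;
}

/* Returns false when an output on PC13..PC15 breaks the assumed limits. */
bool pin_use_ok(const pin_use_t *use)
{
    assert(use != NULL && use->pin < 16u);
    if (!on_power_switch_domain(use->port, use->pin) || use->mode == PIN_INPUT) {
        return true;
    }
    return use->toggle_hz <= POWER_SWITCH_MAX_HZ && !use->sources_current;
}
```

A check like this only flags the design rule. Whether a given MOSFET gate stays within the current budget still has to be confirmed from the schematic or with a measurement.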

I have now bypassed it to always have the GPS module active.

peterantypas commented 2 months ago

Thanks for the update. I may need to add a series resistor to that gate for version 12 of the board.