ryedwards / budgetcan-fw

Firmware to support gs_usb on most STM32 devices
MIT License

Unrecoverable Error from 2.5ms Dominant Pulse on Bus (canablev2 target) #12

Closed: fred314159265 closed this issue 4 months ago

fred314159265 commented 5 months ago

First of all, thank you to all those who have contributed to this project; I am a fan!

I am using the canablev2 firmware build of this project. It initially appears to work great, but when an erroneous node puts a long (~2.5ms) dominant pulse on the bus, the firmware starts behaving strangely.

Steps

  1. Plug in CANable V2 interface with budgetcan_fw loaded.

  2. Bring up interface:

    • sudo ip link set dev can0 type can bitrate 500000 fd on dbitrate 2000000
    • sudo ip link set dev can0 up
  3. Check interface status with: ip -details link show can0

    19: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 72 qdisc pfifo_fast state UP mode DEFAULT group default qlen 10
    link/can  promiscuity 0  allmulti 0 minmtu 0 maxmtu 0 
    can <FD> state ERROR-ACTIVE restart-ms 0 
          bitrate 500000 sample-point 0.875
          tq 125 prop-seg 6 phase-seg1 7 phase-seg2 2 sjw 1 brp 10
          gs_usb: tseg1 1..16 tseg2 1..8 sjw 1..4 brp 1..1024 brp_inc 1
          dbitrate 2000000 dsample-point 0.750
          dtq 25 dprop-seg 7 dphase-seg1 7 dphase-seg2 5 dsjw 2 dbrp 2
          gs_usb: dtseg1 1..16 dtseg2 1..8 dsjw 1..4 dbrp 1..1024 dbrp_inc 1
          clock 80000000 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 tso_max_size 65536 tso_max_segs 65535 gro_max_size 65536 parentbus usb parentdev 3-2:1.0 
  4. Send frame repeatedly on bus:

    • watch -n 0.1 'cansend can0 123#0011223344556677'
  5. I confirm with a logic analyser on the bus that the frames are sent correctly. (I am using another canablev2 to ACK the frames.)

  6. I use an erroneous node to produce a 2.5ms dominant pulse on the bus.

  7. Immediately after that, the canablev2 fails to send any more frames, but the state shown by ip -details link show can0 remains ERROR-ACTIVE.

  8. Checking the interface status with ip -details link show can0 shows no difference from the snippet above.

  9. If I reset the interface using these commands, the last one fails with: RTNETLINK answers: No such device. I have confirmed that running these commands before the issue appears works correctly: there is no error and the interface appears to work fine.

    • sudo ip link set dev can0 down
    • sudo ip link set dev can0 type can bitrate 500000 fd on dbitrate 2000000
    • sudo ip link set dev can0 up
  10. I have also tried resetting the USB device in software, using what I believe is the equivalent of USBDEVFS_RESET. After that it sends a further 10 frames and then stops sending. After a further 10 cansend requests without any frames being sent, the command returns write: No buffer space available.

  11. If I reset the device by unplugging and re-plugging it, the issue is fixed once I bring the interface back up, etc. (But it comes back after the next erroneous 2.5ms pulse.)
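
Steps 4 and 10 can also be driven from a script instead of watch/cansend, which makes it easier to see exactly which send starts failing. A minimal sketch, assuming Linux, Python's built-in SocketCAN support (socket.AF_CAN, CAN_RAW) and the interface name can0; build_frame and send_repeatedly are hypothetical helper names. It sends 123#0011223344556677 every 100 ms and reports each write that fails with ENOBUFS, the errno behind "No buffer space available".

```python
import errno
import socket
import struct
import time

# Linux SocketCAN classic frame layout (struct can_frame):
# 32-bit CAN ID, 8-bit DLC, 3 padding bytes, 8 data bytes.
CAN_FRAME_FMT = "=IB3x8s"


def build_frame(can_id: int, data: bytes) -> bytes:
    """Pack a classic CAN frame, equivalent to `cansend can0 123#...`."""
    if len(data) > 8:
        raise ValueError("classic CAN payload is at most 8 bytes")
    # struct pads the "8s" field with zero bytes when data is shorter.
    return struct.pack(CAN_FRAME_FMT, can_id, len(data), data)


def send_repeatedly(ifname: str = "can0", count: int = 50, period_s: float = 0.1) -> None:
    """Hypothetical helper: send 123#0011223344556677 every period_s seconds
    and report each write that fails with ENOBUFS."""
    frame = build_frame(0x123, bytes.fromhex("0011223344556677"))
    try:
        # AF_CAN only exists on Linux builds of Python.
        sock = socket.socket(socket.AF_CAN, socket.SOCK_RAW, socket.CAN_RAW)
    except (AttributeError, OSError) as exc:
        print(f"SocketCAN unavailable: {exc}")
        return
    with sock:
        try:
            sock.bind((ifname,))
        except OSError as exc:  # e.g. the interface does not exist
            print(f"cannot bind to {ifname}: {exc}")
            return
        for i in range(count):
            try:
                sock.send(frame)
            except OSError as exc:
                if exc.errno != errno.ENOBUFS:
                    raise
                print(f"frame {i}: No buffer space available (ENOBUFS)")
            time.sleep(period_s)


if __name__ == "__main__":
    send_repeatedly()
```

Classic frames are used here because the repro only sends 8-byte payloads; they are accepted on an interface configured with fd on.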

ryedwards commented 5 months ago

I'll need to look into the recovery mechanisms currently implemented in the code. To be honest, my testing has only covered the happy path up to this point. Usually "No buffer space..." is caused by echo frames not being returned via USB. I've seen this in the past when one of the queues is blocked and prevents messages from going in or out.
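
The echo-frame explanation lines up with the counts in step 10. A simplified model, assuming (as I understand the Linux gs_usb driver) that the host keeps at most 10 TX frames in flight (GS_MAX_TX_URBS) and frees a slot only when the device echoes a frame back, plus the qlen 10 shown in the ip output; simulate_sends is a hypothetical name and the real driver is more involved.

```python
# Simplified model of the host-side TX path for a gs_usb device.
# Assumption: the Linux gs_usb driver keeps at most 10 frames in flight
# (GS_MAX_TX_URBS) and frees a slot only when the device echoes a frame back.
MAX_IN_FLIGHT = 10
# Taken from the `ip -details link show can0` output in the issue: qlen 10.
TXQUEUELEN = 10


def simulate_sends(n_sends: int, device_echoes: bool) -> list[str]:
    """Return the fate of each send: "sent", "queued" or "ENOBUFS"."""
    in_flight = 0  # frames handed to the device but not yet echoed
    queued = 0     # frames waiting in the qdisc because the driver queue is stopped
    outcomes = []
    for _ in range(n_sends):
        if device_echoes:
            in_flight = 0  # healthy device: earlier frames were echoed back
        if in_flight < MAX_IN_FLIGHT and queued == 0:
            in_flight += 1
            outcomes.append("sent")
        elif queued < TXQUEUELEN:
            queued += 1
            outcomes.append("queued")
        else:
            outcomes.append("ENOBUFS")  # write: No buffer space available
    return outcomes


print(simulate_sends(22, device_echoes=False))
```

With a device that never echoes, the first ten frames reach the device, the next ten sit in the qdisc, and every later send fails with ENOBUFS, which matches the 10 + 10 pattern reported above.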

I will attempt to generate a similar error mode with a debugger attached to try to find the root cause.

fred314159265 commented 5 months ago

Thanks for the reply 😀

I tried tweaking a couple of basic things blindly, but without a debugger I was mostly stabbing in the dark and didn't get anywhere.

BTW, if you want to create the 2.5ms pulse, you can just hold the TX pin of a CAN transceiver in its dominant state (low, on typical transceivers). Transceivers usually have an internal dominant time-out, which will give you a pulse a few ms long, so you don't need to generate a short pulse with firmware or anything like that.

Thanks again, and good luck!

ryedwards commented 5 months ago

FYI, for debugging you can buy an ST-LINK V3 mini for around $12 USD. I think the canablev2 has the debug pins exposed.

fred314159265 commented 5 months ago

Good point. I have had a look, and I do have an old ST-LINK/V2 that I think should still work... 🤞

I was going to ask if you had any pointers on setting up debugging on the software side, but I noticed you have already tracked some VSCode config files, so I will see how far I get using them 😁

ryedwards commented 5 months ago

If you are using an ST-LINK and have its drivers installed (I believe they come with the STM32 IDE), the only additional step is to add the cortex-debug VS Code extension.

ryedwards commented 4 months ago

I have now tried the exact commands you sent. I'm not sure whether it's the HW/FW or something in the Linux driver. As soon as I short the bus, I see the same errors, and when I try TX'ing messages from the other tool I am using, I get error frames. It seems that shorting the bus kills the CAN driver somewhere. I'll continue to dig.
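
A bus held dominant is especially harsh on a node that is transmitting when it happens. A worked estimate, assuming a simplified reading of the ISO 11898-1 fault-confinement rules (+8 when a transmitter flags an error, +8 after the 14th consecutive dominant bit, then +8 per further 8 dominant bits) and assuming the CANable is the transmitter when the pulse starts; tec_after_dominant and bits_to_bus_off are hypothetical names.

```python
def tec_after_dominant(n_bits: int) -> int:
    """Simplified fault confinement for a transmitter that sees the bus stay
    dominant for n_bits bit times, counted from the start of its error flag."""
    tec = 8  # +8 when the transmitter sends the error flag
    if n_bits >= 14:
        # +8 after the 14th consecutive dominant bit, then after every further 8
        tec += 8 * (1 + (n_bits - 14) // 8)
    return tec


def bits_to_bus_off() -> int:
    """Smallest run of dominant bit times that pushes TEC above 255 (bus-off)."""
    n = 0
    while tec_after_dominant(n) <= 255:
        n += 1
    return n


BITRATE = 500_000  # nominal bitrate used in the issue
PULSE_S = 2.5e-3   # reported dominant pulse length

pulse_bits = round(PULSE_S * BITRATE)
needed = bits_to_bus_off()
print(f"pulse: {pulse_bits} bit times; bus-off after ~{needed} ({needed / BITRATE * 1e6:.0f} us)")
```

At 500 kbit/s one bit time is 2 µs, so the 2.5ms pulse spans 1250 bit times, while this model reaches bus-off after about 254 (roughly 0.5 ms). Under these assumptions, the pulse is several times longer than needed to take a transmitter bus-off.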

ryedwards commented 4 months ago

OK, I think I found it. The STM32 FDCAN peripheral does not have an automatic bus-off recovery mechanism; it's up to the developer to handle the reset after a bus-off event. I added some test code as described in this post: https://community.st.com/t5/stm32-mcus-products/stm32h7-fdcan-has-lost-the-automatic-bus-off-recovery-mechanism/td-p/187400
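
The recovery step amounts to a small register-level check. A model of it, assuming the Bosch M_CAN behaviour of the STM32 FDCAN: on bus-off the peripheral sets PSR.BO (bit 7) and CCCR.INIT (bit 0) and stops, and clearing INIT lets the hardware run the standard recovery (waiting for 128 occurrences of 11 consecutive recessive bits). FakeFdcan and recover_from_bus_off are hypothetical stand-ins, not the firmware's code.

```python
from dataclasses import dataclass

# Bit positions in the STM32 FDCAN (Bosch M_CAN) registers.
CCCR_INIT = 1 << 0  # CCCR.INIT: controller held in initialization mode
PSR_BO = 1 << 7     # PSR.BO: controller is in the bus-off state


@dataclass
class FakeFdcan:
    """Hypothetical stand-in for the memory-mapped FDCAN registers."""
    cccr: int = 0
    psr: int = 0


def recover_from_bus_off(regs: FakeFdcan) -> bool:
    """If the peripheral went bus-off (and therefore set INIT), clear INIT so
    the hardware starts the bus-off recovery sequence.
    Returns True if a recovery was started."""
    if regs.psr & PSR_BO and regs.cccr & CCCR_INIT:
        regs.cccr &= ~CCCR_INIT
        return True
    return False
```

On the target, a check like this could run from the main loop or from the FDCAN error-status interrupt; the ST community thread linked above discusses the approach.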

The RX'ing seems resolved, but I'm still seeing "No buffer space available" and "No such device".

ryedwards commented 4 months ago

I found the error: I was not writing the channel number into the error response. I created a temporary variable to build the frame but never wrote the channel into it, so, just like I've yelled at other developers for in the past, I returned whatever was on the stack.
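
The difference between the buggy and fixed error frames can be shown on the frame header alone. A sketch, assuming the 12-byte little-endian gs_host_frame header used by the Linux gs_usb driver (echo_id, can_id, can_dlc, channel, flags, reserved) and assuming the host rejects frames whose channel index is out of range; the helper names and the 0xA5 "stale stack" pattern are hypothetical.

```python
import struct

# Assumed gs_usb host frame header, little-endian, 12 bytes:
# echo_id (u32), can_id (u32), can_dlc (u8), channel (u8), flags (u8), reserved (u8)
HEADER_FMT = "<IIBBBB"
ECHO_ID_RX = 0xFFFFFFFF    # device-originated frame, not a TX echo
CAN_ERR_FLAG = 0x20000000  # Linux can_id flag marking an error frame
CAN_ERR_DLC = 8            # Linux error frames always carry 8 data bytes


def build_error_header_buggy(stale_stack: bytes) -> bytes:
    """Mimics the bug: every field is written except the channel, which keeps
    whatever value was already in the uninitialized buffer."""
    channel = struct.unpack(HEADER_FMT, stale_stack)[3]
    return struct.pack(HEADER_FMT, ECHO_ID_RX, CAN_ERR_FLAG, CAN_ERR_DLC, channel, 0, 0)


def build_error_header_fixed(channel: int) -> bytes:
    """Every field, including the channel, is written explicitly."""
    return struct.pack(HEADER_FMT, ECHO_ID_RX, CAN_ERR_FLAG, CAN_ERR_DLC, channel, 0, 0)


def host_accepts(header: bytes, channel_count: int) -> bool:
    """Host-side sanity check: the channel index must name an existing interface."""
    return struct.unpack(HEADER_FMT, header)[3] < channel_count


print(host_accepts(build_error_header_buggy(b"\xa5" * 12), channel_count=1))
print(host_accepts(build_error_header_fixed(0), channel_count=1))
```

In C firmware the usual guard is to zero-initialize the frame struct (for example `= {0}`) before filling it, so a forgotten field is at least 0 rather than stack contents.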

I'll create a bug report and push the fix to mainline.

Great work on finding this issue!

fred314159265 commented 4 months ago

You're a legend; thank you very much for the fix! 😁

ryedwards commented 4 months ago

Let me know if it solves the issue you've been having and if you find anything else!