iNavFlight / inav

INAV: Navigation-enabled flight control software
https://inavflight.github.io
GNU General Public License v3.0

INAV 5.1 hangs, latch-ups on H743-WING V3 #8424

Closed: rashied closed this issue 10 months ago

rashied commented 2 years ago

Current Behavior

Recently purchased an H743-WING V3, flashed it to 5.1, configured it, and then soldered on components:

Primary bug: when powered on, the FC will hang (control latency) or latch (no Rx inputs register). This occurs whether it is powered by battery or by USB-C. When powered by USB-C and connected to INAV Configurator, the latch happens within ~1-10 seconds and is observable in the Receiver tab. When powered by battery and not connected to INAV Configurator, it behaves normally for ~1 minute before hangs occur, followed ultimately by a latch.

Steps to Reproduce

  1. Power on the transmitter and FC so that Tx/Rx connect
  2. If USB is connected (with or without battery), connect to INAV Configurator
  3. Go to Receiver tab
  4. Move controls and observe change in input on Receiver tab
  5. Wait for latchup (~1-10s)

Expected behavior

No hangs or latchups.

Suggested solution(s)

I'm unclear if this is a code issue or a HW issue. Some other H743-WING V3 boards are reported to work fine (MrD's video, response in Discord).

Additional context

b14ckyy commented 2 years ago

Do you have installed an SD card for blackbox? If yes please remove it and see if it still happens.

rashied commented 2 years ago

No SD card in any of these!

Matek suggested it could be me triggering the stick commands. I did another set of three tests where I deflected the sticks in a way that shouldn't trigger any commands. It failed all of them, but I didn't record the first. In the second one below, you'll see that it latches a few moments after powering on and connecting.

Video 1 - hanging Video 2 - latch up

Left on, the board keeps beeping at random(?) intervals.

b14ckyy commented 2 years ago

OK this is not INAV. This is your ELRS losing link. INAV even switches into Failsafe mode. Check your OSD for link quality and see if that goes down before the dropout happens. What ELRS module do you use? ELRS has ridiculously long duty cycles on older versions, and even on newer versions if set up too aggressively (one reason why ELRS is not good for other people to fly with). Check if your module overheats.

rashied commented 2 years ago

I've tried two ELRS Rx, Matek & Betaflight on the most recent stable version (as of a few weeks ago) of ELRS. Both were working fine on F411-WTE and Foxeer Reaper AOI, respectively. Also, I didn't have any Rx loss issues on Ardupilot + H743 Wing V3. I tested that for a solid 10 min.

I haven't even hooked up my VTX - will do it and see what the OSD says and post a video later.

Another thing I haven't tried is using another UART. Any reason that would be different?

Also thanks for helping, this is really squirrely to me.

b14ckyy commented 2 years ago

I see that with all the cross checks it does not look like an ELRS issue at first glance, but from the behavior (first sensor loss, then RC control loss and failsafe indicated), it is at least not a lockup of INAV itself.

Yes another UART could be an option as well. Some UARTs have differences internally (I am not that deep into this to tell you what exactly) and can cause issues with certain devices. So worth a try.

We have seen other strange behavior of ELRS in the past as well. For example, after a failsafe the signal came back, the pilot had control for a few minutes, and then out of nowhere all inputs froze. No failsafe was triggered, as the logs show; all channels just froze in place. We have seen this 3 times now in the INAV Fixed Wing Group with exactly the same pattern, but it was all 2.x firmware as far as I can remember.

If changing the UART does not solve the issue, I suggest you back up your config with DIFF ALL, then wipe the board with another flash and set up only your RC link and telemetry. Then see if it still happens. If not, load your config back piece by piece and see when it starts happening.
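
The piece-by-piece reload described above is easier to organise if the saved dump is split into sections first. Here is a minimal Python sketch, assuming the dump groups its commands under `# ...` comment lines. The helper name `split_diff_sections` and the sample settings are hypothetical, not real INAV parameters:

```python
def split_diff_sections(diff_text):
    """Group CLI dump lines under the '# ...' comment line that precedes them.

    Returns a list of (title, commands) tuples; sections without any
    commands are skipped.
    """
    sections = []
    title, commands = "preamble", []
    for raw in diff_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # A comment line starts a new section; flush the previous one.
            if commands:
                sections.append((title, commands))
            title, commands = line.lstrip("# ").strip() or "untitled", []
        else:
            commands.append(line)
    if commands:
        sections.append((title, commands))
    return sections


# Hypothetical sample dump; the setting names are placeholders.
SAMPLE = """\
# diff all

# master
set example_setting_a = 1
set example_setting_b = ON

# profile
profile 1
set example_profile_setting = 42

# save configuration
save
"""

if __name__ == "__main__":
    for title, commands in split_diff_sections(SAMPLE):
        print(f"--- {title} ({len(commands)} lines)")
        for command in commands:
            print(command)
```

Paste one section at a time into the CLI, save, and re-test the Receiver tab before moving on to the next section.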

EhAye commented 2 years ago

H743-wing V2 (inav5), and Matek R24-D (elrs 3.0)

I have the same problem. Flaps will randomly freeze for up to a full second before resuming. Happens every 5 - 10 seconds. After a while, all channels stop responding entirely.

EdgeTX widgets report that RSSI and LQ are functioning perfectly fine. INAV Configurator shows no movement on the Receiver tab. No SD card; blackbox disabled.

Also noticed telemetry slowly drops out. The first 9 telemetry items update rapidly and are always there. But GPS telemetry stops being received, and eventually yaw/roll/battery telemetry drops out too.

zvikaf commented 2 years ago

Having a similar issue with a TBS system: H743-wing + INAV 5.1 + TBS micro. With the battery connected and USB connected to the computer, I arm the system; when disarming, the radio link stops and it goes into FS. Rebooting the transmitter does not regain control. This radio halt repeats every time I follow the same procedure.
Doing the same without USB does not halt the radio. The same FW was functioning well on INAV 4.1 and F722.

b14ckyy commented 2 years ago

@zvikaf this is not similar, this is a completely different symptom. There were Crossfire versions doing crap like that. Check your version and update to 6.17 or 6.19. It had something to do with CRSF V1 and V2 handling not working correctly.

zvikaf commented 2 years ago

@b14ckyy thanks for the quick and detailed response. The TBS is updated to the latest version (6.19). I suspect that if it were a CRSF issue, it would have surfaced on arming/disarming without a USB connection as well. This is not a random issue, as it is repeatable. I forgot to mention that I have DJI FPV goggles with telemetry, which show FS mode.

b14ckyy commented 2 years ago

@zvikaf I suggest that you open a separate issue for that though. We can collect information there until a dev can look at it. Also post a full DIFF and all the relevant info. If there is an issue, it is likely different from this ticket here.

zvikaf commented 2 years ago

@b14ckyy Thanks again.. will do :-)

OptimusTi commented 2 years ago

I have a V2 board that I can check with.

zvikaf commented 2 years ago

@OptimusTi thanks, this will be great, wonder if it is consistent.

b14ckyy commented 2 years ago

@zvikaf Sorry, I had a brain fart yesterday and did not think of issue https://github.com/iNavFlight/inav/issues/8409 here. I think that is your actual issue. Devs are working on it.

zvikaf commented 2 years ago

@b14ckyy That's OK, happens to us all. Anyway, I have now tried disabling autotrim, as @breadoven suggested, and this eliminated the RC link freeze :-)

rashied commented 2 years ago

Hi folks, no video and still using 4.1.0 for now. I moved UARTs around and I still have hangs but no latching. RSSI never drops below 99 on OSD. I'm observing this with and without being connected to INAV Configurator 4.1.0.

Running out of ideas here. Maybe I should just try my luck and request a replacement from Matek?

b14ckyy commented 2 years ago

@rashied I see no hardware fault on the FC here. As said before, INAV is not freezing. It's the RC link that starts to become unstable. Cycle time, CPU load, MSP load: all stable and still updating. And INAV clearly shows a failsafe signal coming from the receiver to the FC.

I can't tell why this only happens on the H743-Wing for you both, but maybe open an issue in the ELRS GitHub. Maybe downgrade to 2.5.1 or older and see if it happens there as well.

rashied commented 2 years ago

Yeah, I don't think it's the HW given that it was functioning on Ardupilot, but I don't understand how it's the RC link. I'll just get a new FC when I have more time for FPV. Unfortunately, a lot seems to be out of stock and I'm looking for more UARTs than the Matek F411-WTE has.

Just so I understand what you mean about the RC link, can you explain a little more? I'll add an issue on the ELRS git if I can get on the same page as you.

What do you mean that RC link is becoming unstable? I don't have any drop in RSSI in the OSD and the ELRS Rx LED stays lit solid, so Tx/Rx seem to maintain pairing.

"Cycle time, CPU load, MSP load, all stable and still updating."

I don't know if this was the case, but I'd have to rewatch the videos.

"And INAV clearly shows a failsafe signal coming from the Receiver to the FC."

I don't know what you're referring to. Is this the beep?

Thanks again, @b14ckyy , I really appreciate it.

b14ckyy commented 1 year ago

"And INAV clearly shows a failsafe signal coming from the Receiver to the FC."

In the Configurator, below the battery icon at the top, you have a small parachute symbol. This lights up red. It means that INAV either got an RX-loss data packet reported by the receiver, or INAV does not get any receiver signals at all. So the communication between receiver and flight controller is breaking down. But INAV is still working fine on its end, as the PID loop and MSP communication are still active.
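
The two failsafe conditions described above (the receiver reports a loss, or no receiver data arrives at all) can be pictured as a simple watchdog. This is an illustrative Python sketch only, not INAV's actual code; the class name, the method names and the 0.5 s timeout are assumptions:

```python
class RxLinkWatchdog:
    """Toy model of a receiver-link failsafe check (hypothetical, not INAV code)."""

    def __init__(self, timeout_s=0.5):
        self.timeout_s = timeout_s        # assumed maximum gap between frames
        self.last_frame_time = None       # time of the most recent frame
        self.receiver_flagged_loss = False

    def on_frame(self, now, failsafe_flag):
        """Record a frame from the receiver and its reported failsafe flag."""
        self.last_frame_time = now
        self.receiver_flagged_loss = failsafe_flag

    def in_failsafe(self, now):
        """True if the receiver reported a loss or frames stopped arriving."""
        if self.last_frame_time is None:
            return True                   # nothing received yet
        if self.receiver_flagged_loss:
            return True                   # receiver itself reported RX loss
        return (now - self.last_frame_time) > self.timeout_s


if __name__ == "__main__":
    wd = RxLinkWatchdog()
    wd.on_frame(0.0, failsafe_flag=False)
    print(wd.in_failsafe(0.1))  # frames are fresh
    print(wd.in_failsafe(1.0))  # frames stopped arriving
```

In this model the rest of the system keeps running either way, which matches the observation that cycle time and MSP traffic stay alive while the receiver inputs freeze.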

Someone would need to debug the Serial communication on the INAV side to tell you exactly what's going on.

gcmcnutt commented 1 year ago

@b14ckyy I have a similar issue with a new H743-Wing V3

What would be the best way to debug this? I've got STLink and Jtag debuggers. Not sure how to best hook it to this Matek -- do you have a reference?

MrD-RC commented 1 year ago

@gcmcnutt that is a different issue. It is because of the SD Card. There is an issue on SD Card lockup already.

gcmcnutt commented 1 year ago

@MrD-RC ahh, ok, any suggestions on how I can debug this? Or is there an issue filed already?

OptimusTi commented 1 year ago

"@OptimusTi thanks, this will be great, wonder if it is consistent."

Sorry, I haven't been able to look into this; too busy.

zvikaf commented 1 year ago

Looks like the H743 is problematic, might be some bugs in the kernel :-(

EhAye commented 1 year ago

"might be some bugs in the kernel :-("

Is that fixable? or is that a phonecall to Matek with hopes they send me something else?

gcmcnutt commented 1 year ago

I'll give it an attempt to root cause this -- I could use some pointers on dev/debugger hookups for this device...

this is promising (drop sdio clock freq): https://github.com/PX4/PX4-Autopilot/issues/19155

zvikaf commented 1 year ago

WOW ... so indeed the kernel is problematic ... and the solution needs to be done by ST :-(

EhAye commented 1 year ago

"so indeed the kernel is problematic"

I just saw this on Ardupilot's problems page... it has an H7 section: https://ardupilot.org/copter/docs/common-when-problems-arise.html#common-when-problems-arise

"AutoPilots utilizing the H7 series of processors can, on rare occasions, get into a state where they will no longer complete initialization ... It is believed that this may be a memory corruption problem which can be caused by interrupting a flash memory write (as when changing parameters). Unfortunately, due to the processor’s architecture, there is no way in the firmware to correct this automatically"

Is this something ST is likely to fix? Or am I just tossing pennies into a fountain? Will Matek tell me 'better luck next time'? lol

b14ckyy commented 1 year ago

"I just saw this on Ardupilot's problems page... it has an H7 section: https://ardupilot.org/copter/docs/common-when-problems-arise.html#common-when-problems-arise"

This is a completely different topic. INAV does not write single parameters like Ardu. This was never a problem on INAV.

"WOW ... so indeed the kernel is problematic ... and the solution need to be done by ST :-("

Why do you think that? As @gcmcnutt said, this looks promising by just lowering the SDIO clock.

I think you guys mix up very different issues right now that have nothing to do with the OP.

MartinHugh commented 1 year ago

In case it is relevant to this bug report:

I experienced a failsafe on two occasions yesterday as soon as I disarmed.

H743 + iNav5.1 + ELRS 3.0 (EP1)
Continuously trim servos: Disabled
stats = OFF

Diff at : https://pastebin.com/Nn5Sx2CL

EDIT : No SD Card fitted

gcmcnutt commented 1 year ago

Yeah, I too have seen a lockup, even with the saved config. On more than one occasion I have had a lockup after a disarm and 'save updated pids' action...

Anyway, I agree, there seem to be a couple of faults being conflated here. For now I will look at the blackbox issue, e.g. sdcard i/o reliability first -- and guessing this is something about the sdcard bus or transfer speed (I don't know enough about this code to understand the differences). If there is a better issue to connect this work to let me know.

Actions:

(sorry ramping up here)

gcmcnutt commented 1 year ago

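Another data point:

  • I measured the clock and data signals at the sdcard during init -- and yes, a "HC I" card was showing > 25 MHz. (If the system is following the spec it'd attempt to set the clock to 50 MHz.)
  • I found a very old card (a model "HC") -- and the system boots up just fine -- and the card is recognized.

So, maybe some evidence we need to limit the clock as was done on the other flight controller s/w.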
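
As a rough illustration of what limiting the clock means, the sketch below picks the smallest divider that keeps the SD bus clock at or below a ceiling. It assumes the divider relation SD clock = kernel clock / (2 × divider), with divider 0 meaning no division, which is how the STM32H7 SDMMC peripheral is understood to work; verify this against the reference manual. The 200 MHz kernel clock and the function names are hypothetical, and this is not how INAV or PX4 actually configure the peripheral:

```python
def min_clkdiv(kernel_hz, max_sd_hz):
    """Smallest divider so that kernel_hz / (2 * div) <= max_sd_hz.

    Returns 0 (no division) when the kernel clock is already slow enough.
    Assumes the relation sd_clk = kernel_clk / (2 * div) for div >= 1.
    """
    if kernel_hz <= max_sd_hz:
        return 0
    # Ceiling division using integers only.
    return -(-kernel_hz // (2 * max_sd_hz))


def sd_clock_hz(kernel_hz, div):
    """Resulting SD clock for a divider under the same assumption."""
    return kernel_hz if div == 0 else kernel_hz / (2 * div)


if __name__ == "__main__":
    KERNEL_HZ = 200_000_000  # hypothetical kernel clock
    for limit in (50_000_000, 25_000_000):
        div = min_clkdiv(KERNEL_HZ, limit)
        print(limit, div, sd_clock_hz(KERNEL_HZ, div))
```

The 25 MHz and 50 MHz limits correspond to the SD default-speed and high-speed bus modes; capping at the lower figure trades throughput for signal margin.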

MrD-RC commented 1 year ago

For reference, the SD card issue is at https://github.com/iNavFlight/inav/issues/7759. It does not only affect the H743.

MartinHugh commented 1 year ago

"Another data point:

  • I measured the clock and data signals at the sdcard during init -- and yes, a "HC I" card was showing > 25 MHz. (If the system is following the spec it'd attempt to set the clock to 50 MHz.)
  • I found a very old card (a model "HC") -- and the system boots up just fine -- and the card is recognized.

So, maybe some evidence we need to limit the clock as was done on the other flight controller s/w."

Good data, but this was not a factor in my failsafes (see about 4 posts up) as I have no SD card fitted. Unless not having an SD card fitted itself causes issues.

gcmcnutt commented 1 year ago

@MrD-RC thanks -- i'll copy over the updated information.

Ktr128 commented 1 year ago

I have the same problem. Matek H743-WING V3, INAV 5.1. The elevon servos hung during autolaunch and the flying wing crashed, although before that it had taken off 3 times perfectly. After that, I caught the elevons hanging while testing in my hand. I turned on ANGLE mode and moved the flying wing in pitch. After about 30 seconds, the elevons hung for 1-2 seconds, after which they started working again. Has anyone solved this problem?

b14ckyy commented 1 year ago

@Ktr128 please read the whole post and not only the title. What you describe is different from the issue reported here as it only happens when Disarmed in this case.

To me this sounds like you had a regulator brownout. Please open a post in the Discussions section, as you might just need support with your setup. Don't forget to add as many details as possible about your setup: what plane, servos, FC, etc.

Ktr128 commented 1 year ago

@b14ckyy thank you for the quick reply. I have read all messages #8424. I did not find in the description of the problem written by @rashied what happens after disarm. Maybe I'm stupid ))))) In particular, the situation is the same for @EhAye https://github.com/iNavFlight/inav/issues/8424#issuecomment-1269662600

Did you mean #8409 ?

If you recommend creating a new question with a detailed description, I will do it. Thanks in advance!

maxgunn19 commented 1 year ago

I am having the same problem.

I can start up but I cannot arm so my Motors won't spin (they spin in the outputs tab though). I can move the servos around and around for about 15 seconds without problems and then everything stops. A few seconds later the failsafe will be triggered and according to CLI I lose RX. Also in CLI my battery is reported as taking up 196% of my maxload and I cannot seem to resolve that. Any ideas would be helpful.

Matek H743 Wing V3 with ELRS RX & TX using INAV 5.1.0

Has anyone found a solution to this?

maxgunn19 commented 1 year ago

Unfortunately, I've given up on INAV for now with this setup. I changed over to ardupilot and haven't seen any problems yet. Best of luck!

breadoven commented 1 year ago

"I am having the same problem."

"I can start up but I cannot arm so my Motors won't spin (they spin in the outputs tab though). I can move the servos around and around for about 15 seconds without problems and then everything stops. A few seconds later the failsafe will be triggered and according to CLI I lose RX. Also in CLI my battery is reported as taking up 196% of my maxload and I cannot seem to resolve that. Any ideas would be helpful."

"Matek H743 Wing V3 with ELRS RX & TX using INAV 5.1.0"

"Has anyone found a solution to this?"

Do you know why it wouldn't arm? Is there any status message in the CLI? Moving the sticks around when disarmed is likely to trigger a stick command that will cause the problem mentioned in https://github.com/iNavFlight/inav/issues/8409 (although that doesn't fit with the arming issue you mention).