iNavFlight / inav

INAV: Navigation-enabled flight control software
https://inavflight.github.io
GNU General Public License v3.0

GPS Inconsistent behavior #431

Closed stronnag closed 8 years ago

stronnag commented 8 years ago

Spoiler: Some days, an adequate number of satellites and a reasonable HDOP don't mean nav functions will work as expected.

Today I flew 85fe245 and f440ead on my usually utterly reliable quad. This machine has not failed to execute PH, RTH or missions since we fixed SBAS months ago ---- until today.

Result. Completely random PH behaviour (if PH fails, then I don't even try any other nav function). Randomly, on hard reset, nav functions will either work or fail. Some log files at http://seyrsnys.myzen.co.uk/inav_ph_woes/. All the log files were created in a short time frame with good satellite coverage (16-19) and good HDOP (1.1 - 1.3).

... does some betaflight stuff on the minis for 30 minutes

Note 0. LOG00281 Index 1, PH attempt 4 is a good example of 'fly off' (image: selection_473).

Note 1. 281-283 were executed in quick succession; it is unlikely that there was some environmental change that affected the results.

Note 2. If PH is not (obviously, visually) working, it is rapidly aborted.

Exec Summary : nav post 43eaf10 seems pretty random to me at the moment. Other experiences solicited.

digitalentity commented 8 years ago

Seems like a GPS-related problem. GPS (and estimated navVel) velocity doesn't seem to be correlated with UAV attitude and stick input: [image]

It might also be a compass issue. In log 281 there is a moment when the UAV is flying with forward pitch only. That usually means flying straight ahead, so the GPS heading should be roughly equal to the magnetic heading; however, they are opposite: [image]
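The cross-check described here can be sketched roughly as follows. This is only an illustration in plain C, not INAV code: the function names and the 2 m/s and 45 degree thresholds are assumptions picked for the example.

```c
#include <stdbool.h>

/* Smallest signed difference a - b between two headings, in degrees,
 * wrapped into the range (-180, 180]. */
static float heading_diff_deg(float a, float b)
{
    float d = a - b;
    while (d > 180.0f)   d -= 360.0f;
    while (d <= -180.0f) d += 360.0f;
    return d;
}

/* Hypothetical check: when the craft moves fast enough for the GPS course
 * to be meaningful, GPS course and magnetic heading should roughly agree
 * in forward flight. Returns true when they disagree. */
static bool heading_mismatch(float gps_course_deg, float mag_heading_deg,
                             float ground_speed_ms)
{
    const float min_speed_ms = 2.0f;   /* assumed: below this, course is noise */
    const float max_diff_deg = 45.0f;  /* assumed tolerance for wind/sideslip */

    if (ground_speed_ms < min_speed_ms) {
        return false;                  /* not enough information to judge */
    }
    float d = heading_diff_deg(gps_course_deg, mag_heading_deg);
    if (d < 0.0f) {
        d = -d;
    }
    return d > max_diff_deg;
}
```

A mismatch on its own does not say which sensor is wrong; it only says that GPS course and compass cannot both be right.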

However, I doubt it's compass - UAV heading strongly correlates with yaw stick input. I'd blame GPS reception that reported invalid coordinates/velocities/course for some reason.

@stronnag was there strong wind that day, if so - what direction?

stronnag commented 8 years ago

@digitalentity, thanks for the analysis.

There was less than 8 m/s wind; I often fly in more. My initial thought (from the field) was that it was just a bad GPS day. I guess that happens from time to time. I did not suspect the compass: much of the random flying / drifting around was me manually correlating the observed heading with the LTM reported heading (which seemed consistent), whereas the mwp calculated 'range from home' often seemed incongruous.

It was an object lesson in 'try PH before engaging more advanced nav functions'.

I'll do some more missions today before closing.

digitalentity commented 8 years ago

@stronnag Interestingly GPS glitch detection did fire an alarm a few times. We definitely need better glitch detection logic.

stronnag commented 8 years ago

A day later .... updated the firmware to abf1015 (so no significant change). Started with similarly less than stellar sat statistics (13-15 sats, 1.3-1.5 HDOP); a couple of hours later it's back to normal, 19-20 sats and 1.1 HDOP.

PH and other nav functions back to their awesome performance and consistency, even at the lower than normal end of the sat coverage range. Someone in the DoD really didn't like me yesterday.

Maybe we should use some of those spare X-FRAME bits for a glitch detection alert or counter?

digitalentity commented 8 years ago

@stronnag thanks for your report, indeed something must be wrong with GPS reception. We can use a byte in X-Frame to indicate INAV internal status flags, however we still don't have good logic for detecting GPS failures...

Linjieqiang commented 8 years ago

The solution to the problem is to use an EKF algorithm, although it looks messy and complex.

digitalentity commented 8 years ago

EKF is nothing close to a glitch detection and protection, it's merely an algorithm to blend data from available sensors.

Linjieqiang commented 8 years ago

If there is a large difference between the data received from GPS and the EKF prediction, the program will remove the GPS data. Maybe I'm wrong.

digitalentity commented 8 years ago

It's not the EKF itself, it's a supporting code logic that does the detection.
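That supporting logic is commonly some form of innovation gating: each GPS measurement is compared with the filter's prediction and rejected when the difference is improbably large. A minimal one-axis sketch, assuming a hypothetical 3-sigma gate; none of these names come from INAV:

```c
#include <stdbool.h>

/* One-axis innovation gate. predicted and measured are positions in metres;
 * pred_var and meas_var are the variances (m^2) of the prediction and of
 * the GPS measurement. All names and the gate size are assumptions. */
static bool gps_innovation_ok(float predicted, float measured,
                              float pred_var, float meas_var)
{
    const float gate_sigma = 3.0f;             /* assumed 3-sigma gate */
    float innovation = measured - predicted;   /* what the filter did not expect */
    float innov_var  = pred_var + meas_var;    /* expected spread of that difference */

    /* Compare squared values so no sqrt() is needed. */
    return innovation * innovation <= gate_sigma * gate_sigma * innov_var;
}
```

A rejected sample is simply not fused, and the estimator keeps predicting from the other sensors until GPS agrees again. The gate is only as good as the prediction it compares against.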

Linjieqiang commented 8 years ago

Oh. Sorry. My mistake.

xdigyx commented 8 years ago

Today I had a situation where, in pos hold mode, my hexacopter rapidly started to fly away. Not sure whether it was caused by losing some sats or... Anyway, maybe it would be a good idea either to take an average of 2-3 readings or, for pos hold mode, to ignore a position change greater than... (say 10cm/sample, depending on model speed from the previous cycle / significant sat. num loss)?
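The second idea (ignore implausible position steps) could look something like the sketch below. It is an illustration under assumptions, not existing INAV behaviour: the type, the function name and the speed limit are all made up for the example.

```c
#include <stdbool.h>

typedef struct {
    float north_cm;   /* local position, centimetres */
    float east_cm;
} pos_cm_t;

/* Returns true if moving from prev to cur within dt_s seconds is
 * physically plausible for a craft limited to max_speed_cms (cm/s). */
static bool gps_step_plausible(pos_cm_t prev, pos_cm_t cur,
                               float dt_s, float max_speed_cms)
{
    float dn = cur.north_cm - prev.north_cm;
    float de = cur.east_cm  - prev.east_cm;
    float max_step_cm = max_speed_cms * dt_s;

    /* Compare squared distances so no sqrt() is needed. */
    return (dn * dn + de * de) <= (max_step_cm * max_step_cm);
}
```

Note that a fixed 10 cm/sample limit at, say, a 5 Hz GPS update rate would cap the accepted speed at 0.5 m/s, so the limit probably has to scale with the expected speed of the model.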

stronnag commented 8 years ago

GPS glitches are very rare, my report was a once a year occurrence, for a frequent flyer. Your problem is more likely symptomatic of mag interference. Have you verified that the mag works perfectly at all throttle settings?

xdigyx commented 8 years ago

Not sure what's telling you it's a mag problem. If it was, then I believe the model would slowly make bigger and bigger circles, but it would not rapidly fly away. Do you agree? Although I have used pos hold for at most 1h, it has happened only once so far; apart from this time pos hold works great. Unfortunately I had no BBox enabled...

digitalentity commented 8 years ago

Hard to diagnose w/o logs. Circles happen when heading is slightly wrong (up to maybe 30 deg) - this results in the quad moving in a slightly wrong direction. A bigger error will result in the quad going in a significantly wrong direction and a case of fly away.
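A rough way to see why the size of the heading error matters, using an idealised model that is an assumption for illustration and not INAV's actual controller: let p be the position error relative to the hold point, let the controller command a velocity proportional to that error with gain k, and let a heading error θ rotate the commanded velocity by the 2D rotation R(θ).

```latex
\dot{\mathbf{p}} = -k\,R(\theta)\,\mathbf{p}
\quad\Longrightarrow\quad
\frac{d}{dt}\lVert\mathbf{p}\rVert = -k\cos\theta\,\lVert\mathbf{p}\rVert ,
\qquad
\lVert\mathbf{p}(t)\rVert = \lVert\mathbf{p}(0)\rVert\,e^{-k t\cos\theta}
```

In this model the sideways part of the command (proportional to k sin θ) makes the craft orbit the hold point, while the distance to it shrinks only as long as cos θ > 0. With θ near 180° the error grows exponentially, which is a fly away. A real craft has inertia and control lag, so it presumably becomes unstable at a much smaller error than the 90° this simple model suggests.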

xdigyx commented 8 years ago

There were no circles (for sure if it were then I would see it every time when using pos hold for a while). But there was just sudden pitch inclination increase by approx 30 deg. Now I know how to enable onboard dataflash bbox, so hope to catch it next time.

digitalentity commented 8 years ago

It's either a magnetic anomaly or compass failure, or a real GPS glitch. If the compass wiring is not good, a sporadic connectivity fault may cause the compass chip to freeze and give out the same heading over and over again. Please also check the wiring to the compass.

digitalentity commented 8 years ago

453

xdigyx commented 8 years ago

I am using the onboard compass, so no wiring; I can try to treat it with hot air, but there's a low chance that helps. The FC is shielded (50x50mm 35um Cu PCB) from the bottom and grounded.

DzikuVx commented 8 years ago

@xdigyx shielding like that would not block the magnetic field from power cables. It might shield an oscillating magnetic field, but the frequency would have to be > 100kHz. Interference from power cables is almost non-oscillating, so a grounded Cu PCB or even a full Faraday cage is ineffective. The only solution is to move the power cables from the battery/ESC further away from the FC and (better option) use an external magnetometer on a mast.
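To get a feel for the magnitudes, here is the textbook field of a long straight wire, B = μ0·I / (2π·r), as a small helper. The 20 A current and 2 cm distance used below are illustrative assumptions, not measurements from this build.

```c
/* Magnetic flux density, in microtesla, at distance r_m metres from a long
 * straight wire carrying current_a amperes: B = mu0 * I / (2 * pi * r),
 * with mu0 / (2 * pi) = 2e-7 T*m/A. Single-wire idealisation. */
static double wire_field_ut(double current_a, double r_m)
{
    const double mu0_over_2pi = 2e-7;              /* T*m/A */
    return mu0_over_2pi * current_a / r_m * 1e6;   /* tesla -> microtesla */
}
```

For 20 A at 2 cm this gives about 200 µT, several times the Earth's field (roughly 25 to 65 µT), and the field changes with throttle. Routing the positive and negative leads tightly together cancels much of it, and distance helps because the field falls off with r.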

xdigyx commented 8 years ago

Got your point. I was not thinking about this, as the yaw heading readings were changing by only 1-3deg depending on THR level. First I will try to catch the event with blackbox to see what the root cause is, and maybe discuss what can be done to avoid such a situation. Then I will use an external mag.

stronnag commented 8 years ago

@digitalentity. I no longer think that this is an external GPS issue.

Today.

(all logs at http://seyrsnys.myzen.co.uk/inav_ph_woes/). There was perhaps 30 seconds in the land / power-cycle sequence. I find it hard to believe the satellite performance is varying in that short time period. The whole sequence described above was within 7 minutes, with consistently between 17-19 satellites and c. 1.1 HDOP.

That I can have nav functions randomly work / fail on power cycle within very short time periods looks to me like a firmware issue rather than a celestial GPS issue. I'm encouraged in this theory by your recent CC3D sensor woes.

martinbudden commented 8 years ago

I'm certainly willing to believe this is a firmware problem. The IO changes were pervasive, and although I was very careful, there is certainly a possibility that I've introduced a bug somewhere.

stronnag commented 8 years ago

It's all circumstantial at the moment, but definitely a regression since 2016-7-30, the last flawless nav experience (with this hardware).

I should add that so far the Dodo has behaved OK, whilst the SPRF3 has not (same firmware). Tomorrow, it's the Dodo.

xdigyx commented 8 years ago

Today I caught the same issue. For sure there was some yaw drift, but it seems like the model was trying to face the starting position the whole time. After approx. 2 min from the start, the number of sats dropped to 0 for just one read cycle and my model moved rapidly. I've got the log file. Some lines here:

time (us)   GPS_fixType  GPS_numSat  GPS_altitude  GPS_speed (m/s)  GPS_ground_course  GPS_hdop  GPS_eph  GPS_epv
105125924   2            16          123           0.06             181.4              129       88       132
105326384   2            16          123           0.06             181.4              129       88       132
105526783   0            0           123           0.07             181.4              9999      95       132
105727183   2            16          124           0.07             181.4              129       95       132
105927635   2            16          124           17.76            171.4              131       95       132
106128086   2            16          124           0.56             171.3              131       95       132
106328488   2            16          124           0.11             171.3              131       95       132
106528932   2            16          124           0.31             127.9              131       94       132
106729408   2            16          124           0.59             22.1               148       94       132
106929832   2            16          124           0.86             359.3              148       94       132
107130275   2            16          124           0.86             359.3              116       94       132
107330739   2            16          124           0.83             356.9              122       94       132
107531103   2            16          124           0.82             357.4              119       94       131
107535112   2            16          124           0.82             357.4              119       94       131
107731575   2            16          124           0.65             355.4              119       94       131
107931951   2            16          124           0.44             355.3              122       93       131
108132427   2            16          124           0.34             357.9              122       93       130
108332876   2            16          124           0.20             0.2                129       93       130

I can send the full log file by email to anyone interested (PM me).
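For what it's worth, a very simple plausibility check would have flagged two of the samples in this excerpt. The sketch below is an assumption-laden illustration, not INAV's glitch detection: the struct, the function name and the 30 m/s² acceleration limit are invented for the example, and only the time, fix type, satellite count and speed columns of the first seven rows are used.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t time_us;
    int      fix_type;
    int      num_sat;
    float    speed_ms;
} gps_row_t;

/* First seven rows of the log excerpt (other columns omitted). */
static const gps_row_t rows[] = {
    {105125924u, 2, 16,  0.06f},
    {105326384u, 2, 16,  0.06f},
    {105526783u, 0,  0,  0.07f},
    {105727183u, 2, 16,  0.07f},
    {105927635u, 2, 16, 17.76f},
    {106128086u, 2, 16,  0.56f},
    {106328488u, 2, 16,  0.11f},
};
#define ROW_COUNT (sizeof(rows) / sizeof(rows[0]))

/* Sets suspect[i] for rows with no fix, or whose speed implies an
 * acceleration above max_accel_ms2 relative to the last accepted row.
 * Returns the number of suspect rows. */
static size_t flag_suspect_rows(const gps_row_t *r, size_t n,
                                float max_accel_ms2, bool *suspect)
{
    size_t count = 0;
    bool have_ref = false;
    gps_row_t ref = {0};

    for (size_t i = 0; i < n; i++) {
        bool bad = (r[i].fix_type == 0) || (r[i].num_sat == 0);

        if (!bad && have_ref) {
            float dt = (float)(r[i].time_us - ref.time_us) * 1e-6f;
            float dv = r[i].speed_ms - ref.speed_ms;
            if (dv < 0.0f) {
                dv = -dv;
            }
            bad = (dt > 0.0f) && (dv > max_accel_ms2 * dt);
        }
        if (bad) {
            count++;
        } else {
            ref = r[i];          /* only trusted rows become the reference */
            have_ref = true;
        }
        suspect[i] = bad;
    }
    return count;
}
```

With a 30 m/s² limit this marks the zero-satellite sample and the 17.76 m/s sample. Note that the speed spike arrives two samples after the dropout, when the fix type is already back to 2, so checking the fix flag alone would not have caught it.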

Just for clarification: I am using ver 1.1.

digitalentity commented 8 years ago

@stronnag, I agree, this is very likely a software issue. However I'm thinking it was there before the IO changes - I remember having odd mixer issues a while ago, before that major change happened.

Which direction was your machine facing on powering up? Maybe something in firmware is messing up the mag...

digitalentity commented 8 years ago

This is going to be a tough one - changes to firmware will affect the memory layout and may seemingly "fix" the bug...

digitalentity commented 8 years ago

@stronnag I have a strong feeling that something is wrong with either the IMU or the magnetometer code.

From your latest logs:

  LOG 292: heading 31 - PH works
  LOG 293: heading 22 - PH fails; heading 28 - PH fails
  LOG 294: heading 200 - PH works

Interestingly, in LOG292 and LOG293 the machine does the same (correct) tilt to correct for the error; however, the actual correction differs. This can only happen when the heading is incorrect.

Can you send exactly the same hex/dump file you've been using during these tests so I can check it on my SPRF3 board?

digitalentity commented 8 years ago

@xdigyx you likely experienced an ordinary GPS glitch, might be GPS wiring as well.

digitalentity commented 8 years ago

I have a feeling that recent CC3D woes and this issue are related...

xdigyx commented 8 years ago

Yes, I agree. Checked the wiring and found no loose wires or connectors. Anyway, the glitch was not detected and caused the model to fly away, so it's just proof that better glitch detection is still needed. Thanks

stronnag commented 8 years ago

@digitalentity. I tend to do my PH tests with the craft 'beam on', so c. N / S heading is expected. It failed earlier on c. 200°, and I've had failures on other orientations. I (almost) always power up at 110° - 120° (facing away from 'my' bench), which is my first mag pre-flight check.

The hex (gcc 6.1 compiled) is at http://seyrsnys.myzen.co.uk/inav_ph_woes/inav_1.2.0_SPRACINGF3_8cfc74b.hex

From my records (I keep all my logs), it seems https://github.com/iNavFlight/inav/commit/43eaf10db2170633cf424baa71a6bf0082c8061f was the last non-affected build; I will fly that today.

oleost commented 8 years ago

Maybe an announcement on forum regarding this?

stronnag commented 8 years ago

Yes, I will do one immediately.

.... done

digitalentity commented 8 years ago

@stronnag, @oleost yes, that's a good idea, thanks!

@stronnag can you also provide a dump so I'll be testing exactly the same firmware setup?

stronnag commented 8 years ago

cli dump http://seyrsnys.myzen.co.uk/inav_ph_woes/nav_sprf3_fp.txt

digitalentity commented 8 years ago

@stronnag thanks. Now I should try to find a GPS module...

EDIT: In the following days I'll try to reproduce the issue on my test-quad.

digitalentity commented 8 years ago

> From my records (I keep all my logs), it seems 43eaf10 was the last non-affected build; I will fly that today.

@stronnag it would be awesome if you could pinpoint the commit that started to glitch. I suspect that this particular issue might be related to a buffer overflow or maybe a race condition in some interrupt handler, since it's intermittent. I suspect UART - if an interrupt handler somehow gets invoked before buffers are properly initialised it might result in overwriting arbitrary memory (please correct me if I'm wrong).
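To make the suspected failure mode concrete, here is a host-side sketch of a receive handler that refuses to write before its buffer exists. It is hypothetical code written for this comment (the struct and function names are not from the INAV UART driver), and the interrupt is simulated by a plain function call.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    volatile uint8_t *rx_buf;    /* NULL until the port is opened */
    size_t            rx_size;
    volatile size_t   rx_head;
} uart_port_t;

/* Called from the (simulated) RX interrupt. Drops the byte instead of
 * writing through an unset pointer when the port is not initialised. */
static bool uart_rx_isr(uart_port_t *port, uint8_t byte)
{
    if (port == NULL || port->rx_buf == NULL || port->rx_size == 0) {
        return false;                        /* fired too early: ignore */
    }
    port->rx_buf[port->rx_head] = byte;
    port->rx_head = (port->rx_head + 1) % port->rx_size;
    return true;
}

/* Set up the buffer first; on real hardware the RX interrupt would be
 * enabled only after this returns. */
static void uart_open(uart_port_t *port, volatile uint8_t *buf, size_t size)
{
    port->rx_head = 0;
    port->rx_size = size;
    port->rx_buf  = buf;                     /* published last */
}
```

Without the guard, an early interrupt would store the byte wherever the uninitialised pointer happens to point, which would fit the "works or fails depending on the power cycle" pattern.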

stronnag commented 8 years ago

I'm going out this afternoon armed with all my big quad LIPOs and

inav_1.2.0_SPRACINGF3_43eaf10.hex
inav_1.2.0_SPRACINGF3_4bbd176.hex
inav_1.2.0_SPRACINGF3_8cfc74b.hex
inav_1.2.0_SPRACINGF3_f440ead.hex

I expect 43eaf10 to work and f440ead not to work, based on previous experience.

The great thing about having mag and acc saved in the dump file is that reflashing / return to known config in the field is so, so easy.

digitalentity commented 8 years ago

@stronnag thank you so much for testing this out! I'm struggling to figure out what's going on with CC3D - it's a very repeatable glitch, and I have a feeling it's related to the GPS issues.

stronnag commented 8 years ago

27822a5 is the last reliable commit: 6+ successful power cycle / PH attempts. All later builds fail PH within 2 attempts. More details & logs when I get home.

stronnag commented 8 years ago

Here's the full bisect test results. Methodology:

  1. stm32flash the required firmware
  2. use mwptools cf-cli tool to install known configuration
  3. Perform PH
  4. Land (auto if PH works)
  5. Powercycle
  6. repeat from step 3 until PH FAIL or 6 consecutive PASS

The results being:

Commit   Date        Result
43eaf10  2016-08-02  FAIL : LOG0295.TXT
742f429  2016-07-31  PASS : LOG0304.TXT
742f429              FAIL : LOG0305.TXT
0489eb8  2016-07-31  PASS : LOG0296.TXT
0489eb8              FAIL : LOG0297.TXT
27822a5  2016-07-29  PASS : LOG0298.TXT
27822a5              PASS : LOG0299.TXT
27822a5              PASS : LOG0300.TXT
27822a5              PASS : LOG0301.TXT
27822a5              PASS : LOG0302.TXT
27822a5              PASS : LOG0303.TXT

Several successful WP and RTH missions were subsequently flown with 27822a5.

All log files under http://seyrsnys.myzen.co.uk/inav_ph_woes/

I trust this is conclusive. It also supports the tentative conclusions relating this fault to the CC3D fault and the new IO subsystem.
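How much weight six consecutive passes carry for an intermittent fault can be estimated with a back-of-envelope calculation. The sketch assumes, hypothetically, that each power cycle of an affected build fails independently with a fixed probability; the function name is made up for the example.

```c
/* Probability that a build which fails each attempt independently with
 * probability p_fail still survives n_attempts consecutive attempts:
 * (1 - p_fail)^n_attempts. */
static double false_pass_probability(double p_fail, int n_attempts)
{
    double p = 1.0;
    for (int i = 0; i < n_attempts; i++) {
        p *= (1.0 - p_fail);
    }
    return p;
}
```

The affected builds all failed within two attempts, so a per-attempt failure rate of around 50% seems plausible; that would give roughly a 1.6% chance of a bad build passing six times by luck. The 50% figure is a guess from the table, not a measurement.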

xdigyx commented 8 years ago

stronnag, I have just compared my dump with your settings and noted one strange value: set mag_declination = -130, should it not be approx 110 POSITIVE?

stronnag commented 8 years ago

Not if (a) one habitually flies in the New Forest area of southern England, where the declination is c. 1° 30' W, and (b) one has set inav_auto_mag_decl = ON.
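For reference, the setting packs degrees and minutes into one integer, so -130 reads as 1° 30' West rather than 130 of anything. A small decoder, assuming the Cleanflight-style degrees * 100 + minutes encoding with East positive (the function name is invented here):

```c
/* Decode a declination stored as degrees * 100 + minutes (e.g. -130 means
 * 1 degree 30 minutes West) into signed decimal degrees. Assumes the
 * Cleanflight-style encoding; East is positive. */
static double mag_declination_to_degrees(int encoded)
{
    int sign    = (encoded < 0) ? -1 : 1;
    int abs_val = (encoded < 0) ? -encoded : encoded;
    int degrees = abs_val / 100;
    int minutes = abs_val % 100;

    return sign * (degrees + minutes / 60.0);
}
```

So -130 decodes to -1.5 degrees, which matches c. 1° 30' W.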

May I also suggest that this thread is for trying to fix a serious regression in iNav, not provide basic support to beginners; please use the RC Groups topic http://www.rcgroups.com/forums/showthread.php?t=2495732 for basic support.

digitalentity commented 8 years ago

@stronnag thanks for narrowing this down. There still is a possibility that in 27822a5 and earlier this bug is still present - there was a bug reported with frozen servos on SPRF3 which I also suspect to be a result of memory corruption. Coincidentally that bug also happened after beginning migration to new IO.

@martinbudden, @ledvinap if you have the time, please review the new IO code in INAV to see if there's anything wrong.

digitalentity commented 8 years ago

We are lacking any protection from wrong indexes, incorrect pointers etc. I think we should start putting ASSERT(x) all over the code with three possible compile-time options:

  1. ignore (tested production code)
  2. record a line to a given location in memory
  3. same as 2 but with additional hardfault

Thoughts?
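A minimal sketch of what such a three-mode macro could look like. Everything here is an assumption for discussion (the mode names, the record structure, and abort() standing in for a forced hard fault), not an existing INAV facility.

```c
#include <stdlib.h>

#define ASSERT_MODE_IGNORE 0   /* option 1: compiled out */
#define ASSERT_MODE_RECORD 1   /* option 2: remember where it failed */
#define ASSERT_MODE_FAULT  2   /* option 3: record, then stop */

#ifndef ASSERT_MODE
#define ASSERT_MODE ASSERT_MODE_RECORD
#endif

/* Last failed assertion. On a flight controller this could live at a known
 * RAM location that survives reset; here it is an ordinary global. */
typedef struct {
    const char *file;
    int         line;
} assert_record_t;

static volatile assert_record_t last_assert_failure;

static void assert_failed(const char *file, int line)
{
    last_assert_failure.file = file;
    last_assert_failure.line = line;
#if ASSERT_MODE == ASSERT_MODE_FAULT
    abort();   /* stand-in for forcing a hard fault on the target */
#endif
}

#if ASSERT_MODE == ASSERT_MODE_IGNORE
#define ASSERT(x) ((void)0)
#else
#define ASSERT(x) ((x) ? (void)0 : assert_failed(__FILE__, __LINE__))
#endif
```

In record mode a failed ASSERT(x) costs one comparison and two stores, and the recorded file and line could later be read out over the CLI or written to the blackbox.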

digitalentity commented 8 years ago

For one:

ioRec_t* IO_Rec(IO_t io)
{
    return io;
}

This assumes io is correct, but what if it's not? Then a pointer to a non-existent ioRec will be returned, with the possibility of writing owner, resource and index to an arbitrary memory location, causing corruption of data.

Adding the following will help protect from this:

ASSERT(io);
ASSERT(io >= &ioRecs[0]);
ASSERT(io < &ioRecs[DEFIO_IO_USED_COUNT]);
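The same checks can be wrapped in a predicate that is testable on a host machine. This is a sketch under assumptions: the ioRec_t fields, the table size and the IO_t typedef below are simplified stand-ins, not the real definitions. Addresses are compared as integers, and an extra check rejects pointers that land in the middle of a record.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEFIO_IO_USED_COUNT 8          /* assumed size for this sketch */

typedef struct {                       /* simplified stand-in for ioRec_t */
    uint8_t owner;
    uint8_t resource;
    uint8_t index;
} ioRec_t;

typedef void *IO_t;                    /* assumed to match the firmware typedef */

static ioRec_t ioRecs[DEFIO_IO_USED_COUNT];

/* True only if io points exactly at one element of ioRecs[]. */
static bool io_is_valid(IO_t io)
{
    uintptr_t addr  = (uintptr_t)io;
    uintptr_t first = (uintptr_t)&ioRecs[0];
    uintptr_t end   = (uintptr_t)&ioRecs[DEFIO_IO_USED_COUNT];

    if (addr < first || addr >= end) {
        return false;                  /* NULL or outside the table */
    }
    return ((addr - first) % sizeof(ioRec_t)) == 0;   /* on an element boundary */
}
```

IO_Rec() could then start with ASSERT(io_is_valid(io)) instead of three separate assertions.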

xdigyx commented 8 years ago

Do not be so rude, I just wanted to help. Sometimes even basic things can be overlooked by pro users... 110 or 130, but I believe it should be positive (unless FW does not look at it at all and it really does not matter).

EDIT: checked again the location. it's negative....

digitalentity commented 8 years ago

@xdigyx I think @stronnag didn't mean to be rude; mag declination indeed varies greatly between different places. However, this issue is related to the latest 1.2 code, and tests on 1.1 are probably not relevant; they create noise that distracts attention from the topic.

sambas commented 8 years ago

Tested current master (8cfc74b) with revo. Didn't have any noticeable issues. 5 short flights (PH spin + RTH) with a power cycle between each. Only three fitted in the bb log. Built with gcc-arm-none-eabi-4_9-2015q2-20150609-win32 using the command "make -j24 TARGET=REVO OPBL=yes". Included the map file, in case you could provide the same with gcc 6 to compare. https://www.dropbox.com/sh/tucne4xbeanyqax/AABw0wqCcqdb0lyaBVCyD7Aha?dl=0

digitalentity commented 8 years ago

I don't have any GPS-related issues on my Y6 built with GCC-4.9.3 as well, although CC3D won't work at all...