Closed stronnag closed 8 years ago
Seems like a GPS-related problem. GPS (and estimated navVel) velocity doesn't seem to be correlated with UAV attitude and stick input:
It might also be a compass issue: At log 281 there is a moment when UAV is flying at only forward pitch - this is usually flying straight forward and GPS heading should be roughly equal to magnetic heading, however they are opposite:
However, I doubt it's compass - UAV heading strongly correlates with yaw stick input. I'd blame GPS reception that reported invalid coordinates/velocities/course for some reason.
@stronnag was there strong wind that day, if so - what direction?
@digitalentity, thanks for the analysis.
There was less than 8 m/s wind; I often fly in more. My initial thought (from the field) was that it was just a bad GPS day. I guess that happens from time to time. I did not suspect the compass: much of the random flying / drifting around was me manually correlating the observed heading with the LTM-reported heading (which seemed consistent), whereas the mwp-calculated 'range from home' often seemed incongruous.
It was an object lesson in 'try PH before engaging more advanced nav functions'.
I'll do some more missions today before closing.
@stronnag Interestingly GPS glitch detection did fire an alarm a few times. We definitely need better glitch detection logic.
A day later .... updated the firmware to abf1015 (so no significant change). Starts with similarly less than stellar sat statistics (13-15 sats, 1.3-1.5 HDOP); a couple of hours later it's back to normal, 19-20 sats and 1.1 HDOP.
PH and other nav functions back to their awesome performance and consistency, even at the lower than normal end of the sat coverage range. Someone in the DoD really didn't like me yesterday.
Maybe we should use some of those spare X-FRAME bits for a glitch detection alert or counter?
@stronnag thanks for your report, indeed something must be wrong with GPS reception. We can use a byte in X-Frame to indicate INAV internal status flags, however we still don't have a good logic of detecting GPS failures...
The solution to the problem is to use an EKF algorithm, although it looks messy and complex.
EKF is nothing close to a glitch detection and protection, it's merely an algorithm to blend data from available sensors.
If there is a large difference between the data received from GPS and the EKF prediction, the program will discard the GPS data. Maybe I'm wrong.
It's not the EKF itself, it's a supporting code logic that does the detection.
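Supporting logic of this kind is often written as an innovation gate: each GPS measurement is compared with the estimator's prediction and rejected when the difference is implausibly large. Here is a minimal one-dimensional sketch. The function name, the variance parameters and the sigma threshold are illustrative assumptions, not INAV code.

```c
#include <stdbool.h>

/* Hypothetical 1-D innovation gate (not INAV code).
 * predicted:  position predicted by the estimator (m)
 * measured:   position reported by GPS (m)
 * predVar:    variance of the prediction (m^2)
 * measVar:    variance of the GPS measurement (m^2)
 * gateSigma:  rejection threshold in standard deviations
 * Returns true when the measurement is consistent with the prediction. */
bool gpsInnovationAccept(float predicted, float measured,
                         float predVar, float measVar, float gateSigma)
{
    const float innovation = measured - predicted;
    const float innovationVar = predVar + measVar;

    /* Compare squared values so no square root is needed. */
    return innovation * innovation <= gateSigma * gateSigma * innovationVar;
}
```

With a 3-sigma gate and 1 m^2 variances, a 1 m disagreement is accepted and a 50 m disagreement is rejected. The hard part, as noted in this thread, is deciding what to do after a rejection.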
Oh, sorry. My mistake.
Today I had a situation where, in pos hold mode, my hexacopter rapidly started to fly away. Not sure whether it was caused by losing some sats or... Anyway, maybe it would be a good idea either to take an average of 2-3 readings, or, in pos hold mode, to ignore a position change greater than... (say 10cm/sample, depending on model speed from the previous cycle / significant sat. num loss)?
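The "ignore implausible position changes" idea can be sketched as a simple gate on the distance travelled between two samples. This is only an illustration of the suggestion above: the struct, the function name and the speed limit are hypothetical, and nothing here is taken from the INAV source.

```c
#include <stdbool.h>

/* Hypothetical GPS sample in a local frame (not an INAV type). */
typedef struct {
    float northM;  /* position north of home, metres */
    float eastM;   /* position east of home, metres */
    int   numSat;  /* satellites used in the fix */
} gpsSample_t;

/* Returns true when 'cur' could plausibly follow 'prev':
 * the fix still has satellites and the implied ground speed
 * does not exceed maxSpeedMps over the interval dtSec. */
bool gpsSampleIsPlausible(const gpsSample_t *prev, const gpsSample_t *cur,
                          float dtSec, float maxSpeedMps)
{
    if (cur->numSat == 0) {
        return false;  /* momentary loss of fix */
    }

    const float dn = cur->northM - prev->northM;
    const float de = cur->eastM - prev->eastM;
    const float maxDist = maxSpeedMps * dtSec;

    /* Compare squared distances so no square root is needed. */
    return dn * dn + de * de <= maxDist * maxDist;
}
```

A fixed limit is a blunt tool: set too low, it rejects genuine fast manoeuvres; set too high, it lets real glitches through.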
GPS glitches are very rare, my report was a once a year occurrence, for a frequent flyer. Your problem is more likely symptomatic of mag interference. Have you verified that the mag works perfectly at all throttle settings?
Not sure what's telling you it's a mag problem. If it was, then I believe the model would slowly make bigger and bigger circles, but it would not rapidly fly away. Do you agree? Although I have used pos hold for at most 1h, it has happened only once so far; apart from this time, pos hold works great. Unfortunately I had no BBox enabled...
Hard to diagnose w/o logs. Circles happen when heading is slightly wrong (up to maybe 30 deg) - this results in the quad moving in a slightly wrong direction. A bigger error will result in the quad going in a significantly wrong direction and a case of fly away.
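A simple geometric model shows why the size of the heading error matters. It is an idealised sketch that ignores controller dynamics and delays; the symbols are introduced here for illustration. If the controller commands a correction velocity $\mathbf{v}_d$ pointing at the hold position, but the heading estimate is off by $\epsilon$, the craft actually moves along a rotated vector:

```latex
\mathbf{v} = R(\epsilon)\,\mathbf{v}_d,
\qquad
v_{\parallel} = \lVert\mathbf{v}_d\rVert \cos\epsilon,
\qquad
v_{\perp} = \lVert\mathbf{v}_d\rVert \sin\epsilon
```

Here $v_{\parallel}$ is the component towards the target and $v_{\perp}$ the sideways component. For $\epsilon = 30^\circ$, $\cos\epsilon \approx 0.87$ and $\sin\epsilon = 0.5$: the craft still closes on the target but with a sideways push, which gives the circling pattern. For $\epsilon = 120^\circ$, $\cos\epsilon = -0.5$: every correction moves the craft away from the target, so the error and the commanded correction both grow.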
There were no circles (for sure if it were then I would see it every time when using pos hold for a while). But there was just sudden pitch inclination increase by approx 30 deg. Now I know how to enable onboard dataflash bbox, so hope to catch it next time.
It's either a magnetic anomaly or compass failure, or a real GPS glitch. If the compass wiring is not good, a sporadic connectivity fault may cause the compass chip to freeze and give out the same heading over and over again. Please also check the wiring to the compass.
I am using the onboard compass, so no wiring; I can try to treat it with hot air, but there's a low chance of that helping. The FC is shielded (50x50mm 35um Cu PCB) from the bottom and grounded.
@xdigyx shielding like that would not block the magnetic field from power cables. It might shield an oscillating magnetic field, but the frequency would have to be > 100kHz. Interference from power cables is almost non-oscillating, so a grounded Cu PCB or even a full Faraday cage is ineffective. The only solution is to move the power cables from battery/ESC further away from the FC and (better option) use an external magnetometer on a mast.
Got your point, I was not thinking about this as the yaw heading reads were changing only 1-3deg depending on THR level. Firstly I will try to catch the event with blackbox to see what's the root cause and maybe discuss what can be done to avoid such situation . Then I will use an external mag.
@digitalentity. I no longer think that this is an external GPS issue.
Today.
(all logs at http://seyrsnys.myzen.co.uk/inav_ph_woes/). There was perhaps 30 seconds in the land / power-cycle sequence. I find it hard to believe the satellite performance is varying in that short time period. The whole sequence described above was within 7 minutes, with consistently between 17-19 satellites and c. 1.1 HDOP.
That I can have nav functions randomly work / fail on power cycle within very short time periods looks to me like a firmware issue rather than a celestial GPS issue. I'm encouraged in this theory by your recent CC3D sensor woes.
I'm certainly willing to believe this is a firmware problem. The IO changes were pervasive, and although I was very careful, there is certainly a possibility that I've introduced a bug somewhere.
It's all circumstantial at the moment, but definitely a regression since 2016-7-30, the last flawless nav experience (with this hardware).
I should add that so far the Dodo has behaved OK, whilst the SPRF3 has not (same firmware). Tomorrow, it's the Dodo.
Today I caught the same issue. For sure there was some yaw drift, but it seems the model was trying to face the starting position all the time. Approx. 2 min after start, the number of sats dropped to 0 for just one read cycle and my model rapidly moved. I've got the log file. Some lines here:

time (us) | GPS_fixType | GPS_numSat | GPS_altitude | GPS_speed (m/s) | GPS_ground_course | GPS_hdop | GPS_eph | GPS_epv |
---|---|---|---|---|---|---|---|---|
105125924 | 2 | 16 | 123 | 0.06 | 181.4 | 129 | 88 | 132 |
105326384 | 2 | 16 | 123 | 0.06 | 181.4 | 129 | 88 | 132 |
105526783 | 0 | 0 | 123 | 0.07 | 181.4 | 9999 | 95 | 132 |
105727183 | 2 | 16 | 124 | 0.07 | 181.4 | 129 | 95 | 132 |
105927635 | 2 | 16 | 124 | 17.76 | 171.4 | 131 | 95 | 132 |
106128086 | 2 | 16 | 124 | 0.56 | 171.3 | 131 | 95 | 132 |
106328488 | 2 | 16 | 124 | 0.11 | 171.3 | 131 | 95 | 132 |
106528932 | 2 | 16 | 124 | 0.31 | 127.9 | 131 | 94 | 132 |
106729408 | 2 | 16 | 124 | 0.59 | 22.1 | 148 | 94 | 132 |
106929832 | 2 | 16 | 124 | 0.86 | 359.3 | 148 | 94 | 132 |
107130275 | 2 | 16 | 124 | 0.86 | 359.3 | 116 | 94 | 132 |
107330739 | 2 | 16 | 124 | 0.83 | 356.9 | 122 | 94 | 132 |
107531103 | 2 | 16 | 124 | 0.82 | 357.4 | 119 | 94 | 131 |
107535112 | 2 | 16 | 124 | 0.82 | 357.4 | 119 | 94 | 131 |
107731575 | 2 | 16 | 124 | 0.65 | 355.4 | 119 | 94 | 131 |
107931951 | 2 | 16 | 124 | 0.44 | 355.3 | 122 | 93 | 131 |
108132427 | 2 | 16 | 124 | 0.34 | 357.9 | 122 | 93 | 130 |
108332876 | 2 | 16 | 124 | 0.20 | 0.2 | 129 | 93 | 130 |
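The two anomalies in the log excerpt above (one sample with no fix, then a jump to 17.76 m/s) can be picked out mechanically. The sketch below flags a row when the fix is lost or when the change in reported speed implies an acceleration above a limit. The struct, the function names and the 20 m/s^2 limit used in the note are hypothetical; only the row values are taken from the log.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Reduced log row: only the columns this check needs. */
typedef struct {
    uint32_t timeUs;
    int      fixType;
    int      numSat;
    float    speedMps;
} gpsLogRow_t;

/* Five consecutive rows copied from the log excerpt above. */
static const gpsLogRow_t logRows[] = {
    {105326384u, 2, 16,  0.06f},
    {105526783u, 0,  0,  0.07f},
    {105727183u, 2, 16,  0.07f},
    {105927635u, 2, 16, 17.76f},
    {106128086u, 2, 16,  0.56f},
};

/* A row is suspect if the fix is lost, time does not advance, or the
 * speed change since the previous row exceeds maxAccelMps2 * dt. */
bool gpsRowIsSuspect(const gpsLogRow_t *prev, const gpsLogRow_t *cur,
                     float maxAccelMps2)
{
    if (cur->fixType == 0 || cur->numSat == 0) {
        return true;
    }
    if (cur->timeUs <= prev->timeUs) {
        return true;
    }
    const float dt = (float)(cur->timeUs - prev->timeUs) * 1e-6f;
    float dv = cur->speedMps - prev->speedMps;
    if (dv < 0.0f) {
        dv = -dv;
    }
    return dv > maxAccelMps2 * dt;
}

/* Counts suspect rows, starting from the second row. */
size_t countSuspectRows(const gpsLogRow_t *rows, size_t n, float maxAccelMps2)
{
    size_t count = 0;
    for (size_t i = 1; i < n; i++) {
        if (gpsRowIsSuspect(&rows[i - 1], &rows[i], maxAccelMps2)) {
            count++;
        }
    }
    return count;
}
```

With a 20 m/s^2 limit, three of these rows are flagged: the no-fix sample, the jump to 17.76 m/s, and the drop back to 0.56 m/s, since the check only looks at the size of the change.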
The full log file I can send on email to whom it may concern (pm me).
Just for clarification: I am using ver 1.1.
@stronnag, I agree, this is very likely software issue. However I'm thinking it was there before the IO changes - I remember having odd mixer issues a while ago before that major change happened.
Which direction was your machine facing on powering up? Maybe something in firmware is messing up the mag...
This is going to be a tough one - changes to firmware will affect the memory layout and may seemingly "fix" the bug...
@stronnag I have a strong feeling that something is wrong with either the IMU or the magnetometer code.
From your latest logs:
LOG 292: heading 31 - PH works
LOG 293: heading 22 - PH fails; heading 28 - PH fails
LOG 294: heading 200 - PH works
Interestingly, in LOG292 and LOG293 the machine does the same (correct) tilt to correct for the error, however the actual correction differs. This can only happen when the heading is incorrect.
Can you send exactly the same hex/dump file you've been using during these tests so I can check it on my SPRF3 board?
@xdigyx you likely experienced an ordinary GPS glitch, might be GPS wiring as well.
I have a feeling that recent CC3D woes and this issue are related...
Yes, I agree. Checked the wiring and found no loose wires or connectors. Anyway, the glitch was not detected and caused the model to fly away, so it's just proof that better glitch detection is still needed. Thanks
@digitalentity. I tend to do my PH tests with the craft 'beam on', so a c. N / S heading is expected. It failed earlier at c. 200°, and I've had failures on other orientations. I (almost) always power up at 110° - 120° (facing away from 'my' bench), which is my first mag pre-flight check.
The hex (gcc 6.1 compiled) is at http://seyrsnys.myzen.co.uk/inav_ph_woes/inav_1.2.0_SPRACINGF3_8cfc74b.hex
From my records (I keep all my logs), it seems https://github.com/iNavFlight/inav/commit/43eaf10db2170633cf424baa71a6bf0082c8061f was the last non-affected build; I will fly that today.
Maybe an announcement on forum regarding this?
Yes, I will do one immediately.
.... done
@stronnag, @oleost yes, that's a good idea, thanks!
@stronnag can you also provide a dump so I'll be testing exactly the same firmware setup?
@stronnag thanks. Now I should try to find a GPS module...
EDIT: In the following days I'll try to reproduce the issue on my test-quad.
From my records (I keep all my logs), it seems 43eaf10 was the last non-affected build; I will fly that today.
@stronnag it would be awesome if you could pinpoint the commit that started to glitch. I suspect that this particular issue might be related to a buffer overflow or maybe a race condition in some interrupt handler since it's intermittent. I suspect UART - if an interrupt handler somehow gets invoked before buffers are properly initialised it might result in overwriting arbitrary memory (please correct me if i'm wrong).
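To make the suspected failure mode concrete: if a receive handler runs before its buffer has been set up, an unguarded write lands wherever the uninitialised pointer happens to point. The sketch below is a mock, not the INAV UART driver; the struct, its fields and the handler name are hypothetical. It shows a handler that drops bytes until the buffer is in place.

```c
#include <stddef.h>
#include <stdint.h>

/* Mock port state (hypothetical, not the INAV serial driver). */
typedef struct {
    volatile uint8_t *rxBuffer;      /* NULL until the port is initialised */
    uint32_t          rxBufferSize;  /* 0 until the port is initialised */
    volatile uint32_t rxHead;        /* next write position */
    volatile uint32_t droppedBytes;  /* bytes seen before initialisation */
} mockUartPort_t;

/* Receive handler, as it might be called from an interrupt.
 * Without the guard, a byte arriving before initialisation would be
 * written through an invalid pointer and corrupt unrelated memory. */
void mockUartRxHandler(mockUartPort_t *port, uint8_t byte)
{
    if (port->rxBuffer == NULL || port->rxBufferSize == 0) {
        port->droppedBytes++;
        return;
    }

    port->rxBuffer[port->rxHead] = byte;
    port->rxHead = (port->rxHead + 1) % port->rxBufferSize;
}
```

A dropped-byte counter like this also leaves a trace that the early interrupt actually happened, which helps with an intermittent fault.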
I'm going out this afternoon armed with all my big quad LIPOs and
inav_1.2.0_SPRACINGF3_43eaf10.hex
inav_1.2.0_SPRACINGF3_4bbd176.hex
inav_1.2.0_SPRACINGF3_8cfc74b.hex
inav_1.2.0_SPRACINGF3_f440ead.hex
I expect 43eaf10 to work and f440ead not to work, based on previous experience.
The great thing about having mag and acc saved in the dump file is that reflashing / return to known config in the field is so, so easy.
@stronnag thank you so much for testing this out! I'm struggling to figure out what's going on with CC3D - it's a very repeatable glitch and I have a feeling it's related to the GPS issues.
27822a5 is the last reliable commit: 6+ successful power cycle / PH attempts. All later builds fail PH within 2 attempts. More details & logs when I get home.
Here's the full bisect test results. Methodology:
The results being:
Commit | Date | Result |
---|---|---|
43eaf10 | 2016-08-02 | FAIL : LOG0295.TXT |
742f429 | 2016-07-31 | PASS : LOG0304.TXT |
742f429 | | FAIL : LOG0305.TXT |
0489eb8 | 2016-07-31 | PASS : LOG0296.TXT |
0489eb8 | | FAIL : LOG0297.TXT |
27822a5 | 2016-07-29 | PASS : LOG0298.TXT |
27822a5 | | PASS : LOG0299.TXT |
27822a5 | | PASS : LOG0300.TXT |
27822a5 | | PASS : LOG0301.TXT |
27822a5 | | PASS : LOG0302.TXT |
27822a5 | | PASS : LOG0303.TXT |
Several successful WP and RTH missions were subsequently flown with 27822a5.
All log files under http://seyrsnys.myzen.co.uk/inav_ph_woes/
I trust this is conclusive. It also supports the tentative conclusions relating this fault to the CC3D fault and the new IO subsystem.
stronnag, I have just compared my dump with your settings and noticed one strange value: set mag_declination = -130. Should it not be approx. 110 POSITIVE?
Not if (a) one habitually flies in the New Forest area of southern England, where the declination is c. 1° 30' W and (b) you have inav_auto_mag_decl = ON set.
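The -130 value and the 1° 30' W declination quoted above are consistent if the setting encodes degrees x 100 + minutes, negative for west. The sketch below assumes that encoding; the helper names are hypothetical and are not firmware functions.

```c
#include <stdbool.h>

/* Hypothetical helper: encode a declination as degrees * 100 + minutes,
 * negative for west (assumed encoding, inferred from -130 <-> 1 deg 30' W). */
int magDeclinationEncode(int degrees, int minutes, bool west)
{
    const int value = degrees * 100 + minutes;
    return west ? -value : value;
}

/* Hypothetical helper: convert an encoded setting to decimal degrees. */
float magDeclinationToDegrees(int setting)
{
    const int sign = (setting < 0) ? -1 : 1;
    const int magnitude = (setting < 0) ? -setting : setting;
    const int degrees = magnitude / 100;
    const int minutes = magnitude % 100;

    return (float)sign * ((float)degrees + (float)minutes / 60.0f);
}
```

Under this assumption, 1° 30' W encodes to -130, which decodes to -1.5 decimal degrees.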
May I also suggest that this thread is for trying to fix a serious regression in iNav, not provide basic support to beginners; please use the RC Groups topic http://www.rcgroups.com/forums/showthread.php?t=2495732 for basic support.
@stronnag thanks for narrowing this down. There still is a possibility that in 27822a5 and earlier this bug is still present - there was a bug reported with frozen servos on SPRF3 which I also suspect to be a result of memory corruption. Coincidentally that bug also happened after beginning migration to new IO.
@martinbudden, @ledvinap if you have the time, please review the new IO code in INAV to see if there's anything wrong.
We are lacking any protection from wrong indexes, incorrect pointers etc. I think we should start putting ASSERT(x) all over the code with three possible compile-time options:
Thoughts?
For one:
ioRec_t* IO_Rec(IO_t io)
{
    return io;
}
This assumes io is correct, but what if it's not? Then a pointer to a non-existent ioRec will be returned, with the possibility of writing owner, resource and index to an arbitrary memory location, causing corruption of data.
Adding the following will help protect from this:
ASSERT(io);
ASSERT(io >= &ioRecs[0]);
ASSERT(io < &ioRecs[DEFIO_IO_USED_COUNT]);
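The same range check can be sketched as a self-contained function that returns a result instead of asserting, which makes it easy to exercise off-target. The type definitions, the array size and the names ioIsValid and IO_RecChecked are stand-ins invented for this example; the real ioRec_t, IO_t and DEFIO_IO_USED_COUNT live in the firmware and may differ.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the firmware definitions (assumed shapes). */
#define DEFIO_IO_USED_COUNT 4

typedef struct {
    uint8_t owner;
    uint8_t resource;
    uint8_t index;
} ioRec_t;

typedef void *IO_t;

static ioRec_t ioRecs[DEFIO_IO_USED_COUNT];

/* True only if io points at an element of ioRecs.
 * Addresses are compared as integers so that pointers to
 * unrelated objects can be tested without undefined behaviour. */
bool ioIsValid(IO_t io)
{
    const uintptr_t p     = (uintptr_t)io;
    const uintptr_t first = (uintptr_t)&ioRecs[0];
    const uintptr_t end   = (uintptr_t)&ioRecs[DEFIO_IO_USED_COUNT];

    return io != NULL
        && p >= first
        && p < end
        && (p - first) % sizeof(ioRec_t) == 0;  /* on an element boundary */
}

/* Checked variant of IO_Rec: returns NULL instead of a bad pointer. */
ioRec_t *IO_RecChecked(IO_t io)
{
    return ioIsValid(io) ? (ioRec_t *)io : NULL;
}
```

Whether a failed check should halt, log or be compiled out is the policy question raised above; this sketch only shows the check itself.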
Do not be so rude, I just wanted to help. Sometimes even basic things can be overlooked by pro users... 110 or 130, but I believe it should be positive (unless FW does not look at it at all and it really does not matter).
EDIT: checked again the location. it's negative....
@xdigyx I think @stronnag didn't mean to be rude; mag declination does indeed vary greatly between different places. However, this issue is related to the latest 1.2 code, and tests on 1.1 are probably not relevant; they create noise that distracts attention from the topic.
Tested current master (8cfc74b) with revo. Didn't have any noticeable issues. 5 short flights (PH spin + RTH) with a power cycle between. Only three fitted in the bb log. Built with gcc-arm-none-eabi-4_9-2015q2-20150609-win32 using the command "make -j24 TARGET=REVO OPBL=yes". Map file included, in case you can provide the same with gcc 6 to compare. https://www.dropbox.com/sh/tucne4xbeanyqax/AABw0wqCcqdb0lyaBVCyD7Aha?dl=0
I don't have any GPS-related issues on my Y6 built with GCC-4.9.3 as well, although CC3D won't work at all...
Spoiler : Some days, an adequate number of satellites and a reasonable HDOP don't mean nav functions will work as expected.
Today I flew 85fe245 and f440ead on my usually utterly reliable quad. This machine has not failed to execute PH, RTH or missions since we fixed SBAS months ago ---- until today.
Result. Completely random PH behaviour (if PH fails, then I don't even try any other nav function). Randomly, on hard reset, nav functions will either work or fail. Some log files at http://seyrsnys.myzen.co.uk/inav_ph_woes/. All the log files were created in a short time frame with good satellite coverage (16-19) and good HDOP (1.1 - 1.3).
... does some betaflight stuff on the minis for 30 minutes
Note 0. LOG00281 Index 1, PH attempt 4 is a good example of 'fly off'.
Note 1. 281-283 were executed in quick succession; it is unlikely that there was some environmental change that affected the results.
Note 2. If PH is not (obviously, visually) working, it is rapidly aborted.
Exec Summary : nav post 43eaf10 seems pretty random to me at the moment. Other experiences solicited.