mavlink / MAVSDK-Swift

MAVSDK client for Swift.
https://mavsdk.mavlink.io
BSD 3-Clause "New" or "Revised" License
27 stars 25 forks source link

Telemetry Health Calibration returning wrong states #164

Open douglaswsilva opened 4 years ago

douglaswsilva commented 4 years ago

Hey guys,

As I spoke with Jonas, sometimes we've been getting "wrong" calibration to some of the values when subscribing to telemetry.health. I finally was able to reproduce it connected to the debugger and this is what I see int he logs coming from the SDK:

[01:04:11|Info ] Server set to listen on 0.0.0.0:50051 (grpc_server.cpp:48)
[01:04:11|Info ] Server started (grpc_server.cpp:32)
[01:04:11|Info ] Waiting to discover system... (connection_initiator.h:582019-12-17 13:04:11.543473-0800 SiteScan[4552:1715194] TIC TCP Conn Failed [6:0x280ca8480]: 1:50 Err(50)
1.567022-0800 SiteScan[4552:1715194] Task <6D86C013-F1A5-4AE0-863B-60407A548E8A>.<1> HTTP load failed (error code: -1009 [1:50])
2019-12-17 13:04:11.580058-0800 SiteScan[4552:1715235] Task <6D86C013-F1A5-4AE0-863B-60407A548E8A>.<1> finished with error - code: -1009
|Info ] New system on: 127.0.0.1:60985 (udp_connection.cpp:263)
[01:04:11|Debug] Component Unsupported component (240) added. (system_impl.cpp:401)
[01:04:11|Debug] Discovered 1 component(s) (UUID: 1) (system_impl.cpp:561)
[01:04:11|Info ] System discovered [UUID: 1] (connection_initiator.h:62)
2019-12-17 13:04:11.860006-0800 SiteScan[4552:1715074] connectionState started
2019-12-17 13:04:11.860102-0800 SiteScan[4552:1715074] Firmware version is (null)
[01:04:12|Info ] New system on: 192.168.42.1:54780 (udp_connection.cpp:263)
[01:04:12|Debug] Component Camera 1 (100) added. (system_impl.cpp:401)
[01:04:12|Warn ] sending again, retries to do: 3  (511). (mavlink_commands.cpp:222)
[01:04:12|Debug] Component Autopilot (1) added. (system_impl.cpp:401)
[01:04:12|Debug] Component Gimbal (154) added. (system_impl.cpp:401)
[01:04:12|Error] Error: UUID changed (system_impl.cpp:257)
[01:04:12|Warn ] Command ack 520 not matching our current command: 530 (mavlink_commands.cpp:143)
[01:04:12|Debug] request camera info (camera_impl.cpp:140)
[01:04:12|Info ] Using cached file for Yuneec E90. (camera_impl.cpp:907)
[01:04:12|Debug] step range missing for CAM_CUSTOMWB (camera_definition.cpp:371)
[01:04:12|Error] min range missing for CAM_SPOTAREA (camera_definition.cpp:345)
[01:04:12|Warn ] Not found: CAM_SPOTAREA (camera_definition.cpp:220)
[01:04:12|Error] min range missing for CAM_ASPECTRATIO (camera_definition.cpp:345)
[01:04:12|Warn ] Not found: CAM_ASPECTRATIO (camera_definition.cpp:220)
[01:04:12|Debug] Ignoring string params. (camera_definition.cpp:121)
[01:04:12|Warn ] Default missing for CAM_SYSTEMTIME (camera_definition.cpp:196)
[01:04:12|Debug] Ignoring custom params. (camera_definition.cpp:126)
[01:04:12|Debug] Ignoring custom params. (camera_definition.cpp:126)
[01:04:12|Warn ] Default missing for CAM_ZOOM (camera_definition.cpp:196)
[01:04:12|Error] Error: get param busy timeout: CAL_GYRO0_ID (mavlink_parameters.cpp:456)
[01:04:12|Error] Error: Param for gyro cal failed. (telemetry_impl.cpp:872)
[01:04:12|Debug] MAVLink: info: data link #0 lost (system_impl.cpp:308)
[01:04:12|Debug] MAVLink: info: data link #0 regained (system_impl.cpp:308)
[01:04:12|Warn ] Comparison type mismatch between uint8_t and uint16_t (mavlink_parameters.h:407)
[01:04:12|Error] Chosen value smaller than minimum (camera_definition.cpp:550)

As you can see, there was a timeout in

[01:04:12|Error] Error: get param busy timeout: CAL_GYRO0_ID (mavlink_parameters.cpp:456)

Which made it be false and the other calibration values to be true. To reproduce this I just ran our app on iPad and while connected to drone via wifi.

If we are subscribing to too many things at the same time, therefore causing the timeout, please let us know. And if you have any suggestion for the best way to subscribe to things, we'd appreciate.

Once again, thanks for looking into this!

JonasVautherin commented 4 years ago

Which made it be false and the other calibration values to be true.

Oh, so do you always get a timeout on the sensor when it fails? And for the different sensors?

@julianoes: do you have insights on what it means to have a timeout on one of those parameters? MAVSDK will try to get the parameter multiple times before it gives up, right? Could that mean that on the firmware side, something is wrong with this param?

julianoes commented 4 years ago

My thought is that this might be a link congestion issue on startup.

julianoes commented 4 years ago

We could try it with a longer timeout and more retransmissions.

douglaswsilva commented 4 years ago

@JonasVautherin The "wrong" calibration values don't happen every time. I was only able to reproduce this once when connected to the debugger. But I imagine that every time it happens, it's probably caused by this timeout in one of the calibration values.

@julianoes That'd be cool to try. Let me know how I can help.

JonasVautherin commented 4 years ago

@julianoes is there an easy and ugly way to add a super long timeout, just to confirm that this is the root cause? From discussions with @douglaswsilva, it seems like they can reproduce it reasonably often (like 1/5th of the time) in their setup.

douglaswsilva commented 4 years ago

@julianoes If there's a place I can increase the timeout in the mavsdk, let me know. I can do it locally, build, and test it to see if it fixes the issue.

julianoes commented 4 years ago

https://github.com/mavlink/MAVSDK/blob/d9ecfcb681fd2340768e263d7b265e6d3a607aea/src/core/mavlink_parameters.cpp#L197 https://github.com/mavlink/MAVSDK/blob/d9ecfcb681fd2340768e263d7b265e6d3a607aea/src/core/mavlink_parameters.cpp#L248

Oops, retry is not implemented yet: https://github.com/mavlink/MAVSDK/blob/d9ecfcb681fd2340768e263d7b265e6d3a607aea/src/core/mavlink_parameters.cpp#L461

douglaswsilva commented 4 years ago

@julianoes Thanks, I'll take a look into this.

douglaswsilva commented 4 years ago

@julianoes @JonasVautherin Our test engineering is currently testing the fix for this: https://github.com/mavlink/MAVSDK/pull/972/files. Thanks Julian!

On that note, I noticed another issue that I'm double testing and believe it shouldn't be hard to fix, but I'd need your guys help. I think if you subscribe to drone.telemetry.health before the drone is connected and then connect the drone, it will not update the values.

As I said, I'm going to double check this, but let me know if you spot anything weird.

julianoes commented 4 years ago

Right, we should probably reset the health data here: https://github.com/mavlink/MAVSDK/blob/c5404eb7ca88b6a404b82154d7ad49b3cc6b7d17/src/plugins/telemetry/telemetry_impl.cpp#L186

JonasVautherin commented 4 years ago

@julianoes I had a quick look at it, and I don't understand your comment. Shouldn't health be advertised on each heartbeat, as per process_heartbeat here?

julianoes commented 4 years ago

Some of the health information is based on parameters that are queried. That is only a "workaround for now" but that's why the data can be stale.

JonasVautherin commented 4 years ago

And disable() will be called when the drone is discovered? :thinking: