mavlink / MAVSDK

API and library for MAVLink compatible systems written in C++17
https://mavsdk.mavlink.io
BSD 3-Clause "New" or "Revised" License
622 stars 507 forks source link

ArduPilot - Problem identified that may be causing the mission module to fail (fly_mission example) #2110

Open trolledbypro opened 1 year ago

trolledbypro commented 1 year ago

Hello, I have found a possible cause of this issue of the mission module failing in ArduPilot. Using Wireshark, I analyzed the network traffic between my ArduPilot SITL (running inside a WSL2 Docker container that runs the latest ArduPilot Docker image) and the MAVSDK example (running inside WSL2 as I was having issues running it in Windows 10). My MAVSDK library and my fly_mission binary were built in debug mode using CMake through CMake Tools in VSCode (using VSCode WSL Remote session).

Here is my terminal output:
image

There is a debug message saying the gimbal version is v1, and that the mission is unsupported. Examining Wireshark, the last message sent before the MISSION_ACK message containing the MAV_MISSION_UNSUPPORTED payload was MAV_CMD_DO_MOUNT_CONFIGURE (204).

When gimbal angles are defined in a mission_item, the gimbal version is verified: https://github.com/mavlink/MAVSDK/blob/fadfe22e7c053945107201e2638be9a7f8b8ee5c/src/mavsdk/plugins/mission/mission_impl.cpp#L460 We then use the version above and send a version specific message containing the gimbal configuration. Below, we see that the unsupported message is sent if the gimbal version is assumed to be v1, as is the case in my console output: https://github.com/mavlink/MAVSDK/blob/fadfe22e7c053945107201e2638be9a7f8b8ee5c/src/mavsdk/plugins/mission/mission_impl.cpp#L1188-L1221

It appears that MAVSDK is not finding out the supported gimbal version. Gimbal v1 is deprecated in ArduPilot as of version 4.3. Wireshark shows that no GIMBAL_MANAGER_INFORMATION (280) messages were sent from the AP and no MAV_CMD_REQUEST_MESSAGE (512) message from MAVSDK requesting the gimbal version was sent. The GIMBAL_MANAGER_INFORMATION message is what the gimbal module uses to set the gimbal version.

I propose the following two solutions. I think the latter is more inclusive to more ArduPilot versions but the former could be implemented quicker:

What do you all think?

JonasVautherin commented 1 year ago

Thanks @trolledbypro for looking into this! :rocket:

I think I would be in favour of a third version: drop gimbal v1 entirely and always assume v2. There are many problems with the detection anyway.

One that I learn here is that the mission module needs to know about the gimbal protocol, so either we need to duplicate the detection that we already have in the gimbal module, or we need the mission module to depend on the gimbal one somehow (which we have never done with any plugin before).

But on top of that, the detection we have in the gimbal module is fundamentally flawed (and the root cause is in the MAVLink design): MAVLink does not offer a mechanism to differentiate between a timeout and an unsupported message (many modules just ignore commands they don't support, and the MAVLink specs allow a broadcast command which require this behavior). So we can be sure that a drone supports Gimbal v2 when everything works ("yay I received a v2 message, I know it supports v2"), but we can never be sure that it does not support v2 ("I did not receive a v2 message, is it because the drone does not support v2, or is it because the messages were lost?"). We could probably do more convoluted heuristics (like "I got 3 timeouts for the gimbal v2 requests while I received X messages from the drone, so I will say that I am reasonably confident that it was not a timeout"), but... well is it worth it?

I don't know of a single drone that uses Gimbal v1, and v2 has been out there for years. Maybe it is time to drop v1 entirely.

@julianoes what do you think?

trolledbypro commented 1 year ago

Hey @JonasVautherin,

By looking into the Gimbal module code and at my debugging, it seems that the Gimbal module init and enable methods are not being called. I believe this is the case because Wireshark does not detect a MAV_CMD_REQUEST_MESSAGE being sent from to the AP. Even if the message is unsupported, it would still be sent.

I am not sure which thread calls these methods, I don't think it's the main thread because I could not step into these methods: https://github.com/mavlink/MAVSDK/blob/fadfe22e7c053945107201e2638be9a7f8b8ee5c/src/mavsdk/plugins/gimbal/gimbal_impl.cpp#L29-L49

These methods establish the gimbal version. The absence of these methods being called is causing a timeout which creates an assumption that gimbal v1 is present: Here is the code that makes the main thread wait for the version. These methods are being called by the mission module. The timeout is created by the MissionImp module at creation (?) https://github.com/mavlink/MAVSDK/blob/fadfe22e7c053945107201e2638be9a7f8b8ee5c/src/mavsdk/plugins/gimbal/gimbal_impl.cpp#L197-L208 Here is the timeout code that sets it to v1. https://github.com/mavlink/MAVSDK/blob/fadfe22e7c053945107201e2638be9a7f8b8ee5c/src/mavsdk/plugins/mission/mission_impl.cpp#L126-L131

JonasVautherin commented 1 year ago

Hmm even if you instantiate a Gimbal object in your code? The plugins are initialized lazily, maybe you need to call one of the gimbal functions for it to start :thinking:

trolledbypro commented 1 year ago

I edited set_mission example to instantiate a Gimbal object and run set_mode and it still failed. Both the gimbal and mission modules experienced a gimbal mode timeout and defaulted to gimbal v1: image

JonasVautherin commented 1 year ago

Both the gimbal and mission modules experienced a gimbal mode timeout and defaulted to gimbal v1

Hmm, so the way we currently detect gimbal v2 is maybe not working with Ardupilot?

But that brings be back to my question above: no need to fix the detection for Ardupilot if we just default to v2 and drop v1 entirely. @julianoes what do you say?

JonasVautherin commented 1 year ago

I talked with Julian, and he agrees that we should drop gimbal v1. I won't be able to do that before 2 weeks because I am going on holidays, so... @trolledbypro if you want to do it before then, feel free! Just remove the detection and default to v2, and you can also remove the v1 implementation files. Really it shouldn't be very hard.

Otherwise I'll give it a shot when I'm back. Does that work for you @trolledbypro?

trolledbypro commented 1 year ago

@JonasVautherin I'll make a branch in my fork and get to work, should be doable for me (especially within 2 weeks)

JonasVautherin commented 1 year ago

Amazing, keep me posted!

trolledbypro commented 1 year ago

I did more debugging, and setting up a gimbal on SITL allowed gimbal version 2 to be successfully detected (I am dumb and forgot the -M flag :( ). However, I still am getting a mission unsupported message even with gimbal version 2. image It appears that gimbal being v1 was not the reason that the mission is failing. Gimbal v2 sends MAV_CMD_DO_GIMBAL_MANAGER_CONFIGURE (1001) as a mission item. DO_GIMBAL_MANAGER_CONFIGURE has only recently been implemented in ArduPilot https://github.com/ArduPilot/ardupilot/pull/23737. This command is used here https://github.com/mavlink/MAVSDK/blob/e7657c2d87917df739186981f34f98a14c898893/src/mavsdk/plugins/mission/mission_impl.cpp#L1288-L1312 and is called here https://github.com/mavlink/MAVSDK/blob/e7657c2d87917df739186981f34f98a14c898893/src/mavsdk/plugins/mission/mission_impl.cpp#L459-L480 It looks like it has not yet been implemented to work as a mission item. This is something I maybe able to fix by adding the support into ArduPilot. I will discuss with the devs there.

trolledbypro commented 1 year ago

These are two issues I made in ArduPilot that should make gimbal control work with MAVSDK (until I find new bugs): https://github.com/ArduPilot/ardupilot/issues/24691 https://github.com/ArduPilot/ardupilot/issues/24691 ArduPilot has yet to implement acquiring and removing gimbal control as defined in gimbal v2 protocol, so my solution is to accept MAV_CMD_DO_GIMBAL_MANAGER_CONFIGURE as a mission_item and have it do nothing until gimbal control is implemented or I get the time to have it modify the gimbal control struct added by the ArduPilot PR mentioned in my previous comment (that is presently being unused). The MAV_CMD_IMAGE_STOP_CAPTURE is also not implemented, it looks like image handling with a photo interval is done differently in ArduPilot. I have also added that command as a do nothing command for now.