mavlink / rust-mavlink

MAVLink library for Rust.
https://mavlink.github.io/rust-mavlink/mavlink/
Apache License 2.0
153 stars 78 forks source link

Routing issue: somehow packets get dropped communicating with ground station #189

Open CodeTriangle opened 1 year ago

CodeTriangle commented 1 year ago

This is a bit of a weird issue that we've been having. I was hesitant to classify it as an issue fit to be published here, but I have been told that @podhrmic at least is willing to look it over.

I've been working with rust-mavlink over the past few days to create a minimal MAVLink router (ideally one that is capable of replacing mavlink-router but will run in embedded and lower-memory environments). The program is very simple. It spawns a thread per endpoint which waits for a message on that endpoint and sends it to all the other endpoints.

Now we get to the crux of the issue: we're having some transmission problems. We have QGroundControl hooked into one side of the router and a PX4 aircraft in the other side. After about 20 or so messages, the endpoint connected to QGroundControl becomes unresponsive because QGroundControl switches to MAVLink 1. That happens because it doesn't think it's received an AUTOPILOT_VERSION message (requested by QGroundControl on startup with MAV_CMD_REQUEST_MESSAGE). My code prints every message received on all endpoints. According to these logs, the router does receive the message so it should be sent on to QGroundControl, but QGC is clearly missing them when they do come in.

This error only happens under the following conditions:

Here's the repository if anyone is willing to try to reprodice this: https://github.com/AggieAir/rust-mavlink-router

podhrmic commented 1 year ago

@CodeTriangle I haven't used rust-mavlink in a while, but based on what you are describing this sounds more like a network problem:

CodeTriangle commented 1 year ago
  • is MAV_CMD_REQUEST_MESSAGE the only message that is being broadcasted? Or can you positively capture other broadcasted messages?

  • and if so, is MAV_CMD_REQUEST_MESSAGE the first message that is being broadcasted? Maybe your receiver drops the first broadcast packet?

Far from it. Heartbeats are verifiably sent every second, pings also come in occasionally.

  • you say the router receives the message, but do you know that the message is being sent (do you have a sent message log)?

I don't suppose I do, but it hasn't been needed yet because...

  • are you doing packet capture with wireshark on all endpoints to see that the packets are delivered before being passed to the applications?

Yes, I have done this. The packets exist on both sides.

podhrmic commented 1 year ago

Based on what you said, it sounds like QGroundControl doesn't accept the AUTOPILOT_VERSION packet (assuming you have a proof that packet indeed reaches the ground station) - would you agree? I don't know what is your endpoint config - is rust-mavlink somehow involved beyond being a part of the router?

CodeTriangle commented 1 year ago

Based on what you said, it sounds like QGroundControl doesn't accept the AUTOPILOT_VERSION packet (assuming you have a proof that packet indeed reaches the ground station) - would you agree?

QGC is certainly not accepting the package for some reason or another. It's not the packet's fault, though. The same AUTOPILOT_VERSION packet is accepted under different stacks than the one detailed here.

I don't know what is your endpoint config - is rust-mavlink somehow involved beyond being a part of the router?

It is involved in that when rust-mavlink copies the messages to other ports things start falling apart whereas things work as expected when they're just sent directly via serial link or through mavlink-router.

podhrmic commented 1 year ago

It is involved in that when rust-mavlink copies the messages to other ports things start falling apart whereas things work as expected when they're just sent directly via serial link or through mavlink-router.

@CodeTriangle do you have a minimal working example that uses rust-mavlink? E.g. something that doesn't involve your router? It would be helpful if you knew that rust-mavlink works in non-routed situation (GCS <-> UAV) and provides the correct packets.

If you suspect that one of the packets is not (de)serialized correctly (I suspect you mean AUTOPILOT_VERSION) then the best would be if you capture a working AUTOPILOT_VERSION packet, and try to (de)serialize it with rust-mavlink and see if you get the same data at the end.

I admit I didn't study much what is expected from the mavlink-router - but you seem to modify the packets during processing, see this line. The mavlink router also seems to have some requirements about UDP broadcast vs unicast so you might have to implement that behavior as well.

CodeTriangle commented 1 year ago

It is involved in that when rust-mavlink copies the messages to other ports things start falling apart whereas things work as expected when they're just sent directly via serial link or through mavlink-router.

@CodeTriangle do you have a minimal working example that uses rust-mavlink? E.g. something that doesn't involve your router? It would be helpful if you knew that rust-mavlink works in non-routed situation (GCS <-> UAV) and provides the correct packets.

I can try to strip the program down even more, but honestly I don't see what I can take out. What you see is the bare minimum. It spawns up some number of threads, blocks on each until they receive a message, then immediately acts on the message, sending it to the other endpoints.

If you suspect that one of the packets is not (de)serialized correctly (I suspect you mean AUTOPILOT_VERSION) then the best would be if you capture a working AUTOPILOT_VERSION packet, and try to (de)serialize it with rust-mavlink and see if you get the same data at the end.

Sure, I can try this.

I admit I didn't study much what is expected from the mavlink-router - but you seem to modify the packets during processing, see this line.

No, this line of code runs after the message is resent. Modifying the message would be impossible since it's marked as immutable. What is modified is my_map, which maps sysid/compid pairs to the last seen sequence id. This is useful for accounting for packet loss.

The mavlink router also seems to have some requirements about UDP broadcast vs unicast so you might have to implement that behavior as well.

Avoiding mavlink-router's restrictions on UDP broadcast is one of the motivations of this project. We want to be able to broadcast to any number of ground station instances on the same port.

podhrmic commented 1 year ago

Avoiding mavlink-router's restrictions on UDP broadcast is one of the motivations of this project. We want to be able to broadcast to any number of ground station instances on the same port.

I suspect that QGroundControl might have some restrictions on the shape and form of the communication, presumably the mavlink-router's restrictions are there for a reason. My immediate concern would be how to ensure that only one GCS can command the UAV etc. Unless the GCS has some listening only mode, multiple GCS might not even be possible.

If you agree that is indeed the problem, perhaps best to close this issue and post your question at https://github.com/mavlink/qgroundcontrol/issues ?

patrickelectric commented 11 months ago

@CodeTriangle we merged a series of fixed on master right now related to serialization / deserialization. Can you check now if it works ? I'll release a 0.12.2 soon.

patrickelectric commented 10 months ago

Possible issue: https://discord.com/channels/1022170275984457759/1169141076121112646/1181939053776609361