mavlink / qgroundcontrol

Cross-platform ground control station for drones (Android, iOS, Mac OS, Linux, Windows)
http://qgroundcontrol.io

Joystick: MANUAL_CONTROL messages are not regular #9484

Closed Williangalvani closed 3 years ago

Williangalvani commented 3 years ago

Expected Behavior

MANUAL_CONTROL messages should be sent at a regular interval when a joystick is connected (or the virtual joystick is enabled).

Current Behavior

Every 30 seconds, something disrupts the messages, causing large latency spikes.

Steps to Reproduce:


  1. Open SITL (any?)
  2. Open QGC
  3. Close QGC
  4. Check latest telemetry log

System Information

Detailed Description

This causes ArduSub to complain about "Lost Manual Control", which is correct. While the interruption is usually pretty quick, I just looked at a user's tlog where this freeze peaked at 25 seconds! Every time the blue line rises above the orange one, ArduSub emits a "Manual Control Lost" warning.

image User log: 2021-02-17 17-04-38.zip
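The gaps described above can be located programmatically once the MANUAL_CONTROL timestamps have been extracted from a tlog (for example with pymavlink; that step is not shown here). The following is a minimal sketch, not part of QGC or ArduSub: `find_gaps` and `max_gap_ms` are hypothetical names, and the 1000 ms threshold is only an illustrative value, not ArduSub's actual timeout.

```python
from typing import List, Tuple


def find_gaps(timestamps_ms: List[int], max_gap_ms: int) -> List[Tuple[int, int]]:
    """Return (start_ms, gap_ms) for every interval between consecutive
    messages that is longer than max_gap_ms.

    timestamps_ms: arrival times of MANUAL_CONTROL messages, in milliseconds,
                   in the order they appear in the log.
    max_gap_ms:    assumed threshold above which a gap counts as a stall.
    """
    gaps = []
    for prev, curr in zip(timestamps_ms, timestamps_ms[1:]):
        delta = curr - prev
        if delta > max_gap_ms:
            gaps.append((prev, delta))
    return gaps


if __name__ == "__main__":
    # Synthetic data: a 10 Hz stream that stalls for 2.5 s after t = 300 ms.
    stamps = [0, 100, 200, 300, 2800, 2900, 3000]
    for start, length in find_gaps(stamps, max_gap_ms=1000):
        print(f"gap of {length} ms starting at {start} ms")
```

If the stalls come from a 30 s timer, the start times of the reported gaps should be spaced roughly 30 s apart.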

Investigation

I did some digging already, and this seems to be caused by QGCMapEngine::testInternet() (the only relevant timer fired every 30s).

This is a regular log of SITL: image

This is SITL without checking for internet connection:

image

So it seems that on some setups, the internet check can take long enough to disrupt regular operation of QGC...

Current workaround:

Just uncheck "Check for internet connection" in settings

DonLakeFlyer commented 3 years ago

> So it seems that on some setups, the internet check can take long enough to disrupt regular operation of QGC...

But the question is why? That should be happening on its own thread.

jafrado commented 3 years ago

Hmm. Interesting. My suggestion would be to disable roaming and scanning over WiFi when flying. The issue is that most modern IP stacks (Win/Linux/iOS) will temporarily go off-channel when scanning for new networks (if you understand how 802.11 scanning works you will get it). The net effect is that latency will continue to stack up, and as traffic increases, delays may become unbounded and/or create loss. I had this problem while developing some commercial real-time wireless streaming products. One of the classic symptoms was "gaps in wireless streaming"; the fix was to disable scanning while streaming. Searching for "scanning" should turn up some good reads on this topic.

DonLakeFlyer commented 3 years ago

Wow, nasty.

> My suggestion would be to disable roaming and scanning over WiFi when flying.

That should be doable since QGC knows if it has any vehicle connected and also has a flying state.

jafrado commented 3 years ago

yep, very nasty. We don't want to be off-channel while sending or receiving mavlink. Yes, lots of options to fix, here's a good read on the topic: https://blogs.gnome.org/dcbw/2016/05/16/networkmanager-and-wifi-scans/

patrickelectric commented 3 years ago

The main problem with the current implementation is that QGCCacheWorker does not have an event loop (it never calls exec). Since network functions that are not async (such as waitConnection) need an event loop, the functor runs in the main thread.

To fix this behaviour, the implementation was moved to an async approach with signals. See #9591.
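The effect described in this comment can be reproduced in miniature. The sketch below is a rough Python asyncio analogy, not QGC's Qt code: a periodic sender shares one event loop with a connectivity check, and every name in it (`sender`, `blocking_check`, `async_check`, `max_gap`) is hypothetical. When the check blocks the loop, the sender stalls for the whole duration of the check. When the check yields to the loop, the sender keeps its cadence.

```python
import asyncio
import time


async def sender(stamps, period_s, count):
    """Stand-in for the periodic MANUAL_CONTROL sender: record a timestamp each tick."""
    for _ in range(count):
        stamps.append(time.monotonic())
        await asyncio.sleep(period_s)


async def blocking_check(delay_s):
    """Synchronous wait on the loop's own thread: nothing else can run meanwhile."""
    time.sleep(delay_s)


async def async_check(delay_s):
    """Asynchronous wait: control returns to the loop while waiting."""
    await asyncio.sleep(delay_s)


async def _run(check, period_s, count, delay_s):
    stamps = []
    task = asyncio.create_task(sender(stamps, period_s, count))
    await asyncio.sleep(period_s * 5)  # let the sender settle into its rhythm
    await check(delay_s)               # the simulated "internet check"
    await task
    # Largest interval between two consecutive sends.
    return max(b - a for a, b in zip(stamps, stamps[1:]))


def max_gap(check, period_s=0.01, count=30, delay_s=0.2):
    """Run the experiment and return the worst send-to-send gap in seconds."""
    return asyncio.run(_run(check, period_s, count, delay_s))


if __name__ == "__main__":
    print("blocking check, worst gap:", max_gap(blocking_check))
    print("async check, worst gap:   ", max_gap(async_check))
```

With the blocking check, the worst gap is at least `delay_s`. With the async check, it should stay close to `period_s`, although exact numbers depend on the machine and scheduler.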