mavlink / MAVSDK

API and library for MAVLink compatible systems written in C++17
https://mavsdk.mavlink.io
BSD 3-Clause "New" or "Revised" License

Running SITL for 4 days, heartbeat lost #857

Closed shrit closed 5 years ago

shrit commented 5 years ago

Hello,

I have noticed that after running the simulation for 4 days at low quadcopter speed (1 m/s), and after updating to the latest SITL commit, in which the iris SDF file was updated with new plugins for the magnetometer and gyroscope, Gazebo slows down: the real-time factor drops to 0.22. Communication between the SDK and PX4 also slows down, and I am even losing heartbeats during takeoff and landing. The output is the following:

[11:06:27|Debug] MAVLink: info: DISARMED by auto disarm on land (system_impl.cpp:300)
[11:06:28|Info ] heartbeats timed out (system_impl.cpp:305)
[11:06:28|Debug] Lost 5283920058631409232 (mavsdk_impl.cpp:388)
[11:06:29|Info ] heartbeats timed out (system_impl.cpp:305)
[11:06:29|Info ] heartbeats timed out (system_impl.cpp:305)
[11:06:29|Debug] Lost 5283920058631409233 (mavsdk_impl.cpp:388)
[11:06:29|Debug] Lost 5283920058631409234 (mavsdk_impl.cpp:388)
[11:06:29|Debug] Discovered 1 component(s) (UUID: 5283920058631409232) (system_impl.cpp:557)
[11:06:29|Info ] Discovered system with UUID: 5283920058631409232
[11:06:31|Debug] Discovered 1 component(s) (UUID: 5283920058631409233) (system_impl.cpp:557)
[11:06:31|Info ] Discovered system with UUID: 5283920058631409233
[11:06:31|Debug] Discovered 1 component(s) (UUID: 5283920058631409234) (system_impl.cpp:557)
[11:06:31|Info ] Discovered system with UUID: 5283920058631409234
[11:06:32|Info ] heartbeats timed out (system_impl.cpp:305)
[11:06:32|Debug] Lost 5283920058631409232 (mavsdk_impl.cpp:388)
[11:06:34|Info ] heartbeats timed out (system_impl.cpp:305)
[11:06:34|Debug] Lost 5283920058631409233 (mavsdk_impl.cpp:388)
[11:06:34|Info ] heartbeats timed out (system_impl.cpp:305)
[11:06:34|Debug] Lost 5283920058631409234 (mavsdk_impl.cpp:388)
[11:06:34|Debug] Discovered 1 component(s) (UUID: 5283920058631409232) (system_impl.cpp:557)
[11:06:34|Info ] Discovered system with UUID: 5283920058631409232
[11:06:35|Debug] Discovered 1 component(s) (UUID: 5283920058631409233) (system_impl.cpp:557)
[11:06:35|Info ] Discovered system with UUID: 5283920058631409233
[11:06:35|Debug] Discovered 1 component(s) (UUID: 5283920058631409234) (system_impl.cpp:557)
[11:06:35|Info ] Discovered system with UUID: 5283920058631409234

Do you have any idea what the issue is and why this is happening?

Best regards,

JonasVautherin commented 5 years ago

Are you running 3 SITL instances at the same time? Is that on purpose?
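
The three instances can be read off the log above, which cycles through three distinct UUIDs. As a minimal sketch, assuming the log format shown in this issue, the lost and discovered events could be counted per UUID as follows. The names `summarize` and `SAMPLE` are made up for this illustration and are not part of MAVSDK.

```python
import re
from collections import Counter

# Patterns matched against the log lines quoted in this issue.
LOST_RE = re.compile(r"\bLost (\d+)")
DISCOVERED_RE = re.compile(r"Discovered system with UUID: (\d+)")

# A few lines copied verbatim from the log above.
SAMPLE = """\
[11:06:28|Debug] Lost 5283920058631409232 (mavsdk_impl.cpp:388)
[11:06:29|Debug] Lost 5283920058631409233 (mavsdk_impl.cpp:388)
[11:06:29|Debug] Lost 5283920058631409234 (mavsdk_impl.cpp:388)
[11:06:29|Info ] Discovered system with UUID: 5283920058631409232
"""


def summarize(log_text):
    """Return (lost, discovered) counters keyed by UUID string."""
    lost = Counter()
    discovered = Counter()
    for line in log_text.splitlines():
        match = LOST_RE.search(line)
        if match:
            lost[match.group(1)] += 1
            continue
        match = DISCOVERED_RE.search(line)
        if match:
            discovered[match.group(1)] += 1
    return lost, discovered


if __name__ == "__main__":
    lost, discovered = summarize(SAMPLE)
    for uuid in sorted(set(lost) | set(discovered)):
        print(uuid, "lost:", lost[uuid], "discovered:", discovered[uuid])
```

Running the same function over the full log would show whether all three systems drop out equally often or whether one instance is affected more than the others.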

shrit commented 5 years ago

Yes, exactly, I am running 3 quadcopters together.

shrit commented 5 years ago

I have been doing this for around a year now, but these problems are very recent. @JonasVautherin Do you have any idea about this bug?

JonasVautherin commented 5 years ago

Not really :confused:. Did you try to go to an older version of SITL to see if it is the reason? If that's related to an update made for SITL, then probably that would be an issue for the PX4 repo.

julianoes commented 5 years ago

@shrit interesting! If the real-time factor in Gazebo goes down, I would argue that it is a PX4 issue and not a MAVSDK issue. MAVSDK reports heartbeat timeouts when PX4 no longer sends heartbeats regularly enough (a timeout is reported after more than 3 seconds without a heartbeat, where one is expected every second).
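
As a rough illustration of the timeout described above, here is a minimal sketch of a heartbeat watchdog. It is not MAVSDK's actual implementation: the class and function names are hypothetical, and only the 1-second period and 3-second timeout come from this comment. The helper `wall_clock_interval` additionally assumes that PX4 paces its heartbeats on simulated time, which is not stated in this thread. Under that assumption, a real-time factor of 0.22 would stretch the 1-second period to roughly 4.5 seconds of wall-clock time, which exceeds the timeout.

```python
HEARTBEAT_PERIOD_S = 1.0   # from the comment above: one heartbeat per second
HEARTBEAT_TIMEOUT_S = 3.0  # from the comment above: timeout after more than 3 seconds


class HeartbeatWatchdog:
    """Hypothetical watchdog: remembers the last heartbeat and flags a timeout."""

    def __init__(self, timeout_s=HEARTBEAT_TIMEOUT_S):
        self.timeout_s = timeout_s
        self.last_heartbeat_s = None

    def on_heartbeat(self, now_s):
        """Record the wall-clock time at which a heartbeat arrived."""
        self.last_heartbeat_s = now_s

    def timed_out(self, now_s):
        """True if more than timeout_s has passed since the last heartbeat."""
        if self.last_heartbeat_s is None:
            return False  # nothing received yet, so nothing to lose
        return now_s - self.last_heartbeat_s > self.timeout_s


def wall_clock_interval(real_time_factor, period_s=HEARTBEAT_PERIOD_S):
    """Wall-clock gap between heartbeats, ASSUMING they follow simulated time."""
    return period_s / real_time_factor


if __name__ == "__main__":
    watchdog = HeartbeatWatchdog()
    watchdog.on_heartbeat(0.0)
    gap = wall_clock_interval(0.22)
    print("gap:", round(gap, 2), "timed out:", watchdog.timed_out(gap))
```

If the assumption holds, the watchdog would fire before each heartbeat arrives and then rediscover the system when it does, which would match the alternating "Lost" and "Discovered" lines in the log.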

I would suggest that you try to check what is going on in PX4 when this happens. For instance check if the memory or CPU usage goes up and up (e.g. using htop) and you can also check that for Gazebo (gzserver).
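
To check whether usage "goes up and up" over a multi-day run without watching htop, the numbers could be sampled periodically and a trend computed. The following is a sketch under assumptions: it shells out to `ps` with the `pcpu` and `rss` columns (available on Linux, where `rss` is reported in KiB), and the function names are invented for this example.

```python
import subprocess


def parse_ps_line(line):
    """Parse one line of `ps -o pcpu= -o rss=` output into (cpu_percent, rss_kib)."""
    cpu_text, rss_text = line.split()
    return float(cpu_text), int(rss_text)


def sample_process(pid):
    """Sample CPU and resident memory of one process (e.g. gzserver or px4)."""
    result = subprocess.run(
        ["ps", "-o", "pcpu=", "-o", "rss=", "-p", str(pid)],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_ps_line(result.stdout.strip())


def slope_per_sample(values):
    """Least-squares slope of the values against their sample index."""
    count = len(values)
    if count < 2:
        return 0.0
    mean_x = (count - 1) / 2
    mean_y = sum(values) / count
    numerator = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    denominator = sum((i - mean_x) ** 2 for i in range(count))
    return numerator / denominator


if __name__ == "__main__":
    # Made-up RSS readings in KiB, only to show the trend calculation.
    readings = [100, 110, 120, 130]
    print("growth per sample:", slope_per_sample(readings))
```

A slope that stays clearly positive over many hours of samples would point to a leak in that process, while a flat slope alongside a falling real-time factor would suggest looking elsewhere.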

shrit commented 5 years ago

@julianoes I am pretty sure it is a PX4 issue. I have a pretty good machine with 2 Intel(R) Xeon(R) CPU E5-2609 v4 @ 1.70GHz with 16 cores and 32 GB of RAM. I have also been monitoring with htop. I have noticed that gzserver is not using a full core (about 70% of one), and all PX4 instances show the same; the usage is even slightly reduced after 2 or 3 days.

@JonasVautherin The only major difference is the modification of the iris.sdf file. The problem is that this file is not tracked in git (there is no version history for this SDF file), which is a bit problematic since this will require modifying the file manually.

shrit commented 5 years ago

The problem has existed for a long time, but since the new plugins were added to iris.sdf, the real-time factor drops more rapidly than before.

JonasVautherin commented 5 years ago

So I think you should re-open that issue in the PX4 repo, as it belongs there :smile:.

JonasVautherin commented 5 years ago

@shrit: did you report that on PX4? Somebody on #mavsdk (Slack) may have a similar issue, but I can't seem to find your issue there :confused:

shrit commented 5 years ago

@JonasVautherin Yes, of course, but no one is answering, even though this is somewhat urgent. Please have a look here: https://github.com/PX4/Firmware/issues/12975

julianoes commented 5 years ago

@shrit this is an open source community and people will fix whatever problems are most urgent to them. You can't expect anyone to solve anything for you because it's urgent to you (unless you pay someone to do it).