mavlink / MAVSDK-Python

MAVSDK client for Python.
https://mavsdk.mavlink.io
BSD 3-Clause "New" or "Revised" License
324 stars 220 forks source link

Telemetry getting stuck when there is many same subscriptions #380

Closed atruebin closed 3 years ago

atruebin commented 3 years ago

I need to get telemetry from multiple drones. So I start mavsdk_server for each, connect to them and trying to get telemetry. For some minor number of drones it works, but for more (approximately 10) it gets stuck (none of generators yields). So as for many same tasks for one drone.

The simpliest way to reproduce is to edit telemetry.py example like this:

for i in range(n):
  asyncio.ensure_future(print_battery(drone))

It makes no sense to make multiple identical tasks for one drone of cource, but it behaviours the same as one for each drone.

If n=8 it works, if n=12 gets stuck. But if you run one script with n=8, and one more with n=8 (for other mavsdk_server of cource), they both work. And if you use each subscription from Telemetry plugin once (there more than 12) it works as well.

So I can guess the problem is related to grpc, but don't know how to solve it. Any ideas?

JonasVautherin commented 3 years ago

Just one thing: in MAVSDK, you can subscribe to a callback only once in C++ (that's something that should be improved, but that's how it is now). So if you subscribe twice to the position against the same mavsdk_server, the first one will be unsubscribed when the first one is registered.

This is not a limitation in MAVSDK-Java and MAVSDK-Swift, because on the client side we handle it. But for Python we don't. I started looking into it a while ago, you can find my work here: https://github.com/mavlink/MAVSDK-Python/issues/60.

I feel like this is not your root issue, but this is the reason why your reproducible example fails. Is that correct?

atruebin commented 3 years ago

Well, maybe this is the reason why experiment with one mavsdk_server fails.

But yes, my goal is multiple drones, so there should be many mavsdk_servers and each one has only one subscription of a type. In this case unsubscribing not happening at all, right?

When you start many mavsdk_servers and create System instance for each, the only thing that intersects is some global variables and global limitations, as far as I understand. This is why I think of gRPC, cause there is one server for the whole script, yes?

JonasVautherin commented 3 years ago

Well, maybe this is the reason why experiment with one mavsdk_server fails.

Yep, probably

so there should be many mavsdk_servers

Yes, but you are starting them manually, each listening on a different gRPC port, right?

and each one has only one subscription of a type. In this case unsubscribing not happening at all, right?

Correct :+1:

the only thing that intersects is some global variables and global limitations, as far as I understand.

They should not intersect at all. You have multiple System objects that are each connected to a different gRPC server. Just like there is nothing shared between your Firefox connected to DuckDuckGo and your Safari connected to ProtonMail: it's twice HTTPS, but those are two very separate connections, clients and servers.

How do you start the mavsdk_server instances?

atruebin commented 3 years ago

With subprocess.Popen([mavsdk_server_path, '-p', f'500{d_id}', f'udp://0.0.0.0:150{d_id}'], stdout=subprocess.PIPE).

JonasVautherin commented 3 years ago

Hmm, so they should not share anything with each other if {d_id} is indeed different everytime :thinking:. And then are you sure that each of your System() connects to the right instance? How do you do that part?

atruebin commented 3 years ago

Right after starting mavsdk_servers:

d = mavsdk.System(mavsdk_server_address='localhost', port=int(f'500{d_id}'))
await d.connect(system_address=f'udp://0.0.0.0:150{d_id}')
drones.append(d)
JonasVautherin commented 3 years ago

Can you try without setting the argument in connect()?

d = mavsdk.System(mavsdk_server_address='localhost', port=int(f'500{d_id}'))
await d.connect()
drones.append(d)

I don't think it makes a difference, but it's useless anyway if you set mavsdk_server_address above.

atruebin commented 3 years ago

Tried, the result is same.

JonasVautherin commented 3 years ago

I don't get how the MAVSDK instances would interfere with each other :thinking:. How do you start the simulated drones? All in the same gazebo instance? Could it be that MAVLink stops being sent from the other side, rather than it not being received on the MAVSDK side?

atruebin commented 3 years ago

This is actualy real drones connected to router. And I just tried turn on 10 ones, and it started working!

Experementaly it turned out that the problem occures when there are 10 or more inactive subscriptions (not connections!). When 10 drones turned on, script works with any number subscriptions for working drones and 9 subscriptions for not working, and with 10 subscriptions for not working - gets stuck.

This definetly not problem on the other side, cause starting 2 scripts simultaneously for more then 10 inactive subscriptions in total - works.

JonasVautherin commented 3 years ago

Sorry, what do you mean by "inactive subscriptions"?

atruebin commented 3 years ago

I mean subscription to stream of the mavsdk_server that connected to empty UDP port (there is no data).

Imagine I have working script, telemetry comes from all drones. I turn off one drone, subscriptions for that server become 'inactive' (so there become more than 10 'inactive' subscriptions in total) and all telemetry of all drones in script gets stuck. I turn it on again, it continue working.

JonasVautherin commented 3 years ago

What if you start mavsdk_server from a shell instead of using subprocess.Popen? Does subprocess.Popen somehow link them together? Like could it be that one being stuck would block all the others there? Not sure how subprocess.Popen works :thinking:

atruebin commented 3 years ago

Quickly tried one shell mavsdk_server and others from script. Shell one gets stuck as well.

atruebin commented 3 years ago

Well, I thought I solved this problem for my use case, but no.

I made checker for UDP ports, and if one have no data, than I killed mavsdk_server and stop telemetry tasks in Python for that drone, until he's back online. And I thought that worked, until I tried drones with different telemetry frequency. And there some things became clearer.

I tested 3 drones with 5 telemetry subscriptions in Python each. Two drones with 2Hz freq, one with 10Hz. So there are 10 "slow" subscriptions and 5 "fast". When 10Hz-guy received one measurement (in subscription), he do not receive next one until that two "slow" guys receive their, like there is some queue. And if I use only one 2Hz drone and one 10Hz, everything works (because number of subscriptions become 10).

May it be somehow ralated to number of threads processing gRPC data or something? https://stackoverflow.com/questions/55338451/grpc-python-max-workers-limiting-number-of-simultaneous-processes I understand that MAVSDK use async version of gRPC and creates insecure_channel instead of server, but I hardly understand how it works to continue investigation)

atruebin commented 3 years ago

Seems I finally found solution. Creating insecure_channel with argument standalone_pool_for_streaming = True breaks this limitations.

JonasVautherin commented 3 years ago

Nice! Is that something that we can change in MAVSDK? Could you make a PR? That would be really appreciated :blush: