mavlink / MAVSDK-Swift

MAVSDK client for Swift.
https://mavsdk.mavlink.io
BSD 3-Clause "New" or "Revised" License
27 stars 25 forks source link

GRPC_CALL_OK crash #133

Closed douglaswsilva closed 5 years ago

douglaswsilva commented 5 years ago

Tested with:

Reproduce:

Crash message:

E0624 15:20:24.714529000 6164901888 call_op_set.h:943] assertion failed: GRPC_CALL_OK == g_core_codegen_interface->grpc_call_start_batch( call_.call(), ops, nops, core_cq_tag(), nullptr)

I noticed that the crash will happen after subscribing to some telemetry observables. I'd receive some results, for instance position would return location for a few seconds and then crash after a while.

I saw crashes in observePosition(), observeGroundSpeed(), observeGPS(), but suspect it can happen in any place we have some observable callback, specially in telemetry.

Crash Stack:

#3        0x00000001044129ac in grpc::CoreCodegen::assert_fail(char const*, char const*, int) ()
#4        0x0000000104423dbc in grpc::internal::CallOpSet<grpc::internal::CallOpSendInitialMetadata, grpc::internal::CallOpSendMessage, grpc::internal::CallNoOp<3>, grpc::internal::CallNoOp<4>, grpc::internal::CallNoOp<5>, grpc::internal::CallNoOp<6> >::ContinueFillOpsAfterInterception() ()
#5        0x00000001042df4ec in grpc::ServerWriter<dronecode_sdk::rpc::telemetry::PositionResponse>::Write(dronecode_sdk::rpc::telemetry::PositionResponse const&, grpc::WriteOptions) ()
#6        0x0000000104155440 in std::__1::__function::__func<dronecode_sdk::backend::TelemetryServiceImpl<dronecode_sdk::Telemetry>::SubscribePosition(grpc::ServerContext*, dronecode_sdk::rpc::telemetry::SubscribePositionRequest const*, grpc::ServerWriter<dronecode_sdk::rpc::telemetry::PositionResponse>*)::'lambda'(dronecode_sdk::Telemetry::Position), std::__1::allocator<dronecode_sdk::backend::TelemetryServiceImpl<dronecode_sdk::Telemetry>::SubscribePosition(grpc::ServerContext*, dronecode_sdk::rpc::telemetry::SubscribePositionRequest const*, grpc::ServerWriter<dronecode_sdk::rpc::telemetry::PositionResponse>*)::'lambda'(dronecode_sdk::Telemetry::Position)>, void (dronecode_sdk::Telemetry::Position)>::operator()(dronecode_sdk::Telemetry::Position&&) ()
#7        0x0000000104409158 in std::__1::__function::__func<dronecode_sdk::TelemetryImpl::process_global_position_int(__mavlink_message const&)::$_1, std::__1::allocator<dronecode_sdk::TelemetryImpl::process_global_position_int(__mavlink_message const&)::$_1>, void ()>::operator()() ()
#8        0x00000001043ccbc4 in dronecode_sdk::ThreadPool::worker() ()
#9        0x00000001043cd1c8 in void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (dronecode_sdk::ThreadPool::*)(), dronecode_sdk::ThreadPool*> >(void*) ()
douglaswsilva commented 5 years ago

Note: I tested with Gazebo and didn't get the crash. I tried starting SITL before and after starting the app and it worked in both cases. We just have to fix this for the actual drone.

unipheas commented 5 years ago

@douglaswsilva Just wanted to confirm, but when you pulled the latest version of the example app you are manually updating it to 0.5.0 from 0.4.1, right? I'm changing the Cartfile to github "Dronecode/DronecodeSDK-Swift" == 0.5.0 now and will test everything.

That said, I tried the current version of the example app (with 0.4.1) and it wasn't establishing a connection with my ESP8266 WiFi Module, but QGC was able to.

update: Not able to get it to connect with 0.5.0 either.

JonasVautherin commented 5 years ago

@unipheas: Are you getting a crash, or is it just not connecting? Does the backend output anything at all (at least the version number and a few log lines)?

JonasVautherin commented 5 years ago

@douglaswsilva: I'm trying to reproduce, and also testing with latest mavsdk_server 0.18.0.

unipheas commented 5 years ago

@unipheas: Are you getting a crash, or is it just not connecting? Does the backend output anything at all (at least the version number and a few log lines)?

It's not crashing, just not connecting. Here is the log

2019-06-26 19:44:58.875767+0800 DronecodeSDK_Swift_Example[65517:4136020] [DYMTLInitPlatform] platform initialization successful
2019-06-26 19:44:59.109322+0800 DronecodeSDK_Swift_Example[65517:4135848] creating player instance using shared library
Running backend in background (MAVLink port: 14540
[07:44:59|Info ] DronecodeSDK version: 0.15.0 (dronecode_sdk_impl.cpp:25)
[07:44:59|Debug] New: System ID: 0 Comp ID: 0 (dronecode_sdk_impl.cpp:285)
[07:44:59|Info ] Server set to listen on 0.0.0.0:50051 (grpc_server.cpp:43)
[07:44:59|Info ] Server started (grpc_server.cpp:27)
[07:44:59|Info ] Waiting to discover system... (connection_initiator.h:57)
JonasVautherin commented 5 years ago

@unipheas: I believe this is a different issue. Let's continue it on #134.

JonasVautherin commented 5 years ago

@douglaswsilva Just wanted to confirm, but when you pulled the latest version of the example app you are manually updating it to 0.5.0 from 0.4.1, right?

@unipheas: I believe they don't, they are still on Swift 4. But I'm now wondering if that could be related somehow (e.g. grpc-swift 0.8.1 has issues with the newer version of grpc we use on the server side. Remember we had issues with grpc-swift 0.8.2 already?).

unipheas commented 5 years ago

@JonasVautherin I do remember.

unipheas commented 5 years ago

@douglaswsilva can you run the example app through Xcode while your phone is plugged in and tell us what the response is in the console? If it's crashing it should be producing some sort of error.

douglaswsilva commented 5 years ago

@unipheas Yeah, here it is:

2019-07-03 09:14:56.750499-0700 DronecodeSDK_Swift_Example[9440:2545104] [DYMTLInitPlatform] platform initialization successful
2019-07-03 09:14:56.900792-0700 DronecodeSDK_Swift_Example[9440:2544927] creating player instance using shared library
Running backend in background (MAVLink port: udp://:14540
[09:14:56|Info ] MAVSDK version: 0.18.1 (mavsdk_impl.cpp:25)
[09:14:56|Debug] New: System ID: 0 Comp ID: 0 (mavsdk_impl.cpp:333)
[09:14:56|Info ] Server set to listen on 0.0.0.0:50051 (grpc_server.cpp:44)
[09:14:56|Info ] Server started (grpc_server.cpp:28)
[09:14:56|Info ] Waiting to discover system... (connection_initiator.h:58)
[09:15:45|Info ] New system on: 192.168.42.1:38499 (udp_connection.cpp:228)
[09:15:45|Debug] Component Camera 1 (100) added. (system_impl.cpp:400)
[09:15:45|Debug] request camera info (camera_impl.cpp:141)
[09:15:45|Info ] Using cached file for Yuneec E90. (camera_impl.cpp:890)
[09:15:45|Debug] Ignoring string params. (camera_definition.cpp:121)
[09:15:45|Debug] Ignoring custom params. (camera_definition.cpp:126)
[09:15:45|Debug] Ignoring custom params. (camera_definition.cpp:126)
[09:15:45|Debug] Component Autopilot (1) added. (system_impl.cpp:400)
[09:15:45|Debug] Component Gimbal (154) added. (system_impl.cpp:400)
[09:15:46|Debug] Discovered 3 component(s) (UUID: 3762002188609060911) (system_impl.cpp:563)
[09:15:46|Info ] System discovered [UUID: 3762002188609060911] (connection_initiator.h:62)
Drone connected with UUID : 3762002188609060911
E0703 09:15:46.774386000 6100955136 call_op_set.h:943]                 assertion failed: GRPC_CALL_OK == g_core_codegen_interface->grpc_call_start_batch( call_.call(), ops, nops, core_cq_tag(), nullptr)
douglaswsilva commented 5 years ago

Screen Shot 2019-07-03 at 9 17 41 AM

douglaswsilva commented 5 years ago

This is my current cartfile.resolved, since I'm testing with Swift 5 now:

binary "https://code.videolan.org/videolan/VLCKit/raw/master/Packaging/MobileVLCKit.json" "3.1.4"
binary "https://s3.eu-central-1.amazonaws.com/dronecode-sdk/backend.json" "0.18.1"
github "Dronecode/DronecodeSDK-Swift" "af1b3cf64f30c6407542391b72645e2f0a41df05"
github "ReactiveX/RxSwift" "5.0.1"
github "jonasvautherin/grpc-swift" "820730e24b9cf035b2889b30d2fe87411e041720"

Let me know if you need anything else. :)

JonasVautherin commented 5 years ago

Note: they are building from the move-to-mavsdk-server branch, which is my work in progress following the renaming to MAVSDK :innocent:.

JonasVautherin commented 5 years ago

I could reproduce on Pixhawk 4 and on an old H520, and not on SITL, as @douglaswsilva observed. I reproduced running the Swift example app, both on the iOS simulator and on an iPad.

I could not reproduce in a (somewhat simpler) example in Python (running a mission while printing all the position, ground_speed_ned and gps_info messages), which suggests it may not come from mavsdk_server but rather from the Swift side.

Because I don't remember changing anything in the Swift code, I would guess it may come from the new version of grpc-swift (from 0.9 on). Maybe something changed and we should handle the streams a bit differently :thinking:.

First thing that comes to my mind is this change that I needed when moving to Swift 5. I don't believe it is the cause because I think @douglaswsilva was experiencing this bug before I started working on that branch.

@douglaswsilva or @byuarus: could you comment on that change (here), just to make sure it is not obviously wrong?

In the meantime I'll try to downgrade grpc-swift and see where I go.

JonasVautherin commented 5 years ago

You know what? I just ran another test:

And I reproduced it there! It took a few missions, but it did crash:

[10:30:59|Debug] Send mission item 0 (mission_impl.cpp:1118)
[10:30:59|Debug] Send mission item 1 (mission_impl.cpp:1118)
[10:30:59|Debug] Send mission item 2 (mission_impl.cpp:1118)
[10:30:59|Debug] Send mission item 3 (mission_impl.cpp:1118)
[10:31:00|Warn ] Retrying send mission count... (mission_impl.cpp:1339)
[10:31:00|Debug] Send mission item 3 (mission_impl.cpp:1118)
E0704 22:31:00.110097137   26875 call_op_set.h:943]          assertion failed: GRPC_CALL_OK == g_core_codegen_interface->grpc_call_start_batch( call_.call(), ops, nops, core_cq_tag(), nullptr)
zsh: abort (core dumped)  ./mavsdk/bin/mavsdk_server
JonasVautherin commented 5 years ago

Well, I'm actually not sure if it is the same issue. It sounds like this assertion shows some kind of race condition. It crashes super quickly when running mavsdk_server on iOS, but not on linux or macOS.

Not sure why, because you don't have this crash on the production version, do you @douglaswsilva?

JonasVautherin commented 5 years ago

Ok I think that's a race in mavsdk_server. What I can't understand is why it appears now, because it has been like this for months.

Working on a fix!