warmcat / libwebsockets

canonical libwebsockets.org networking library
https://libwebsockets.org

Crash at OpenSSL_client_verify_callback #1134

Closed by CreazyBear 6 years ago

CreazyBear commented 6 years ago

Hi, Happy New Year.

I am not sure whether there is a required format for posting an issue.

In the last iteration I updated libwebsockets from 2.1.1 to 2.4.1. I downloaded the code from the 2.4.1 tag and built the .a lib for my iOS app. Since then I have been getting the following crash, but only for a very small percentage of users. I have no idea what is going wrong. The following is the crash log from Fabric.

#26. Crashed: websockets.workQueue.22D9E6CD-D7D3-4C86-B3BE-CBB24E5B0AFD
0  HabitApp                       0x101740e8c OpenSSL_client_verify_callback
1  HabitApp                       0x101740e80 OpenSSL_client_verify_callback
2  HabitApp                       0x10159bb0c internal_verify
3  HabitApp                       0x10159b6e8 X509_verify_cert
4  HabitApp                       0x1016ce380 ssl_verify_cert_chain
5  HabitApp                       0x1016afcb4 ssl3_get_server_certificate
6  HabitApp                       0x1016aeeac ssl3_connect
7  HabitApp                       0x1016bc28c ssl23_connect
8  HabitApp                       0x101741240 lws_ssl_client_connect2
9  HabitApp                       0x10173e778 lws_client_socket_service
10 HabitApp                       0x10175dc3c lws_service_fd_tsi
11 HabitApp                       0x10174d4e4 _lws_plat_service_tsi
12 HabitApp                       0x10174d5c8 lws_plat_service
13 HabitApp                       0x10175de14 lws_service

lws-team commented 6 years ago

Can you get an improved backtrace that can identify the source line at each stack frame?

Because from this I can't tell what the problem is either... it has two frames in the callback, and I dunno if that means something. The rest of it is just "getting it into the callback".

lws-team commented 6 years ago

cmake .. -DCMAKE_BUILD_TYPE=DEBUG

is usually a good place to start adding debug info to the build...

CreazyBear commented 6 years ago

This is a crash from the released (online) app, and so far neither my colleagues nor I have been able to reproduce it. I'm not sure building with cmake .. -DCMAKE_BUILD_TYPE=DEBUG is a good idea for an online app, and this iteration ends next weekend. I have also got another online crash (again at a very low rate); it has two frames at insert_wsi_socket_into_fds. I suspect something is wrong in my connect/reconnect mechanism. I will check it out.

#21. Crashed: NSOperationQueue 0x170231560 :: NSOperation 0x171052210 (QOS: DEFAULT)
0  HabitApp                       0x100b726cc insert_wsi_socket_into_fds
1  HabitApp                       0x100b72530 insert_wsi_socket_into_fds
2  HabitApp                       0x100b5fce8 lws_client_connect_2
3  HabitApp                       0x100b60f4c lws_client_connect_via_info2
4  HabitApp                       0x100b74d20 lws_header_table_attach
5  HabitApp                       0x100b60c70 lws_client_connect_via_info
6  HabitApp                       0x1008029d8 -[WebSocketsService setupConnectInfo] 

Another question, about timeout_secs. In order to get LWS_CALLBACK_CLIENT_CONNECTION_ERROR as soon as the network is disconnected, I have the following code in the app.

struct lws_context_creation_info creationInfo;
....
creationInfo.timeout_secs = -1;

In earlier versions of the app I set it to 20 (like the default). But on different devices, LWS_CALLBACK_CLIENT_CONNECTION_ERROR arrives after different delays. Even if I set the timeout to 1, I don't get LWS_CALLBACK_CLIENT_CONNECTION_ERROR immediately. With -1 I get the behavior I want. But I am not sure it's OK to set it to -1.

Thank you for your help!

lws-team commented 6 years ago

I'm not sure the cmake .. -DCMAKE_BUILD_TYPE=DEBUG is a good idea for online app.

I can't see why it's bad. Certainly not having it and then not being able to understand or fix your problems is bad.

insert_wsi_socket_into_fds

Only one thread can use the lws APIs at a time, right?
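
The two traces in this thread crash on different queues (websockets.workQueue and an NSOperationQueue), which may or may not point to lws being entered from more than one thread. One common way to rule that out is to have other threads post requests to a small locked queue and let only the service thread act on them. The following is a minimal sketch of such a queue, assuming pthreads; `req_queue`, `req_type` and the function names are hypothetical and are not part of lws.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical request type: what another thread wants the service thread to do. */
enum req_type { REQ_CONNECT, REQ_CLOSE };

#define REQ_QUEUE_MAX 16

/* Fixed-size ring buffer guarded by a mutex (illustrative, not an lws structure). */
struct req_queue {
    pthread_mutex_t lock;
    enum req_type items[REQ_QUEUE_MAX];
    size_t head;   /* index of the oldest pending request */
    size_t count;  /* number of pending requests */
};

static void req_queue_init(struct req_queue *q)
{
    assert(q != NULL);
    pthread_mutex_init(&q->lock, NULL);
    q->head = 0;
    q->count = 0;
}

/* Safe to call from any thread. Returns 0 on success, -1 if the queue is full. */
static int req_queue_push(struct req_queue *q, enum req_type r)
{
    int ret = -1;

    pthread_mutex_lock(&q->lock);
    if (q->count < REQ_QUEUE_MAX) {
        q->items[(q->head + q->count) % REQ_QUEUE_MAX] = r;
        q->count++;
        ret = 0;
    }
    pthread_mutex_unlock(&q->lock);
    return ret;
}

/* Intended to be called only from the service thread.
 * Returns 1 and fills *out if a request was pending, otherwise 0. */
static int req_queue_pop(struct req_queue *q, enum req_type *out)
{
    int ret = 0;

    pthread_mutex_lock(&q->lock);
    if (q->count > 0) {
        *out = q->items[q->head];
        q->head = (q->head + 1) % REQ_QUEUE_MAX;
        q->count--;
        ret = 1;
    }
    pthread_mutex_unlock(&q->lock);
    return ret;
}
```

In an app like the one described here, the thread that loops on lws_service() would drain this queue between service calls and make the lws_client_connect_via_info() call itself, so that code such as setupConnectInfo only ever pushes a request.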

it's have two frame at

I think this is something to do with the backtrace dump... perhaps the "first" one counting from the bottom is the entry point of the function and the "second" one (top of the list) is the place with the crash.

But I am not sure it's ok to set it to -1.

Generally setting timeout_secs to -1 isn't a good idea. If the timeout check (once per second) finds a timeout set to -1 it always closes the connection. So you create a race between the next step the timeout is protecting and the once-per-second checks.
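
To make the race concrete, here is a toy model of a periodic timeout sweep. It is only a sketch of the behavior described above, not lws's actual implementation; `struct conn`, `conn_set_timeout`, `conn_step_done` and `sweep` are hypothetical names, and times are plain integer seconds.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a connection with a pending timeout. */
struct conn {
    int  armed;     /* nonzero while a timeout is guarding the next step */
    long deadline;  /* absolute time (seconds) at which that step times out */
    int  closed;    /* set once the sweep has closed the connection */
};

/* Arm a timeout 'secs' seconds from 'now'.
 * A negative 'secs' yields a deadline that is already in the past. */
static void conn_set_timeout(struct conn *c, long now, long secs)
{
    assert(c != NULL);
    c->armed = 1;
    c->deadline = now + secs;
}

/* The step the timeout was guarding finished: disarm it. */
static void conn_step_done(struct conn *c)
{
    c->armed = 0;
}

/* Periodic sweep, run about once per second:
 * close every open connection whose armed deadline has passed. */
static void sweep(struct conn *conns, size_t n, long now)
{
    for (size_t i = 0; i < n; i++)
        if (!conns[i].closed && conns[i].armed && now > conns[i].deadline)
            conns[i].closed = 1;
}
```

In this model a connection armed with 20 survives the next sweep, while one armed with -1 is closed by whichever sweep runs first, unless the guarded step happens to complete before that sweep. That matches the "it works, but by luck" behavior a -1 timeout would produce.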

If lws hears from the OS the connection is dead (a FIN, or a write failed fatally), it will react to close it immediately. If the connection is just silently dead, you need to use the ping / pong autoprobing to detect it.
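
The idea behind ping / pong probing can be sketched as a small state machine: after a period of silence send a PING, and if nothing comes back within a grace period treat the connection as dead. The code below is an illustrative model only, with hypothetical names (`struct probe`, `probe_tick`, and so on); it is not how lws implements it. In lws 2.x the built-in equivalent is, as far as I recall, enabled through the `ws_ping_pong_interval` member of `lws_context_creation_info`, so check the headers of your version before relying on that name.

```c
#include <assert.h>

/* What the caller should do after a periodic check. */
enum probe_action { PROBE_NONE, PROBE_SEND_PING, PROBE_DECLARE_DEAD };

/* Hypothetical per-connection probing state; times are integer seconds. */
struct probe {
    long last_rx;         /* when anything was last received from the peer */
    long ping_sent_at;    /* when the outstanding PING was sent, or -1 if none */
    long idle_secs;       /* send a PING after this much silence */
    long pong_wait_secs;  /* give up if nothing arrives this long after the PING */
};

static void probe_init(struct probe *p, long now, long idle_secs,
                       long pong_wait_secs)
{
    assert(idle_secs > 0 && pong_wait_secs > 0);
    p->last_rx = now;
    p->ping_sent_at = -1;
    p->idle_secs = idle_secs;
    p->pong_wait_secs = pong_wait_secs;
}

/* Any received data, including a PONG, proves the peer is still there. */
static void probe_on_rx(struct probe *p, long now)
{
    p->last_rx = now;
    p->ping_sent_at = -1;
}

/* Call periodically; returns the action to take at time 'now'. */
static enum probe_action probe_tick(struct probe *p, long now)
{
    if (p->ping_sent_at >= 0) {
        /* A PING is outstanding: wait for a reply or give up. */
        if (now - p->ping_sent_at >= p->pong_wait_secs)
            return PROBE_DECLARE_DEAD;
        return PROBE_NONE;
    }
    if (now - p->last_rx >= p->idle_secs) {
        p->ping_sent_at = now;
        return PROBE_SEND_PING;
    }
    return PROBE_NONE;
}
```

The point of this design is that detection time is bounded by `idle_secs + pong_wait_secs` even when the OS never reports the connection as broken, which is the "silently dead" case a connection timeout alone does not cover.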

CreazyBear commented 6 years ago

Ok, thank you for your advice.