novastone-media / MQTT-Client-Framework

iOS, macOS, tvOS native ObjectiveC MQTT Client Framework
Other
1.84k stars 465 forks source link

-[MQTTDecoder stream:handleEvent:] crashing with EXC_BAD_ACCESS KERN_INVALID_ADDRESS #325

Open repomies opened 7 years ago

repomies commented 7 years ago

Over the past days I've seen a few crashes like this for my app in Crashlytics. Any ideas what might be going wrong? Using MQTTClient 0.9.2.

Crashed: com.apple.main-thread
0  libobjc.A.dylib                0x18b3fc150 objc_msgSend + 16
1  MQTTClient                     0x1009f62b8 -[MQTTDecoder stream:handleEvent:] (MQTTDecoder.m:132)
2  CoreFoundation                 0x18c8d201c _signalEventSync + 212
3  CoreFoundation                 0x18c8d1f28 _cfstream_shared_signalEventSync + 440
4  CoreFoundation                 0x18c93542c __CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION__ + 24
5  CoreFoundation                 0x18c934d9c __CFRunLoopDoSources0 + 540
6  CoreFoundation                 0x18c9329a8 __CFRunLoopRun + 744
7  CoreFoundation                 0x18c862da4 CFRunLoopRunSpecific + 424
8  GraphicsServices               0x18e2cc074 GSEventRunModal + 100
9  UIKit                          0x192b1d058 UIApplicationMain + 208
10 MyApp                          0x1001970e8 main (AppDelegate.swift:20)
11 libdyld.dylib                  0x18b87159c start + 4
repomies commented 7 years ago

The one time so far I've managed to reproduce this on an Xcode debug, I got the following as the tail of the console log. I added one more debug print on line 130 of MQTTDecoder.m to print the dataBuffer as a UTF-8 string (prefixed with [MQTTDecoder] data: in the log), and looks like the problematic data was cting.. My guess is the backend is doing something seriously wrong here, but it'd be nice if the iOS client didn't crash on it.

2017-07-02 13:53:16.261999+0300 MyApp[61535:16904187] [MQTTDecoder] received (54)=<31343938 39393237 38383a20 536f636b 65742065 72726f72 206f6e20 636c6965 6e74203c 756e6b6e 6f776e3e 2c206469 73636f6e 6e65>...
2017-07-02 13:53:16.262015+0300 MyApp[61535:16904187] [MQTTDecoder] data: 1498992788: Socket error on client <unknown>, disconne
2017-07-02 13:53:16.262034+0300 MyApp[61535:16904187] [MQTTSession] other state
2017-07-02 13:53:16.262048+0300 MyApp[61535:16904187] [MQTTDecoder] NSStreamEventHasBytesAvailable
2017-07-02 13:53:16.262257+0300 MyApp[61535:16904187] [MQTTDecoder] fixedHeader=0x63
2017-07-02 13:53:16.262724+0300 MyApp[61535:16904187] [MQTTDecoder] digit=0x74 0x74 0 1
2017-07-02 13:53:16.262829+0300 MyApp[61535:16904187] [MQTTDecoder] remainingLength=116
2017-07-02 13:53:16.262862+0300 MyApp[61535:16904187] [MQTTDecoder] read 116 116
2017-07-02 13:53:16.262975+0300 MyApp[61535:16904187] [MQTTDecoder] received (118)=<6374696e 672e0a31 34393839 39323738 383a204e 65772063 6f6e6e65 6374696f 6e206672 6f6d2031 37382e35 352e312e 3138206f 6e20706f 72742031 3838332e 0a313439 38393932 3738383a 204e6577 20636c69 656e7420 636f6e6e 65637465 64206672 6f6d2031 37382e35 352e312e 31382061 7320>...
2017-07-02 13:53:16.263031+0300 MyApp[61535:16904187] [MQTTDecoder] data: cting.
1498992788: New connection from 178.55.1.18 on port 1883.
1498992788: New client connected from 178.55.1.18 as
2017-07-02 13:53:16.263055+0300 MyApp[61535:16904187] [MQTTMessage] unexpected payload after packet identifier
2017-07-02 13:53:16.263221+0300 MyApp[61535:16904187] [MQTTSession] MQTT illegal message received
2017-07-02 13:53:16.263238+0300 MyApp[61535:16904187] [MQTTSession] closeInternal
2017-07-02 13:53:16.263308+0300 MyApp[61535:16904187] [MQTTCFSocketTransport] close
ithaqua commented 7 years ago

I have met the same issue, any update?

pablogeek commented 7 years ago

Same issue, how is this going?

ckrey commented 7 years ago

the cting. is just the continuation of the data sent before ...disconne

What broker are you using? It seems the broker is sending mosquitto debug messages to the mqtt connection. This is not MQTT protocol.

The protocol violation is detected. The crash is probably caused by data already received which shouldn't be processed anymore after detecting the protocol violation

repomies commented 7 years ago

Hi @ckrey and sorry for the late reply — the backend is using Mosquitto (https://github.com/toke/docker-mosquitto) as far as I know. I will try to find out more once the backend developers are back from vacations :)

Strangely this bug peaked in the analytics at the latter half of June, then was mostly gone for most of July, until another peak over the very last few days. App usage has been quite constant over this period so it's a bit puzzling what exactly is causing it.

rockyftn commented 7 years ago

I have same issue on my side. Also i use Mosquito broker on other side. Do you have update or advice what to do?

repomies commented 7 years ago

I might have figured out why this happened. I called close() on my MQTTSession right before it was deallocated, which somehow led to MQTTDecoder getting deallocated right after it called decoder:didReceiveMessage: and crashing on next line's self.dataBuffer = nil;. Waiting for closeWithDisconnectHandler to return the callback before deiniting the session seems to have fixed this as far as I can tell.

repomies commented 7 years ago

Still saw the same crash at least once (though only on a non-debug build so far) despite the fix mentioned above. But it seems to happen much more rarely now, if anything.

repomies commented 7 years ago

Update: The "fix" I mentioned above didn't end up solving the problem, and if anything we actually saw the crashes peak over the past week or two. The root cause of the problem definitely seems to be some garbage (Mosquitto debug logs) being pushed into the pipeline by the backend (unfortunately I can't do anything about that). I tried a couple of other iOS MQTT client libraries and they actually dealt way worse with this problem, crashing right away when connecting, or never managing to parse any data.

However, it seems – and this still needs more data points to make sure but at 40% adoption and zero crashes so far, it sure looks promising – that my latest update got rid of the crashes simply by holding onto each MQTTSession instance for 30 more seconds after it disconnects before allowing it to be deallocated. The garbage in the data pipeline remains, but this way it doesn't seem to lead to the memory problems. (I'll post another update after I've seen whether the crashes truly are gone now.)

repomies commented 7 years ago

Yes, this time the crashes definitely seem to be gone for good. Still zero crashes reported for the newest app version after five days (we used to have at least a dozen every day).

pablogeek commented 7 years ago

I have the same crash in Fabric. Any updates on this? @repomies

repomies commented 6 years ago

@pablogeek yes, looks like in my case it completely disappeared after I deployed the fix described above – just holding onto the MQTTSession instance for extra 30 seconds after disconnecting it (and probably that's much longer than needed) so that the MQTTDecoder instance doesn't end up deallocated when it's still trying to process an invalid message.

sharonSilicus commented 6 years ago

@repomies I am also facing the same issue. How do you hold the session after disconnection? A sample code would help. Thanks.

ckrey commented 6 years ago

Can you describe what you are doing with MQTT? Are you subscribing to a topic with a lot of large retained messages and QoS0, which causes the broker to flood the client with data? Or do you use QoS1/2 but do not have specified a reasonable max_inflight_messages.

I do not want to say the library is error-free, but under "normal" circumstances it does behave quite stable

sharonSilicus commented 6 years ago

@ckrey While the application is moving to background, I close the connection to the MQTT server its an Azure IoT Hub. While coming back to foreground, I try to connect to MQTT server again. Once in a couple of days the crash appears.

- (void)applicationDidEnterBackground:(UIApplication *)application {
    // Use this method to release shared resources, save user data, invalidate timers, and store enough application state information to restore your application to its current state in case it is terminated later.
//     If your application supports background execution, this method is called instead of applicationWillTerminate: when the user quits.
    MQTTSession *session = [[MQTTSessionManager sharedManager] session];
    [session closeAndWait:1];
}
- (void)applicationDidBecomeActive:(UIApplication *)application {
    // Restart any tasks that were paused (or not yet started) while the application was inactive. If the application was previously in the background, optionally refresh the user interface.

    MQTTSession *session = [[MQTTSessionManager sharedManager] session];

    if ([session status] == MQTTSessionStatusClosed ) {

        NSString *clientID = [ApplicationManager sharedManager].clientID;
        NSString *sas = [ApplicationManager sharedManager].sasKey;
        NSString *hostURL = [ApplicationManager sharedManager].hostURL;

        [[MQTTSessionManager sharedManager] setHost:hostURL];
        [[MQTTSessionManager sharedManager] setClientID:clientID];

        [[MQTTSessionManager sharedManager] connectWithUserName:[NSString stringWithFormat:@"%@/%@",hostURL,clientID] andPassword:sas];
    }

}
jcavar commented 6 years ago

Relevant probably: https://github.com/facebook/SocketRocket/issues/532

jcavar commented 6 years ago

I believe this bug is fixed on master: https://github.com/novastone-media/MQTT-Client-Framework/blob/master/MQTTClient/MQTTClient/MQTTCFSocketTransport.m#L39-L46

We had to move to using queues instead of run loop which is probably better anyway. There are some bits to resolve before making official release, but feel free to try it out from master.

jcavar commented 6 years ago

Ok, 0.10.0 version is on CocoaPods and this issue should be fixed I believe. Please give it a try.

saberChen commented 6 years ago

Using MQTTClient 0.12.0, still many crashes like this for my app, [MQTTDecoder stream:handleEvent:] @jcavar

jcavar commented 6 years ago

Do you have any idea how it happens?

marciniwanicki commented 6 years ago

I think the crash is related to a synchronization issue in close method (MQTTDecoder.m:88) leading to a deadlock. I believe most of the clients experience it on the main thread as it's the default configuration.

 if (self.queue != dispatch_get_specific(&QueueIdentityKey)) {
        dispatch_sync(self.queue, ^{ // deadlock
            [self internalClose];
        });
    }

Existing implementation seems to relay on the assumption that two different queues cannot perform blocks on the same thread. This assumption is not correct as GCD may skip switching threads when it's applicable (for performance reasons).

Here is a test that should reproduce it.

#import <XCTest/XCTest.h>
#import "MQTTDecoder.h"

@interface MQTTDecoderTests : XCTestCase

@end

@implementation MQTTDecoderTests

- (void)testCloseFromNonMainDispachQueueOnMainThread {
    MQTTDecoder *sut = [[MQTTDecoder alloc] init];
    dispatch_queue_t customQueue = dispatch_queue_create("mqttdecodertests.queue", nil);

    dispatch_sync(customQueue, ^{
        [sut close];
    });

    XCTAssertTrue(YES); // test passed if this line is reached
}

@end

The fix is not trivial as the MQTTDecoder using externally available queue. My naive fix would at least make some cases working, and throw exceptions if the code cannot be performed synchronously in a safe way.

- (void)close {
    BOOL dispatchedOnDecoderQueue = self.queue == dispatch_get_specific(&QueueIdentityKey);
    BOOL calledOnMainThreadAndDecoderUsingMainQueue = [NSThread isMainThread] && self.queue == dispatch_get_main_queue();
    BOOL canSafelyInlineInternalClose = dispatchedOnDecoderQueue || calledOnMainThreadAndDecoderUsingMainQueue;
    if (!canSafelyInlineInternalClose) {
        @throw [NSException exceptionWithName:NSInternalInconsistencyException
                                       reason:@"Cannot close MQTTDecoder synchronously"
                                     userInfo:nil];
    }

    [self internalClose];
}

Maybe we could improve the fix by replacing the exception with a code niling the delegate and dispatching internalClose asynchronously? I am not sure if that would have any bad implications.

nnmanhit commented 6 years ago

This is still happen because QueueIdentityKey = NULL.

I have fixed by this way, and it works.

@interface MQTTDecoder() {
    void *QueueIdentityKey;
}

@property (nonatomic) NSMutableArray<NSInputStream *> *streams;
@property (readonly) void *QueueIdentityKey;

@end
- (void *)QueueIdentityKey {
    return QueueIdentityKey;
}
- (void)setQueue:(dispatch_queue_t)queue {
    _queue = queue;

    // We're going to use dispatch_queue_set_specific() to "mark" our queue.
    // The dispatch_queue_set_specific() and dispatch_get_specific() functions take a "void *key" parameter.
    // Later we can use dispatch_get_specific() to determine if we're executing on our queue.
    // From the documentation:
    //
    // > Keys are only compared as pointers and are never dereferenced.
    // > Thus, you can use a pointer to a static variable for a specific subsystem or
    // > any other value that allows you to identify the value uniquely.
    //
    // So we're just going to use the memory address of an ivar.

    QueueIdentityKey = &QueueIdentityKey;
    dispatch_queue_set_specific(_queue, QueueIdentityKey, QueueIdentityKey, NULL);
}
- (void)close {
    // https://github.com/novastone-media/MQTT-Client-Framework/issues/325
    dispatch_block_t block = ^{
        [self internalClose];
    };

    if (dispatch_get_specific(QueueIdentityKey))
        block();
    else
        dispatch_sync(_queue, block);
}
jcavar commented 5 years ago

We merged this a while ago I think. Is this issue still happening?

repomies commented 5 years ago

I’m sorry I can’t answer for myself, since our app’s backend infrastructure unfortunately no longer exists.

avichayVonage commented 5 years ago

This issue also happened in my application lately, using version 0.15.2

jcavar commented 5 years ago

Can you provide more information about how it happens (logs, conditions, steps)?

jcavar commented 4 years ago

https://github.com/novastone-media/MQTT-Client-Framework/issues/526

shashidj commented 4 years ago

Crashed: com.apple.main-thread 0 libsystem_kernel.dylib 0x1a5cc7db8 close + 8 1 CoreFoundation 0x1a5e6671c CFSocketInvalidate + 820 2 CFNetwork 0x1a92734ec _CFStreamErrorFromCFError + 158756 3 CoreFoundation 0x1a5e708cc _CFStreamClose + 108 4 MQTTClient 0x105c7e134 -[MQTTCFSocketDecoder close] + 60 5 MQTTClient 0x105c7f6d4 -[MQTTCFSocketTransport internalClose] + 244 6 MQTTClient 0x105c8c718 -[MQTTSession closeInternal] + 332 7 MQTTClient 0x105c91130 -[MQTTSession error:error:] + 224 8 MQTTClient 0x105c90e80 -[MQTTSession connectionError:] + 60 9 MQTTClient 0x105c7f880 -[MQTTCFSocketTransport encoder:didFailWithError:] + 104 10 MQTTClient 0x105c7e90c -[MQTTCFSocketEncoder stream:handleEvent:] + 508 11 MQTTClient 0x105c988f8 -[MQTTSSLSecurityPolicyEncoder stream:handleEvent:] + 124 12 CoreFoundation 0x1a5e746ac _signalEventSync + 216 13 CoreFoundation 0x1a5e747b8 _signalEventQueue_block_invoke + 28 14 libdispatch.dylib 0x1a5b7eec4 _dispatch_call_block_and_release + 32 15 libdispatch.dylib 0x1a5b8033c _dispatch_client_callout + 20 16 libdispatch.dylib 0x1a5b8c600 _dispatch_main_queue_callback_4CF + 832 17 CoreFoundation 0x1a5e5b41c CFRUNLOOP_IS_SERVICING_THE_MAIN_DISPATCH_QUEUE + 16 18 CoreFoundation 0x1a5e56034 CFRunLoopRun + 1708 19 CoreFoundation 0x1a5e55660 CFRunLoopRunSpecific + 480 20 GraphicsServices 0x1b0266604 GSEventRunModal + 164 21 UIKitCore 0x1aa02a15c UIApplicationMain + 1944 22 HubbleConnected 0x1028d0988 main + 14 (main.m:14) 23 libdyld.dylib 0x1a5cd11ec start + 4

Still we are getting this crash in iOS

pantaopt commented 4 years ago

I think should avoid doing some potentially risky things in the dealloc method.