particle-iot / particle-cloud-sdk-ios

Official Particle Cloud SDK for iOS
Apache License 2.0
19 stars, 21 forks

EXC_BAD_ACCESS (code=1, address=0x54) #7

Closed: skyfoxa closed this issue 5 years ago

skyfoxa commented 6 years ago

Environment info:

Issue description:

The Particle SDK always crashes in EventSource.m when this method is called: - (void)connection:(NSURLConnection *)connection didFailWithError:(NSError *)error. The crash happens at line 176, when this code is executed:

    dispatch_after(popTime, self.queue, ^(void) {
        if (self.retries < 5) {
            // NSLog(@"connection retries %d", self.retries);
            self.retries++;
            [self open];
        }
    });

It crashes with the error Thread 10: EXC_BAD_ACCESS (code=1, address=0x54) when iOS tries to connect to a device.

skyfoxa commented 6 years ago

OK, I found that there was a newer version, 0.8.1, but updating didn't help. I'd recommend documenting the current version more clearly in README.md.

The issue is that when dispatch_after(popTime, self.queue, ^(void) {...}) is executed, self.queue is nil, which causes the EXC_BAD_ACCESS error. I'm not sure why this happens.

RaimundasSakalauskas commented 6 years ago

Thanks, will investigate. The latest SDK version is displayed right below the header.

RaimundasSakalauskas commented 6 years ago

Can you please verify that this happens with non-beta Xcode and non-beta iOS versions?

skyfoxa commented 6 years ago

@RaimundasSakalauskas sorry, I can't: I don't have another device running iOS 11, and my non-beta Xcode 9.4.1 won't let me install and run any app on iOS 12.

Do you have any idea what is causing the error?

RaimundasSakalauskas commented 6 years ago

@skyfoxa thank you for your answer. Apparently the queue gets deallocated for some reason, but it's hard for me to tell why without further investigation. Can you please let me know a reliable way to trigger the error you are seeing? I will try it on iOS 11. I'd assume that the EventSource is deallocated without canceling some long-running code: by the time the scheduled call is triggered, the queue has been deallocated, which causes the crash. Most likely the solution will be one of two:

  1. Making sure all long-running code is canceled before deallocating the EventSource
  2. Wrapping the crashing code in a nil check

Knowing the steps you do to trigger the crash will help me a lot with fixing this.
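A minimal Swift sketch of option 1 above, assuming a hypothetical CancellableSource class (the SDK itself is Objective-C, and none of these names come from it). The delayed retry is kept as a DispatchWorkItem, so close() can cancel it before it fires instead of letting it run against torn-down state. In Objective-C the same idea maps onto dispatch_block_create and dispatch_block_cancel.

```swift
import Foundation
import Dispatch

// Hypothetical stand-in for the SDK's EventSource; all names are illustrative.
final class CancellableSource {
    private let queue = DispatchQueue(label: "example.eventsource.cancel")
    // The three properties below are only touched on `queue`.
    private var pendingRetry: DispatchWorkItem?
    private var retries = 0
    private var opens = 0

    // Called from another thread, like a connection-failure callback.
    // Must not be called from `queue` itself, or queue.sync would deadlock.
    func scheduleRetry(after delay: TimeInterval) {
        queue.sync {
            let item = DispatchWorkItem { [weak self] in
                guard let self = self, self.retries < 5 else { return }
                self.retries += 1
                self.opens += 1 // stands in for [self open]
            }
            pendingRetry = item
            queue.asyncAfter(deadline: .now() + delay, execute: item)
        }
    }

    // Option 1: cancel outstanding work before the object is torn down.
    func close() {
        queue.sync {
            pendingRetry?.cancel()
            pendingRetry = nil
        }
    }

    // Number of simulated reconnects that actually ran.
    var openCount: Int {
        return queue.sync { opens }
    }
}
```

Cancelling only prevents work items that have not started yet; a retry that is already running is not interrupted, so this complements rather than replaces a nil check.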

skyfoxa commented 6 years ago

@RaimundasSakalauskas here is all thread stack info:

* thread #12, queue = 'com.apple.root.background-qos', stop reason = EXC_BAD_ACCESS (code=1, address=0x54)
  * frame #0: 0x00000001086b3bac libdispatch.dylib`dispatch_after + 292
    frame #1: 0x000000010676d318 Particle_SDK`-[EventSource connection:didFailWithError:](self=0x0000000281525020, _cmd="connection:didFailWithError:", connection=0x00000002833141d0, error=domain: @"NSURLErrorDomain" - code: 18446744073709550611) at EventSource.m:168
    frame #2: 0x00000001a2731a6c CFNetwork`__65-[NSURLConnectionInternal _withConnectionAndDelegate:onlyActive:]_block_invoke + 60
    frame #3: 0x00000001a24a2cd8 CFNetwork`-[NSURLConnectionInternal _withConnectionAndDelegate:onlyActive:] + 200
    frame #4: 0x00000001a24a77e4 CFNetwork`-[NSURLConnectionInternal _withConnectionAndDelegate:] + 56
    frame #5: 0x00000001a26ee398 CFNetwork`_NSURLConnectionDidFail(_CFURLConnection*, __CFError*, void const*) + 84
    frame #6: 0x00000001a2631690 CFNetwork`___ZN27URLConnectionClient_Classic17_delegate_didFailEP9__CFErrorU13block_pointerFvvE_block_invoke + 116
    frame #7: 0x00000001a262f228 CFNetwork`___ZN27URLConnectionClient_Classic18_withDelegateAsyncEPKcU13block_pointerFvP16_CFURLConnectionPK33CFURLConnectionClientCurrent_VMaxE_block_invoke_2 + 108
    frame #8: 0x000000010869cde4 libdispatch.dylib`_dispatch_client_callout + 16
    frame #9: 0x00000001086a0a2c libdispatch.dylib`_dispatch_block_invoke_direct + 232
    frame #10: 0x00000001a270e15c CFNetwork`RunloopBlockContext::_invoke_block(void const*, void*) + 36
    frame #11: 0x00000001a1e29b2c CoreFoundation`CFArrayApplyFunction + 80
    frame #12: 0x00000001a270e010 CFNetwork`RunloopBlockContext::perform() + 128
    frame #13: 0x00000001a270f1e4 CFNetwork`MultiplexerSource::perform() + 312
    frame #14: 0x00000001a270ef44 CFNetwork`MultiplexerSource::_perform(void*) + 60
    frame #15: 0x00000001a1e8a5b8 CoreFoundation`__CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION__ + 24
    frame #16: 0x00000001a1e8a538 CoreFoundation`__CFRunLoopDoSource0 + 88
    frame #17: 0x00000001a1e89e1c CoreFoundation`__CFRunLoopDoSources0 + 176
    frame #18: 0x00000001a1e84ce8 CoreFoundation`__CFRunLoopRun + 1040
    frame #19: 0x00000001a1e845b8 CoreFoundation`CFRunLoopRunSpecific + 436
    frame #20: 0x00000001a1e85354 CoreFoundation`CFRunLoopRun + 80
    frame #21: 0x000000010676c8b8 Particle_SDK`-[EventSource open](self=0x0000000281525020, _cmd="open") at EventSource.m:116
    frame #22: 0x000000010676c12c Particle_SDK`__49-[EventSource initWithURL:timeoutInterval:queue:]_block_invoke(.block_descriptor=0x0000000283fb0c90) at EventSource.m:66
    frame #23: 0x000000010869cde4 libdispatch.dylib`_dispatch_client_callout + 16
    frame #24: 0x00000001086a01e0 libdispatch.dylib`_dispatch_continuation_pop + 528
    frame #25: 0x00000001086b2cac libdispatch.dylib`_dispatch_source_invoke + 1864
    frame #26: 0x00000001086ae1f8 libdispatch.dylib`_dispatch_root_queue_drain + 376
    frame #27: 0x00000001086aebe4 libdispatch.dylib`_dispatch_worker_thread2 + 156
    frame #28: 0x00000001a1b16190 libsystem_pthread.dylib`_pthread_wqthread + 472
    frame #29: 0x00000001a1b18d00 libsystem_pthread.dylib`start_wqthread + 4
[Screenshot from 2018-08-27 at 11:18:16]

Maybe I've discovered the cause. The whole process starts with a call to ParticleCloud.sharedInstance().injectSessionAccessToken(accessToken, withExpiryDate: expiry), and at the beginning an EventSource is created with initWithURL. But injectSessionAccessToken calls [self logout], which leads to [EventSource close]. That method contains self.queue = nil;, which deallocates the queue. I commented out this line and the EXC_BAD_ACCESS stopped.

skyfoxa commented 6 years ago

So it passed without crashing in only one try; in all other tries it crashed again because of the nil queue. Is there any place other than [EventSource close] where this queue could be set to nil?

RaimundasSakalauskas commented 6 years ago

Thanks for clarifying. Investigating now. Will keep this thread updated.

RaimundasSakalauskas commented 6 years ago

Can you please reach me at raimis@particle.io? I'm unable to reproduce this on either the iOS 12 simulator or iOS 11, even when I explicitly set the queue to nil throughout the project. Maybe we could arrange a screen share or cooperate some other way to solve this.

I tried simply injecting the session, and triggering login and then injecting the session in the completion handler or before the completion handler returns; all scenarios execute successfully.

skyfoxa commented 6 years ago

@RaimundasSakalauskas thanks for investigating on this.

In the meantime I tried these steps:

  1. I updated to iOS 12 beta 11 and Xcode 10 beta 6
  2. I again tried the hack of commenting out self.queue = nil; in [EventSource close]
  3. Clean build folder - important step
  4. Build

It didn't crash in any attempt and everything works as expected. Since I didn't write that code, I can't yet find any other place from which this close method or the EventSource dealloc could be called, so I'm still figuring out what is going on. For now I'll test the app with this hack for a while.

Do you think you could remove this self.queue = nil; line in next version?

RaimundasSakalauskas commented 6 years ago

@skyfoxa I assume that you can't share the project source, but maybe you could share the part of the code that interacts with Particle SDK? I would really like to understand what is happening there, because so far I was unable to reproduce the crash no matter what I try.

This is pure guessing at this point, but given that this part of the code hasn't changed for quite a while now (2 years?) and this is the first time the problem has been reported, I would like to make sure there is nothing wrong with the way you use the SDK.

skyfoxa commented 6 years ago

@RaimundasSakalauskas yeah, I don't have permission to share either the project or the source code. But I understand your situation and I'll discuss it with my supervisor.

I think it is definitely caused by the new iOS 12 or Xcode 10, since, as you said, this code hasn't changed for quite a while; we had been using the deprecated SparkSDK for a year until now. At first I thought the issue was caused by my refactoring of some controllers (I didn't refactor the communication layer), but when the issue appeared I tested the previous branch with SparkSDK on iOS 12/Xcode 10 and the issue happened there too (the queue was set to nil). This is why I think something changed in iOS 12/Xcode 10 that causes the issue.

RaimundasSakalauskas commented 6 years ago

I have installed iOS 12 beta 12 on an iPhone 6s and used Xcode 10 beta 2. Simply running [[ParticleCloud sharedInstance] injectSessionAccessToken:@"some_access_token" withExpiryDate:[[NSDate new] dateByAddingTimeInterval:1000]]; does not produce a crash. I tried both Swift and Obj-C projects, and also Xcode 10 beta 6, with the same results. It would be very helpful to have access to code that causes the crash. Even better would be access to an empty project containing only the code that causes the crash.

RaimundasSakalauskas commented 5 years ago

After seeing this crash in Crashlytics I dug deeper. I was able to reproduce it when I ran injectSessionAccessToken while the test device was offline (no mobile data or Wi-Fi connection). I noticed that one EventSource object would first be closed, and then a callback on that same EventSource would fire, causing the crash (I wonder what changed in iOS 12 for this to start happening). I was able to prevent the crash by adding extra checks for this case. Please see commit: https://github.com/particle-iot/particle-sdk-ios/commit/e1473b746b89873fbee269e9b090b181071ae357

skyfoxa commented 5 years ago

Hey @RaimundasSakalauskas, thanks for the update.

Your use case is similar to our functionality: the crash occurs when the user has to change the Wi-Fi network on the device, so it is offline for a while.

However, it looks like your update isn't enough. In didFailWithError we still have an issue with:

    dispatch_after(popTime, self.queue, ^(void) {
        if (wasClosed) {
            return;
        }

        if (self.retries < 5) {
            self.retries++;
            [self open];
        }
    });

self.queue still appears to be nil.

If it is nil, and [EventSource close] sets wasClosed to TRUE, why doesn't the first if (wasClosed) check work? Is it possible that something changed the queue during those 10 lines of code in didFailWithError?
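One likely reason the if (wasClosed) check does not help: in dispatch_after(popTime, self.queue, ^{...}) the self.queue argument is read when dispatch_after itself is called, while the wasClosed check only runs later, inside the block. dispatch_after does not accept a NULL queue, so if the queue is already nil at that moment the call crashes before the block is ever scheduled (a fault at address=0x54 is consistent with a field being read through a nil pointer). A guard therefore has to run before scheduling. The Swift sketch below is only an illustration (the SDK is Objective-C; GuardedSource, connectionDidFail(retryAfter:) and the NSLock are hypothetical). It snapshots the queue under a lock, bails out if the source is closed, and re-checks the closed flag when the retry fires, since callbacks can arrive on different threads.

```swift
import Foundation
import Dispatch

// Hypothetical model of the retry path; names are illustrative, not the SDK's.
final class GuardedSource {
    private let lock = NSLock()
    // All mutable state below is guarded by `lock`, since callbacks arrive on different threads.
    private var queue: DispatchQueue? = DispatchQueue(label: "example.eventsource.guarded")
    private var closed = false
    private var retries = 0
    private var opens = 0

    // Models connection:didFailWithError:. Returns false if no retry was scheduled.
    @discardableResult
    func connectionDidFail(retryAfter delay: TimeInterval) -> Bool {
        lock.lock()
        let snapshot: DispatchQueue? = closed ? nil : queue // read state once, under the lock
        lock.unlock()
        // The check happens before scheduling, not only inside the block.
        guard let target = snapshot else { return false }
        target.asyncAfter(deadline: .now() + delay) { [weak self] in
            guard let self = self else { return }
            self.lock.lock()
            let shouldRetry = !self.closed && self.retries < 5 // re-check at fire time
            if shouldRetry { self.retries += 1 }
            self.lock.unlock()
            if shouldRetry { self.open() }
        }
        return true
    }

    func close() {
        lock.lock()
        closed = true
        queue = nil
        lock.unlock()
    }

    private func open() { // stands in for [self open]
        lock.lock()
        opens += 1
        lock.unlock()
    }

    var openCount: Int {
        lock.lock()
        defer { lock.unlock() }
        return opens
    }
}
```

A simpler variant of the same idea is to keep the queue as a constant for the object's whole lifetime and rely only on the closed flag, so there is never a nil queue to pass to dispatch_after.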

RaimundasSakalauskas commented 5 years ago

There are multiple event sources running simultaneously, and this code executes on different threads. Can you please confirm whether you still see this error with the latest commit on the mesh branch (https://github.com/particle-iot/particle-sdk-ios/tree/feature/mesh)? I think I added a few more changes after my last reply, so the linked commit might be outdated.

We have 0 issues regarding this with our Tinker app after these changes were introduced and your reply leaves me slightly puzzled.

I'm sure you know how to use a development pod (rather than the public release), but in case someone reading this thread does not, here are the instructions:

  1. Clone this repo
  2. Check out the feature/mesh branch
  3. Define the path to the local git repo in your Podfile like this: pod 'Particle-SDK', :path => '../particle-sdk-ios'
  4. Run pod install (if it's a new project), or pod update Particle-SDK if the dependencies were previously installed and you are only swapping the public pod for a development pod.

williamhqs commented 5 years ago

@skyfoxa How did you solve it? I get the same error on iOS 12.1, also in the connectionDidFinishLoading method. I also commented out self.queue = nil; I'm not sure if it helps, but I will test it. Thank you very much!

RaimundasSakalauskas commented 5 years ago

@williamhqs did you try the instructions posted in my post above?

williamhqs commented 5 years ago

@skyfoxa yes, I removed self.queue = nil, but I'm not sure if it works. I will let someone test it first. Thanks for your quick response.

markoImake commented 5 years ago

I was experiencing the same issue on the master branch. I tried removing self.queue = nil as suggested, but it did not fix the problem. I then tried the mesh branch as per @RaimundasSakalauskas's suggestion, and it fixed the issue. Thanks!

RaimundasSakalauskas commented 5 years ago

I have just published an SDK update. Both dependency managers should now download the commit that contains a permanent fix for this issue, so the workaround is no longer needed. I will close this thread, but feel free to reopen it if you still experience this bug with SDK v0.9.0.

skyfoxa commented 5 years ago

@RaimundasSakalauskas we updated to v0.9.2 and everything works fine now. Thanks for the update.