matomo-org / matomo-sdk-android

SDK for Android to measure your apps with Matomo. Works on Android phones, tablets, Fire TV sticks, and more!
BSD 3-Clause "New" or "Revised" License
388 stars 162 forks source link

SDK crashes app after a while in case the tracking backend is permanently not reachable #199

Closed WillSarah closed 6 years ago

WillSarah commented 6 years ago

We are currently using the Piwik SDK in one client app. Last week the staging environment was undergoing maintenance and was not reachable for a few days / returning HTTP 500 error codes for every call made by the Piwik SDK.

It seems that the SDK then retries the calls without a retry limit until the app crashes with a SSLException "Unable to create application data", which seems to indicate a resource exhaustion due to too many threads/files etc. (At least according to this issue in the OkHttp repo). This behaviour was logged with the version 2.0.0 of the Piwik SDK integrated, but could also be reproduced with the 3.0.0 version of the Piwik SDK.

In the described case, the SDK always retries the calls without a retry limit, indicated by the endlessly repeating logs like this:

04-04 17:23:19.183 18907-18946/? D/PIWIK:EventCache: Switched state to ONLINE, uncached 0 events from disk.
04-04 17:23:19.184 18907-18946/? D/PIWIK:Dispatcher: Drained 1 events.
04-04 17:23:19.351 18907-18946/? D/PIWIK:DefaultPacketSender: status code 500
04-04 17:23:19.358 18907-18946/? D/PIWIK:Dispatcher: Unsuccesful assuming OFFLINE, requeuing events.
04-04 17:23:19.360 18907-18946/? D/PIWIK:Dispatcher: Dispatched 0 events.
04-04 17:23:19.364 18907-18946/? D/PIWIK:EventDiskCache: Cache check took 0ms
04-04 17:23:19.365 18907-18946/? D/PIWIK:EventDiskCache: Uncaching of 0 events took 0ms
04-04 17:23:19.365 18907-18946/? D/PIWIK:EventCache: Switched state to ONLINE, uncached 0 events from disk.

After a while multiple SocketExceptions appear in the logs:

04-04 13:44:47.387 30360-30405/? D/PIWIK:EventCache: Switched state to ONLINE, uncached 0 events from disk.
04-04 13:44:47.387 30360-30405/? D/PIWIK:Dispatcher: Drained 2 events.
04-04 13:44:47.389 30360-30405/? D/PIWIK:Dispatcher: java.net.SocketException: socket already closed
        at java.net.PlainSocketImpl.socketClose0(PlainSocketImpl.java:211)
        at java.net.AbstractPlainSocketImpl.socketClose(AbstractPlainSocketImpl.java:721)
        at java.net.AbstractPlainSocketImpl.close(AbstractPlainSocketImpl.java:546)
        at java.net.SocksSocketImpl.close(SocksSocketImpl.java:554)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:375)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:357)
        at java.net.Socket.connect(Socket.java:616)
        at com.android.okhttp.internal.Platform.connectSocket(Platform.java:145)
        at com.android.okhttp.internal.io.RealConnection.connectSocket(RealConnection.java:141)
        at com.android.okhttp.internal.io.RealConnection.connect(RealConnection.java:112)
        at com.android.okhttp.internal.http.StreamAllocation.findConnection(StreamAllocation.java:184)
        at com.android.okhttp.internal.http.StreamAllocation.findHealthyConnection(StreamAllocation.java:126)
        at com.android.okhttp.internal.http.StreamAllocation.newStream(StreamAllocation.java:95)
        at com.android.okhttp.internal.http.HttpEngine.connect(HttpEngine.java:281)
        at com.android.okhttp.internal.http.HttpEngine.sendRequest(HttpEngine.java:224)
        at com.android.okhttp.internal.huc.HttpURLConnectionImpl.execute(HttpURLConnectionImpl.java:461)
        at com.android.okhttp.internal.huc.HttpURLConnectionImpl.connect(HttpURLConnectionImpl.java:127)
        at com.android.okhttp.internal.huc.HttpURLConnectionImpl.getOutputStream(HttpURLConnectionImpl.java:258)
        at com.android.okhttp.internal.huc.DelegatingHttpsURLConnection.getOutputStream(DelegatingHttpsURLConnection.java:218)
        at com.android.okhttp.internal.huc.HttpsURLConnectionImpl.getOutputStream(Unknown Source:0)
        at org.piwik.sdk.dispatcher.Dispatcher.dispatch(Dispatcher.java:247)
        at org.piwik.sdk.dispatcher.Dispatcher$1.run(Dispatcher.java:167)
        at java.lang.Thread.run(Thread.java:764)
    Unsuccesful assuming OFFLINE, requeuing events.
    Dispatched 0 events.
04-04 13:44:47.391 30360-30405/? D/PIWIK:EventDiskCache: Cache check took 0ms
    Uncaching of 0 events took 0ms

After a few minutes, the app then crashes with the following exception, preceded by "AppData::create pipe (2) failed: Too many open files":

04-04 14:56:58.513 5634-5697/? E/NativeCrypto: AppData::create pipe(2) failed: Too many open files
04-04 14:56:58.532 5634-5697/? E/AndroidRuntime: FATAL EXCEPTION: Thread-5
    Process: ?, PID: 5634
    java.lang.RuntimeException: javax.net.ssl.SSLException: Unable to create application data
        at com.android.org.conscrypt.ConscryptFileDescriptorSocket.newSsl(ConscryptFileDescriptorSocket.java:161)
        at com.android.org.conscrypt.ConscryptFileDescriptorSocket.<init>(ConscryptFileDescriptorSocket.java:152)
        at com.android.org.conscrypt.OpenSSLSocketFactoryImpl.createSocket(OpenSSLSocketFactoryImpl.java:149)
        at com.android.okhttp.internal.io.RealConnection.connectTls(RealConnection.java:181)
        at com.android.okhttp.internal.io.RealConnection.connectSocket(RealConnection.java:149)
        at com.android.okhttp.internal.io.RealConnection.connect(RealConnection.java:112)
        at com.android.okhttp.internal.http.StreamAllocation.findConnection(StreamAllocation.java:184)
        at com.android.okhttp.internal.http.StreamAllocation.findHealthyConnection(StreamAllocation.java:126)
        at com.android.okhttp.internal.http.StreamAllocation.newStream(StreamAllocation.java:95)
        at com.android.okhttp.internal.http.HttpEngine.connect(HttpEngine.java:281)
        at com.android.okhttp.internal.http.HttpEngine.sendRequest(HttpEngine.java:224)
        at com.android.okhttp.internal.huc.HttpURLConnectionImpl.execute(HttpURLConnectionImpl.java:461)
        at com.android.okhttp.internal.huc.HttpURLConnectionImpl.connect(HttpURLConnectionImpl.java:127)
        at com.android.okhttp.internal.huc.HttpURLConnectionImpl.getOutputStream(HttpURLConnectionImpl.java:258)
        at com.android.okhttp.internal.huc.DelegatingHttpsURLConnection.getOutputStream(DelegatingHttpsURLConnection.java:218)
        at com.android.okhttp.internal.huc.HttpsURLConnectionImpl.getOutputStream(Unknown Source:0)
        at org.piwik.sdk.dispatcher.Dispatcher.dispatch(Dispatcher.java:247)
        at org.piwik.sdk.dispatcher.Dispatcher$1.run(Dispatcher.java:167)
        at java.lang.Thread.run(Thread.java:764)
     Caused by: javax.net.ssl.SSLException: Unable to create application data
        at com.android.org.conscrypt.NativeCrypto.SSL_new(Native Method)
        at com.android.org.conscrypt.SslWrapper.newInstance(SslWrapper.java:58)
        at com.android.org.conscrypt.ConscryptFileDescriptorSocket.newSsl(ConscryptFileDescriptorSocket.java:159)
        at com.android.org.conscrypt.ConscryptFileDescriptorSocket.<init>(ConscryptFileDescriptorSocket.java:152) 
        at com.android.org.conscrypt.OpenSSLSocketFactoryImpl.createSocket(OpenSSLSocketFactoryImpl.java:149) 
        at com.android.okhttp.internal.io.RealConnection.connectTls(RealConnection.java:181) 
        at com.android.okhttp.internal.io.RealConnection.connectSocket(RealConnection.java:149) 
        at com.android.okhttp.internal.io.RealConnection.connect(RealConnection.java:112) 
        at com.android.okhttp.internal.http.StreamAllocation.findConnection(StreamAllocation.java:184) 
        at com.android.okhttp.internal.http.StreamAllocation.findHealthyConnection(StreamAllocation.java:126) 
        at com.android.okhttp.internal.http.StreamAllocation.newStream(StreamAllocation.java:95) 
        at com.android.okhttp.internal.http.HttpEngine.connect(HttpEngine.java:281) 
        at com.android.okhttp.internal.http.HttpEngine.sendRequest(HttpEngine.java:224) 
        at com.android.okhttp.internal.huc.HttpURLConnectionImpl.execute(HttpURLConnectionImpl.java:461) 
        at com.android.okhttp.internal.huc.HttpURLConnectionImpl.connect(HttpURLConnectionImpl.java:127) 
        at com.android.okhttp.internal.huc.HttpURLConnectionImpl.getOutputStream(HttpURLConnectionImpl.java:258) 
        at com.android.okhttp.internal.huc.DelegatingHttpsURLConnection.getOutputStream(DelegatingHttpsURLConnection.java:218) 
        at com.android.okhttp.internal.huc.HttpsURLConnectionImpl.getOutputStream(Unknown Source:0) 
        at org.piwik.sdk.dispatcher.Dispatcher.dispatch(Dispatcher.java:247) 
        at org.piwik.sdk.dispatcher.Dispatcher$1.run(Dispatcher.java:167) 
        at java.lang.Thread.run(Thread.java:764) 
d4rken commented 6 years ago

I assume you are not using the GZIP option, right? As the v2.0.0 log points to https://github.com/matomo-org/piwik-sdk-android/blob/v2.0.0/piwik-sdk/src/main/java/org/piwik/sdk/dispatcher/Dispatcher.java#L247

The obvious cause would be that we forget to clean up our resources, but I can't see that in the code. Both the output stream and the urlconnection get closed in the finally statements.

On what Android versions is this reproducible?

Do you have a minimal example that I can use to reproduce this?

d4rken commented 6 years ago

I've expanded the demo app to spam 2000 dispatches against a server returning code 301 but I can't get it to crash or show the logs you posted. Tired on API26 and API16.

The only noteworthy thing I see is

E/StrictMode: A resource was acquired at attached stack trace but never released. See java.io.Closeable for information on avoiding resource leaks.
                                                          java.lang.Throwable: Explicit termination method 'end' not called
                                                              at dalvik.system.CloseGuard.open(CloseGuard.java:184)
                                                              at java.util.zip.Inflater.<init>(Inflater.java:82)
                                                              at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:96)
                                                              at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:81)
                                                              at libcore.net.http.HttpEngine.initContentStream(HttpEngine.java:528)
                                                              at libcore.net.http.HttpEngine.readResponse(HttpEngine.java:836)
                                                              at libcore.net.http.HttpURLConnectionImpl.getResponse(HttpURLConnectionImpl.java:274)
                                                              at libcore.net.http.HttpURLConnectionImpl.getResponseCode(HttpURLConnectionImpl.java:486)
                                                              at org.piwik.sdk.dispatcher.DefaultPacketSender.send(DefaultPacketSender.java:57)
                                                              at org.piwik.sdk.dispatcher.DefaultDispatcher$1.run(DefaultDispatcher.java:179)
                                                              at java.lang.Thread.run(Thread.java:856)

I'm looking into this currently.

WillSarah commented 6 years ago

Correct, not using the GZIP option as of now. Might be worth pointing out that we also set the dispatch interval to 0 as per the customers request (via tracker.setDispatchInterval()).

We saw this on a Pixel (1st generation) with 8.1.0 and a Nexus 6P with 8.1.0 but did not actively test the scenario on other devices at the time.

Currently we don't have a minimal example available, if I find the time to create one I'll add it here.

d4rken commented 6 years ago

The strictmode violation in https://github.com/matomo-org/piwik-sdk-android/issues/199#issuecomment-380084111 is caused by the error stream not being consumed.

If the server returns a non OK (200er) code, then the response can contain an error stream, which apparently should be consumed, or at least closed, even if not used.

I'll submit a few change to the dev branch. Try it, if it still occurs then I'll need a way to reproduce this.

d4rken commented 6 years ago

@WillSarah Merged #200 into dev ⬅️

If this fixes it, a 3.0.1 release would be warranted.

WillSarah commented 6 years ago

@d4rken - It seems that this indeed fixes the crashes that occured before. I tried reproducing the issue with Charles (as the backend is working now) with version 3.0.0 and did get some crashes related to the "too many files open" scenario (like "java.lang.RuntimeException: Could not read input channel file descriptors from parcel") that are not occuring with the version from dev.

Also I will try to use a dispatchTimeInterval > 0, as using 0 currently means that there is no sleep time between call retries (DefaultDispatcher:162), which leads to a ton of calls in a short time.

Thumbs up for a 3.0.1 version and thanks alot!

d4rken commented 6 years ago

https://github.com/matomo-org/piwik-sdk-android/releases/tag/v3.0.1

@WillSarah ☝️