ukhsa-collaboration / COVID-19-app-iOS-BETA

Source code of the Beta of the NHS COVID-19 iOS app
https://covid19.nhs.uk/
MIT License
800 stars 179 forks source link

Android bluetooth irrecoverable failure on too many open connections #30

Closed freddychoi closed 4 years ago

freddychoi commented 4 years ago

Describe the bug Bluetooth capability on Android device acting as peripheral enters irrecoverable failure state after circa 700 - 1000 connect requests without disconnect from central. Bluetooth on Android may still detect connection requests but stops accepting connection, service/characteristic discovery and characteristic read/write. There is no warning from Android. The device needs to be rebooted to recover. In the COVID-19 app, this means the Android device (peripheral) will appear operational but undetectable by centrals.

To Reproduce Steps to reproduce the behaviour:

  1. Set one Android device (tested on Pixel 2) as peripheral, advertising a service and a readable or writable characteristic.
  2. Set another device (Android / iOS, tested on Samsung J6 Android) as central, scanning for the service, making a connection to the peripheral and writing or reading the characteristic.
  3. Repeat connect, read/write characteristic in a loop, without calling disconnect in each cycle.
  4. Peripheral will eventually stop accepting connections, or even if it accepts connections, read/write characteristic will fail.
  5. If step 3 is changed to include a disconnect after read/write characteristic, the cycle can run indefinitely and bluetooth will remain functional.

Expected behaviour Peripheral should accept connect, read/write characteristics requests indefinitely, and remain robust against missing disconnect calls from centrals. This is a device / Android OS issue, but it has a direct impact on the COVID-19 app that relies on open connections and notifications for keep alive. It will mean some Android devices carried by people in high population areas will silently fail without warning, requiring device reboot.

edent commented 4 years ago

Hi @freddychoi It sounds similar to behaviour that we've seen on some older phones. We've observed older Android phones becoming unable to establish new connections after some time - and it does seem related to the number of connections it has made. This is in spite of the fact that we are explicitly disconnecting. However, the centrals on other devices are still able to connect to its peripheral.

What I would like to clarify:

freddychoi commented 4 years ago

Hi Terence, thanks for taking the time to look at this. There is a way round this problem (see Mitigation below). Here are answers to your questions ...

What versions of Android are the phones running? The devices under test are:

Does the same error occur if they swap the phones? That is, using the J6 as a peripheral and the Pixel as a central? J6 can't act as a peripheral as it can only receive, but the same error does occur on both the Pixel (Android 8) and S3 (Android 9). See Trials 1, 2, 3 and 6 below.

Once the error is observed, if a third device is introduced, how does it interact with the existing devices? Can it see/be seen? Once the error occurs on the peripheral, it stops interacting with all centrals. The error is cleared by rebooting the peripheral. Bluetooth off / on is not enough. See Trials 4 and 5 below.

Trials and results are as follows:

Mitigation The observation is that on Android devices, the following improves stability:

On the iOS side, the following has been observed in background mode:

Given the above, the current keep alive code is only required for iOS - iOS detection when both devices are in background mode, because iOS can detect Android on every scanForPeripheral call even in background / suspended state, which also keeps iOS awake to complete another scan. As such, Android - Android and also iOS - Android detection can primarily rely on scans, and connect is only required initially to obtain the beacon code / broadcast payload on first contact or after data expiry.

As a quick experiment in your code, disable the keepAlive characteristic in your Android code (GattServer.kt) and add central.scanForPeripheral after sendKeepAlive in the handler in readRSSIAndSendKeepalive in your iOS code (BTLEListener.swift). The Android device should be discovered by scan even when iOS is in background / suspended states.

Please find attached my equivalent sonar / beacon code that uses the above scheme to avoid Android connects. I haven't uploaded the complete project to GitHub yet, as I'm still making changes to other parts of the code, but the beacon code is entirely self-contained. I have a BGAppRefreshTask in AppDelegate to call Receiver.scan() at regular intervals as a backup to keep iOS awake. Please feel free to use the code without any restriction or cost for NHSX dev. This is for decentralised on-device contact matching, but the basic idea of BLE beacon detection is the same, and feel free to use this to avoid relying on Google / Apple API.

Beacon detection time intervals (in seconds) are as follows ... iOS-iOS background count=809,mean=16.0,sd=1.3,min=8.0,max=16.5,testPeriod=12973.5 Android-iOS background count=162,mean=35.8,sd=33.78,min=0.1,max=247.2,period=5798.6 The sampling rates are good enough for contact tracing. The sample size is small for Android because BLE peripheral UUID is changed by Android regularly and the statistics is collected on a per UUID basis.

The starting point is "Transceiver.swift". Given a shared secret (e.g. "sharedSecret1".data(using: .utf8) will do for testing), the code will generate cryptographically derived beacon codes for transmission (Transmitter.swift), and the receiver will detect these and report them to ReceiverDelegates (register with Receiver.delegates.append). The beacon codes can be regenerated for on-device matching given the daily beacon code seed (see doc in BeaconCodes.swift). This will integrate well with your existing solution, as given your shared secret, this will generate all the day/beacon codes for recording and on-device matching. I am happy to provide the server-side / on-device matching code for unrestricted use by NHSX.

Best of luck with this work.

Beacon-20200520-1758.zip

c19x commented 4 years ago

Hi Terence, I've finished testing all the crypto and beacon code. It addresses this Android issue as well as the previous issue I raised about non-transmitting Android devices. The solution works well. The complete iOS app for decentralised contact tracing is now available on GitHub https://github.com/c19x. Just clone in Xcode and run on two iPhones. No setup nor registration required. The app is currently set to communicate with a test server on AWS. The server source code will also be made available on GitHub soon. Please feel free to take and integrate any aspect of the code in your work. Good luck and hope your work is going well. Fred

edent commented 4 years ago

I'm pasting this message in every active GitHub issue, so you may receive duplicate notifications.

Today, I'm happy to announce that NHSX has released the full git commit history for the Isle of Wight Beta apps.

As discussed, we have redacted API keys, sensitive domain names, and some of the developers' personal details. I am still waiting on final approval to publish the server-side code.

I would like to personally thank the community for your comments, bug reports, and vulnerability disclosures. They all went into helping the development process.

The beta trial of this app has now ended and we've moved to the next phase of app development. It is our intention to publish the source code of future apps as the binaries are released to the public.

Once again, thank you for being part of this.

Terence Eden Head of Open Technology - NHSX