theheraldproject / herald-for-android

Herald for Android - Reliable mobile Bluetooth communications - Android library & test app

https://heraldprox.io

Apache License 2.0

27 stars 19 forks source link

Duplicate GATT service registrations #107

Closed jimmo closed 3 years ago

jimmo commented 3 years ago

Please see https://github.com/AU-COVIDSafe/mobile-android/issues/33

I don't think this is necessarily happening in Herald, but raising here FYI as you might want to implement some sort of guard in SensorArray or ConcreteBLETransmitter to prevent this from happening.

adamfowleruk commented 3 years ago

Thanks Jimmo. Confirmed this is not an issue in Herald.

Android line to clear services before adding: https://github.com/vmware/herald-for-android/blob/develop/herald/src/main/java/com/vmware/herald/sensor/ble/ConcreteBLETransmitter.java#L537

iOS line to clear services before adding: https://github.com/vmware/herald-for-ios/blob/develop/herald/herald/Sensor/BLE/BLETransmitter.swift#L153

jimmo commented 3 years ago

I understand that the code does call clearServices() but did you actually test it?

I also see this issue with the Herald test app -- just doing a direct ./gradlew build followed by ./gradlew installDebug if I start/stop the bluetooth adaptor. After stop/starting the Bluetooth adaptor I see multiple service registrations and advertising on multiple RPAs.

Looking more closely, in ConcreteBLETransmitter.java, the bleTimer is essentially polling the adaptor state, as it appears that BluetoothStateManagerDelegate isn't used -- the overridden start/stop methods do nothing. When the polling notices the adaptor is powered off, it immediately null's bluetoothGattServer without clearing services or close()ing it.

adamfowleruk commented 3 years ago

Re-opening to re-check the specific use case of deliberately turning BLe on/off/on.

We have tested this code a significant number of times, continually, and run full suites every week of manual testing. It is always possible in real world code that a regression can slip in. I would remind you that this is a network system whose behaviour on individual phones - despite manufacturers and OS API documentation's insistence to the contrary - varies widely across many devices and OS versions. It's also worth noting that our code and protocol is completely open and so you are able to test ours - not always the case for DCT protocols / subsystems.

@jimmo "did you actually test it?" is rather an unprofessional comment, shows an immaturity in correspondence with your peers, is unkind and unwarranted. I have been very open with you on here and will always continue to do so. We've been working on this for months including in our limited spare time and do so to help all DCT teams to further our common understanding of Bluetooth performance, vagaries of devices, and to help improve human health outcomes globally. Please consider your language in future reports, and please re-read our code of conduct before reporting a new issue.

adamfowleruk commented 3 years ago

Hello Jimmo. From our testing using the nRF Connect app we cannot reproduce the issue. Can you please provide a screenshot of your output similar to the below?

We believe the issue may be in how nRF Connect displays both past and current service information for devices rather than advertising multiple services. We use BLE Sniffers for diagnostics here rather than the nRF Connect app. UberTooth are great, but the Hollong Viewtool is a reasonable solution too. They show accurately what's going on 'over the air', alongside Wireshark.

Here's what we've found:-

HERALD starts and advertises under address A
nRF Connect starts scan, finds A, presents it on the list, at this point one can connect to A to interrogate services, let’s leave it scanning ...
HERALD automatically refreshes advert every 15 minutes, upon refresh device will advertise under new address B
nRF Connect is still scanning, it finds B, presents it on the list (in addition to A), at this point one can connect to B to interrogate services, but A is no longer valid, i.e. it won’t accept connections anymore, but nRF Connect has cached the information for viewing
Try switching bluetooth OFF and ON again, this will cause HERALD to stop and start advertising, under new address C
As before, nRF Connect is still scanning, so it finds C, presents it on the list, along with A and B, however only C is valid for connection, A and B are no longer valid.

nRF Connect lists and caches everything it finds and doesn’t clear the list when the advert becomes invalid. This is by design to enable review and debugging. For information, iOS CoreBluetooth also caches data about devices for efficiency. Seeing multiple adverts on nRF Connect doesn’t mean HERALD is advertising on multiple addresses, or services are not being cleared. The HERALD log (in particular on Android) provides a definitive list of all current devices at a point in time.

As an example, we have a screenshot of the test just conducted … nrfimage

This was generated by:-

HERALD on Android start
nRF Connect running on iPhone, start scan, finds HERALD (manufacturer data 0xA3-6C), I connected to it to interrogate services
Bluetooth OFF on Android, Bluetooth ON on Android, triggering HERALD on Android to restart advert
nRF Connect on iPhone finds new advert (manufacturer data 0xD4-89), I can connect to this to interrogate services
The first instance of HERALD is no longer valid as an advert, you can view the cached data, but connect is no longer possible (see “Connecting” state which will never succeed)

In summary, we believe there is no bug in the current Herald code here. Please provide more information to enable us to reproduce if you believe we are incorrect. As you can imagine these things take a while to confirm, so having a confirmed issue with reproducible data is key.

I'll keep the issue open for a couple of days to give you time to collate and submit evidence just in case.

Thanks.

HendX commented 3 years ago

@jimmo "did you actually test it?" is rather an unprofessional comment, shows an immaturity in correspondence with your peers, is unkind and unwarranted.

This is an unfair characterisation. I can see that it could be interpreted sarcastically, however, this is my understanding of the above issue:

@jimmo reported this issue
Issue almost immediately closed, referencing a line in the codebase that calls clearServices()
The COVIDSafe project already has this line of code when this bug was detected.

Asking if it was subsequently tested after the issue was reported was a fair question, given how quickly it was dismissed as invalid and wontfix.

jimmo commented 3 years ago

nRF Connect running on iPhone

@adamfowleruk Please use nRF Connect on Android -- for the reasons you've outlined above regarding CoreBluetooth limitations, nRF Connect on iPhone isn't very useful for any sort of low-level debugging.

I have repro'ed this in bluez on Linux and with the BlueKitchen stack (normally used on embedded devices but in this case talking to a HCI controller over UART from Linux).

Here are some additional screenshots showing it in nRF Connect for Android. You can see three different RPAs all currently sending beacons (I cleared the scan results immediately before taking this screenshot). On the next screenshot you can see I am concurrently connected to all three (in three tabs), and the three identical services being returned (it's the same in all three tabs).

In summary, we believe there is no bug in the current Herald code here. Please provide more information to enable us to reproduce if you believe we are incorrect.

See above "When the polling notices the adaptor is powered off, it immediately null's bluetoothGattServer without clearing services or close()ing it."

I'm also not entirely convinced that clearServices itself always works but haven't investigated in detail.

jimmo commented 3 years ago

Please also see https://github.com/AU-COVIDSafe/mobile-android/issues/33#issuecomment-745038577 for details of another repro.

adamfowleruk commented 3 years ago

Hi @jimmo @HendX - again our testing and regular Bluetooth scans don't show this issue. I've even just redone the tests against the latest code today in case I'm going mad in my old age, and still cannot reproduce the issue, hence closing it quickly before. We perform regular checks and scans and have never seen the issue you mention.

I would advise you to try a raw Bluetooth scanner at this point. I use dedicated devices and software rather than my linux box here just to be safe on my test results.

My testing certainly isn't showing an issue with the Herald code on our develop branch. Attached is a wireshark capture of an ios transmitting device. No duplicate services shown on restarting BLe with the app running. I believe this is the scenario you are alluding to @jimmo, correct? I want to be sure because we cannot replicate it.

Capture of an iOS Herald demo app transmitter:- ios-capture-ble-on-off-on-off-on-no-suplicate-services.pcapng.zip

Also a capture from an android transmitter:- android-capture-ble-on-off-on-off-on-no-duplicate-services.pcapng.zip

We've also tested it on Android by looking at the raw scan results on the phones themselves.

Testing with Google Pixel 2 (Android 29) and Samsung A10 (Android 28)
- Pixel 2 acting as peripheral where Bluetooth will be switched ON and OFF to restart advert on new addresses
- A10 acting as central scanning at regular intervals and reporting discovered adverts
- Raw scan results reported by Android scan on A10 being reviewed to confirm if Pixel 2 is advertising duplicate services following Bluetooth OFF/ON
Start test
- Start A10, Bluetooth ON, scan reports no result
- Start Pixel 2, Bluetooth ON
- A10 reports address A (59:44:24:42:A8:19) is advertising service X (cc0ac8b7-03b5-4252-8d84-44d199e16065)

2020-12-15 08:18:11.913 6768-6768/com.vmware.herald.app D/Sensor::BLE.ConcreteBLEReceiver: onScanResult (result=ScanResult{device=59:44:24:42:A8:19, scanRecord=ScanRecord [mAdvertiseFlags=2, mServiceUuids=[cc0ac8b7-03b5-4252-8d84-44d199e16065], mManufacturerSpecificData={65530=[9, 109, -74, -59, 0, 0]}, mServiceData={}, mTxPowerLevel=-2147483648, mDeviceName=null], rssi=-59, timestampNanos=252009707856595, eventType=27, primaryPhy=1, secondaryPhy=0, advertisingSid=255, txPower=127, periodicAdvertisingInterval=0}, data=02010209fffaff096db6c5000011076560e199d144848d5242b503b7c80acc00000000000000000000000000000000000000000000000000000000000000) 2020-12-15 08:18:12.513 6768-6768/com.vmware.herald.app D/Sensor::BLE.ConcreteBLEReceiver: onScanResult (result=ScanResult{device=59:44:24:42:A8:19, scanRecord=ScanRecord [mAdvertiseFlags=2, mServiceUuids=[cc0ac8b7-03b5-4252-8d84-44d199e16065], mManufacturerSpecificData={65530=[9, 109, -74, -59, 0, 0]}, mServiceData={}, mTxPowerLevel=-2147483648, mDeviceName=null], rssi=-60, timestampNanos=252010300882402, eventType=27, primaryPhy=1, secondaryPhy=0,

Bluetooth OFF / ON
- Pixel 2, Bluetooth OFF
- A10 scans stop reporting adverts from address A
- Pixel 2, Bluetooth ON
- A10 reports address B (5E:95:C9:DE:90:C1) is advertising service X (cc0ac8b7-03b5-4252-8d84-44d199e16065)
- A10 stops reporting adverts from address A
- This confirms Pixel 2 address has switched from A to B following Bluetooth OFF / ON
- This also confirms original advert on address A is no longer being advertised as this is the raw unfiltered Android scan results

2020-12-15 08:18:28.241 6768-6768/com.vmware.herald.app D/Sensor::BLE.ConcreteBLEReceiver: onScanResult (result=ScanResult{device=5E:95:C9:DE:90:C1, scanRecord=ScanRecord [mAdvertiseFlags=2, mServiceUuids=[cc0ac8b7-03b5-4252-8d84-44d199e16065], mManufacturerSpecificData={65530=[125, 110, 113, -30, 0, 0]}, mServiceData={}, mTxPowerLevel=-2147483648, mDeviceName=null], rssi=-58, timestampNanos=252026020657506, eventType=27, primaryPhy=1, secondaryPhy=0, advertisingSid=255, txPower=127, periodicAdvertisingInterval=0}, data=02010209fffaff7d6e71e2000011076560e199d144848d5242b503b7c80acc00000000000000000000000000000000000000000000000000000000000000) 2020-12-15 08:18:29.941 6768-6768/com.vmware.herald.app D/Sensor::BLE.ConcreteBLEReceiver: onScanResult (result=ScanResult{device=5E:95:C9:DE:90:C1, scanRecord=ScanRecord [mAdvertiseFlags=2, mServiceUuids=[cc0ac8b7-03b5-4252-8d84-44d199e16065], mManufacturerSpecificData={65530=[125, 110, 113, -30, 0, 0]}, mServiceData={}, mTxPowerLevel=-2147483648, mDeviceName=null], rssi=-58, timestampNanos=252027699573928, eventType=27, primaryPhy=1, secondaryPhy=0, advertisingSid=255, txPower=127, periodicAdvertisingInterval=0}, data=02010209fffaff7d6e71e2000011076560e199d144848d5242b503b7c80acc00000000000000000000000000000000000000000000000000000000000000)

Bluetooth OFF / ON and popular scanning tools (e.g. nRF Connect on iOS and Android, BlueSee on macOS, LightBlue on iOS)
- Pixel 2, Bluetooth OFF, A10 scans stop reporting adverts from address B
- Pixel 2, Bluetooth ON, A10 reports address C (56:19:EF:B1:73:E6) is advertising service X (cc0ac8b7-03b5-4252-8d84-44d199e16065)
- A10 stops reporting adverts from address B (or A), this confirms Pixel 2 address has switched from B to C following Bluetooth OFF / ON

2020-12-15 08:18:47.250 6768-6768/com.vmware.herald.app D/Sensor::BLE.ConcreteBLEReceiver: onScanResult (result=ScanResult{device=56:19:EF:B1:73:E6, scanRecord=ScanRecord [mAdvertiseFlags=2, mServiceUuids=[cc0ac8b7-03b5-4252-8d84-44d199e16065], mManufacturerSpecificData={65530=[120, -61, 98, 20, 0, 0]}, mServiceData={}, mTxPowerLevel=-2147483648, mDeviceName=null], rssi=-58, timestampNanos=252045048708454, eventType=27, primaryPhy=1, secondaryPhy=0, advertisingSid=255, txPower=127, periodicAdvertisingInterval=0}, data=02010209fffaff78c36214000011076560e199d144848d5242b503b7c80acc00000000000000000000000000000000000000000000000000000000000000) 2020-12-15 08:18:47.492 6768-6768/com.vmware.herald.app D/Sensor::BLE.ConcreteBLEReceiver: onScanResult (result=ScanResult{device=56:19:EF:B1:73:E6, scanRecord=ScanRecord [mAdvertiseFlags=2, mServiceUuids=[cc0ac8b7-03b5-4252-8d84-44d199e16065], mManufacturerSpecificData={65530=[120, -61, 98, 20, 0, 0]}, mServiceData={}, mTxPowerLevel=-2147483648, mDeviceName=null], rssi=-57, timestampNanos=252045292728570, eventType=27, primaryPhy=1, secondaryPhy=0, advertisingSid=255, txPower=127, periodicAdvertisingInterval=0}, data=02010209fffaff78c36214000011076560e199d144848d5242b503b7c80acc00000000000000000000000000000000000000000000000000000000000000)

- Scanning tools show history of all adverts, so both B and C are visible
- Connect to C shows service details, connect to B shows cached data and “connecting” on nRF Connect
- Repeated Bluetooth OFF / ON multiple times on Pixel 2, scanning tools show what appears to be duplicate adverts but they are just cached data, connect will only succeed on most recent advert
- nRF Connect on iOS shows cached data even after app restart because CoreBluetooth reports cached adverts for efficiency, connect will only succeed on one of the adverts

At this point it appears the problem is with your test methodology and not the Herald code. If you can please be extremely specific about the exact test scenario, test against the develop branch of Herald on all devices, and let us know the device model and OS versions that would be extremely useful. We may well have the same device here to test. Also please test our develop branch code just to be certain.

At the moment though we may well have to close this as cannot reproduce/wontfix because our testing has never exhibited this behaviour. We do regular testing, and a full suite across all our phones for every major release (our majors are dot releases. i.e. v1.0, v1.1 etc.), including with Bluetooth scanners to verify what goes over the network in our advertisements. This is why we were able to close your report so quickly the first time.

Thanks for your efforts. Like I said before, I'll keep this open another day in case you find evidence I can replicate on the Herald demo app on the develop branch, but I really can't replicate your issue here with Herald currently.

adamfowleruk commented 3 years ago

I've created a new label that's less aggressive than the 'invalid' 'wontfix' pairing. ❤️

jimmo commented 3 years ago

Attached is a wireshark capture of an ios transmitting device.

"Transmitter" is the peripheral/broadcaster role, right? This bug is specifically for the case of an Android peripheral/broadcaster.

Also a capture from an android transmitter:

I looked at this capture and do not see anything other than advertising and scan response payloads. I do agree that there are no concurrent RPAs (but see below).

However, the main issue that I'm raising here is that the services are registered multiple times. I think perhaps you might be confusing the advertised services (i.e. the (IN)COMPLETE_SERVICE_UUID field in the advertising payload) with the actual services being hosted by the GATT server (i.e. what you see in a service discovery).

I don't see anywhere in your nRF Connect screenshots or logs where there's a connection to a device and a service discovery. See my second screenshot for an example.

I'm not sure why you don't see the duplicate RPAs but there is one crucial difference which is that your Pixel 2 is running 29 (Android 10) whereas mine is Android 11, so perhaps the ability to have multiple concurrent addresses is an Android 11 feature, and now happens with multiple gatt server instances.

Wild guess, but I mention this Android 10 vs 11 point specifically because I vaguely remember Apple/Google ENF calls out running the ENF advertising on a dedicated address in their design -- perhaps they added support for that in Android 11.

I would advise you to try a raw Bluetooth scanner at this point. I use dedicated devices and software rather than my linux box here just to be safe on my test results.

I think the key thing here is that a scanner wouldn't show the duplicate service registration. You need a full BLE stack.

As described above, I'm using a minimal-but-extremely-well-qualified embedded stack (BlueKitchen), there's very little magic between my test code and the radio.

At this point it appears the problem is with your test methodology and not the Herald code.

Please take a look at the code that I have highlighted twice above -- when the adaptor is powered off, it nulls the gatt server, never getting a chance to clear services or stop the server.

I've even just redone the tests against the latest code today

Just out of curiosity -- all the the manufacturer data fields in the logs above (i.e. PseudoDeviceAddress) still seem to end in two zeros? This should be fixed in the latest code?

adamfowleruk commented 3 years ago

@jimmo I don't see any new data in your reply about the issue raised. As asked twice already please provide the exact details of your test devices. Also provide your exact methodology. Without this data I will not be able to replicate your issue so please be specific.

As it stands you have not provided enough information for me to reproduce. Again please provide this information below:-

Phone make/model showing the issue (the same 'model' may have multiple model codes, that vary across the world. They often have different BLe chips and even in some cases different manufacturers)
Exact OS version in use (not just major version number)
Exact version of Herald in the test

You are incorrect about the scanner not seeing duplicate device service advertisements. If they are not observed over the network via a scanner, how can an observer see multiple services? My scanner monitors the three advertising channels simulataneously unless it sees a connection, and then it flips one of the channels to follow that data packet. Herald rarely uses connections to transfer this data. I'm pretty sure I saw one of these exchanges whilst performing the captures so the data is in there. Filter for non ATT and SCAN_REQ/RES packets.

If I receive no confirmation of the above information, and no new data, by morning I shall close this issue.

alwentiu commented 3 years ago

@adamfowleruk and @jimmo Here's a partial reproduction of the issue. But first let me clarify that my comment that @jimmo quoted above (https://github.com/AU-COVIDSafe/mobile-android/issues/33#issuecomment-745038577) relates to the tests I did for the COVIDSafe herald branch, not for the Herald demo app. For the COVIDSafe integration, I can confirm the issues as Jim stated. DTA has been provided with details to reproduce the issues.

I have just had a chance to test the Herald demo application yesterday, using the latest 'develop' branch (as of yesterday). Here's what I found:

On two phones: Nexus 5X, running Android 8, and Pixel 2XL, running Android 11, I could not reproduce the issues (turning bluetooth on/off did not trigger duplicate services or multi-advertisement on the same UUID).
On Samsung Galaxy S7 Edge (Android 8), Samsung Galaxy S20 (Android 10) and Samsung Galaxy Tab A 2017 (Android 7), duplicate services are registered when bluetooth is turned off and on again, as Jim observed. However, multi-advertisements were not observed. Some more details of my tests on Samsung Galaxy S7 Edge are provided below. On S7 Edge and S20, after maxing out on the number services, startAdvert failed, and this persisted after 14 hours or so (untill now), so other phones could not register their presence.
I have also let all 4 phones running over night. Nexus 5X crashed for unknown reason. I'm not sure what was the cause. I pulled a bug report and did a quick scan on 'startAdvert failed' but could not find an entry -- likely the crash was caused by a different issue.

I suspected that COVIDSafe integration with herald might have introduced a new bug that caused multi-advertisement with the same UUID. But there seems also a separate issue of GATT services not being properly cleared. From the preliminary tests above, it seems this may be specific to Samsung phones, but @jimmo confirmed that the issue manifested in Pixel 2, so I may be wrong here. It would be worthwhile checking whether bluetoothgatt.clearServices() really cleared the services properly. If not, this could point to a firmware bug (but this is purely speculation at this stage).

Details of Samsung Galaxy S7 Edge tests

Model number: SM-G935F
Baseband: G935FXXU8ETI2
Build number:R16NW.935FXXU8ETI2
Android version: 8.0.0
Samsung Experience version:9.0
Android security patch level 1 September 2020.

The phone was factory rest, no other bluetooth functionalities enabled (so no Samsung Share, or Google Nearby Share) to ensure no interference from other GATT servers.

For the scanner, I use a standard Ubuntu 20.04 LTS, running on Lenovo Thinkpad T460s, with a bluetooth 4.2 adapter. For the scanning software, I use the bluetoothctl tool that's part of the BlueZ suite (provided in the default Ubuntu installation). I use adb logcat to capture the logcat output.

For the test, I run Herald-for-android app ('develop' branch). I first scanned for the service UUID 428132af-4746-42d3-801e-4572d65bfd9b, and connected to the phone. The services listed for the phone are as expected (no duplicates). Then I turned bluetooth off on the phone and turned it back on. This causes the bluetooth address of the phone to rotate. I then reconnected and displayed the services. As you can see in the captured output of bluetoothctl below, duplicates are detected. I repeated this a number of times (~20 times or so), and startAdvert started to fail (see the excerpt of the logcat output below).

bluetoothctl session

scan.txt

logcat excerpt (startAdvert events)

startAdvert.txt

jimmo commented 3 years ago

You are incorrect about the scanner not seeing duplicate device service advertisements. If they are not observed over the network via a scanner, how can an observer see multiple services?

Once again, I am not talking about service advertisements. I am talking about service discovery, where the client requests the list of services from the server and it replies with the 16-bit handle ranges.

This has nothing to do with advertising. In fact, a peripheral may also act as a client and request the supported services from the central. This is how (for example) a peripheral device discovers the device name of the central.

Herald rarely uses connections to transfer this data.

This happens behind the scenes on Android/iOS when you request the services. The OS libraries abstract away the handles that are returned.

A similar process happens for characteristic discovery within a service.

I'm pretty sure I saw one of these exchanges whilst performing the captures so the data is in there. Filter for non ATT and SCAN_REQ/RES packets.

I did.

If I receive no confirmation of the above information, and no new data, by morning I shall close this issue.

Thanks for the ultimatum. Feel free to close any time.

adamfowleruk commented 3 years ago

@jimmo "Thanks for the ultimatum. Feel free to close any time." - What do you hope to achieve by trying to provoke an emotional response? Or by your last post in general? If you truly care about engaging in an open-source community to help improve software then this kind of baiting language is extremely unhelpful. If you wish to be taken seriously by your peers in our industry then please communicate in a professional and mature manner. Also, please respond to specific questions with specific answers rather than by re-quoting other parts of our responses.

Forgive me if this observation sounds harsh or is incorrect, but your last reply reads more as a rant rather than a real attempt at engaging with our community. This is certainly how many of your replies read. If this was not your intention then please reconsider how you engage in this forum in future.

The Herald Project members are providing their spare time to help create a fully open source set of software around reliable Bluetooth communication, that can be fully inspected for security and efficacy, and is developed fully in the open. Of course there are going to be issues - the ability to inspect and find them is the beauty of working in an open-source community rather than a closed source one! These issues should be expected, however irksome. When they are found, and we can reproduce them, we will fix them. Of course we will. I think our record on this has been pretty good on the Herald project. We do, however, need the data we are asking for though as without being able to reproduce an issue we cannot ensure through testing that it is fixed. Saying we think something is fixed only to find later it is not would be even more frustrating for all concerned, I'm sure you will agree. Us asking for specific information is not an unreasonable request, and is in fact standard in any bug remediation process. You should not take these requests as us not believing you - we merely need the data to help us identify the issues at hand.

We go to great lengths to publish as much information as possible on this project. We may occasionally clumsily word things - we're only human after all. If clarification is sought we shall address questions. We are trying to do the right thing, do what works, and to always act with kindness. Being on the receiving end of comments like yours harms our community's volunteers emotional well-being and welfare, and takes their time away from improving software. It is not right, and it is not warranted.

Further, we have shown a great deal of patience in asking for clarification, with a small number of specific questions, which you still have not provided an adequate response to. If your aim is to help improve open-source software then please start replying to specific questions with specific answers and ensure that your words are professional, that you respond with the relevant data when asked, and that most importantly you treat people with kindness and respect.

I understand that Digital Contact Tracing apps and approaches, and their supporting technology, elicit an emotional response from people. Passion is good in the software industry. Passion drives us forward to achieve new things. Passion though doesn't give people the right to target their displeasure at individuals and groups rather than engage in scientific debate as it should be - open, data-driven, facts based, and above all respectful.

I again for a second time direct you to our community code of conduct. This makes it clear that unprofessional behaviour shall not be tolerated. Please ensure that your future correspondence either via email or GitHub issues meets these standards. Up to this point it, at times, has not. Otherwise the project team shall have to, regretfully, consider what enforcement action we take on your interactions with our community. This may include removing entirely, or redacting partially, your comments where they contain unprofessional content. We prefer a completely open community, but you should be aware that as per the code of conduct this is one avenue of enforcement that is open to us.

Our sincere hope is that you engage more professionally in future. Some of your reports have been useful. I'm sure there is value in continuing to collaborate with you - but not at any welfare cost to our team.

If you wish to discuss these issues privately, or need support in some other way, then please do email me directly. You have my email address and I shall always be available to you for support.

Yours Sincerely,

Adam.

adamfowleruk commented 3 years ago

@alwentiu Thank you very much for this very useful diagnostic information. Sharing which phone models and OS' were used, and which showed this issue and which did not is precisely the information I have been looking for, so thank you for sharing.

Could I please also ask that you provide the Herald debug txt files from the S7 and S20 too? Thanks.

Separately a UK based collaborator found an issue yesterday with a Samsung S20. This is particularly relevant as they only came on the market in March 2020. In this instance the phone incorrectly reported its advertising capabilities to Herald. Some manufacturers re-implement some parts of the Android APIs. This is why we're somewhat paranoid about not relying on API documentation in Android alone, and instead prefer to test everything and to collect real data.

These are the kind of advertising issues on both older and modern phones which we have encountered many times over recent months. We're testing a workaround to the issues we found yesterday over the next couple of days, but your additional data will help confirm whether this is, as we currently suspect, a separate issue or instead a related one.

It's worth noting that the S20 in particular has two architecture variants. The one most prevalent seems to be SnapDragon based (The one a colleague here has), whereas the one likely to be present in Australia is Exynos based. It may be that these different stacks are causing inconsistency even though they are both "Samsung S20" phones. Our debug logs log the exact model code and so are very useful in debugging.

Thank you also for your detailed explanation of your test procedure, and for providing your logs. They have been very useful. I hope to hear back from you soon.

yaakov-h commented 3 years ago

Hey @adamfowleruk,

Since you're from the UK, and most of us here are from Australia, I think there's definitely a significant cultural difference in terms of what constitutes professional behaviour. I don't think anything from @jimmo has been untoward, except perhaps the last line about the ultimatum (though I do think that your required turnaround time is unreasonable - analyzing this is our side project as much as building Herald is yours, and we can't necessarily meet demands at the drop of the hat).

However, you do read as somewhat defensive and panicky, and in your last reply to @alwentiu, also somewhat passive-aggressive towards @jimmo, which I would definitely consider to be unprofessional behaviour.

The people in this issue and related ones are some of the most skilled and esteemed people in the Australian technology community, and I would hazard a guess that you didn't expect this kind of scrutiny to descend so rapidly on your little two-man band (plus various government teams and contractors in a Slack group that is closed off from the rest of the world - it doesn't feel very open-source or "developed in the open" or "open community" to me).

However, this was pitched somehow to the Digital Transformation Agency of the Australian Government, and they have publicly requested for input, though we probably would have torn it apart regardless. We have spent the months since late April analyzing COVIDSafe, uncovering flaws from significant UX issues through to significant functionality bugs and - somewhat impressively IMO - a path to RCE on many devices. Getting these fixed has been like talking to a wall, and I believe the Select Senate Committee has not yet gotten to reading the tech community's submissions.

Ultimately the approach taken by Herald is not new, and follows in the footsteps of the failures of OpenTrace, NHS v1, COVIDSafe, ABTraceTogether, and TousAntiCovid et al. It is fundamentally flawed and the only way that Herald makes it remotely work is a series of tricks and hacks that are not palatable to the general populace (i.e. the end users).

But given the strange constraints internal to the DTA and Department of Health (mostly a fundamental misunderstanding of how ENF works), this is all we have, and we do not want to see our hard work and input get thrown away by regressions in a hasty migration to another library. We want it to be as functional, private, and secure as possible, and we want an accurate depiction of its capabilities presented rather than a magical "yeah it works in 100% of all scenarios" chart like the DTA has presented.

Regardless of the warranty and liability disclaimer buried in the Apache licence, it needs to be fit for purpose - even if we disagree with everyone from BCG to the Minister about what exactly that purpose is. People's personal data, movements, social graphs, and lives are at stake with an app like this, particularly when millions of people have downloaded it and the reopening of the country [foolishly] relied upon it.

I may have misinterpreted parts of the thread as the very low level details of BTLE are a bit beyond my current understanding, but it seems to me that Jim raised an issue with X, you've responded that you can't see Y in your tests and need further info, Jim has stated that the issue is not with Y but with X, and you have finished up by stating that unless he answers your questions about Y you will close this issue. That's hardly conducive to "... gracefully accepting constructive feedback" as listed in the CoC.

Please re-evaluate your approach to community feedback. If you don't understand the details of something that has been raised or need further clarification, I'm sure Jim or others will be happy to step in and explain it, but you seem to be getting increasingly hostile and it's not going to help anyone. I can just imagine the headlines if our experts get banned from discussion for raising bugs with COVIDSafe's stack.

adamfowleruk commented 3 years ago

@yaakov-h I'm afraid I do not agree with your characterisation of my responses, or of your interpretation of this thread. You appear to have inferred both emotion and motivation in my responses where there are none.

As I have stated, where issues are raised and data provided we shall address them. If the community needs more time then they are free to ask for it. No 'extra time' was asked for in this thread. I shall not be leaving issues open for long periods of time by default, however, as this doesn't help anyone.

I have asked for clarification multiple times on the issues raised. Multiple times they have not been responded to with the requested information. @alwentiu did provide some useful data though.

I note much of your response is around the COVIDsafe app. Please direct that feedback to the Australian government.

adamfowleruk commented 3 years ago

@alwentiu Can you please retest your three devices using the latest Herald demo app code on develop? We may have identified two issues (#107 and #111) with how the Exynos chipset works differently from other Android devices.

We've tested here on a limited number of handsets (Exynos variants of devices are not that common here), but would be great to get your devices tested too please.

If they still fail then please include their Herald debug log files, as there's extra logging enabled in debug mode now too.

alwentiu commented 3 years ago

@adamfowleruk I've just tested the fix on Samsung Galaxy S7 Edge and Samsung Galaxy S20 (both are apparently running Exynos ...), and yes the issue seems to be fixed. I have actually tested a different fix, by clearing the GATT services and closing it when bluetooth is powered off, which seems to have achieved the same effect.

jimmo commented 3 years ago

The isMultipleAdvertisementSupported is more complicated than this. I don't have access to an Exynos based phone, but the behavior on my Pixel 2 (Android 11, 5th October Security Update) is that isMultipleAdvertisementSupported returns True when the adaptor is powered on, but False when it isn't. This means that ConcreteBLETransmitter::isSupported() is confusing the bleTimer() method.

12-18 16:40:14.229 19923 19923 D Sensor::BLE.ConcreteBLETransmitter: didUpdateState (state=poweredOff)
12-18 16:40:14.230 19923 19923 D Sensor::BLE.ConcreteBLETransmitter: stop
12-18 16:40:14.345 19923 19963 D Sensor::BLE.ConcreteBLETransmitter: bluetoothLeAdvertiser, LE advertiser present (multiSupported=false)
12-18 16:40:14.349 19923 19963 D Sensor::BLE.ConcreteBLETransmitter: advertLoopTask, state change (from=started,to=stopped,elapsed=193550ms)
12-18 16:40:14.350 19923 19963 D Sensor::BLE.ConcreteBLETransmitter: advertLoopTask, stop advert (advert=0ms)

12-18 16:42:32.470 19923 19923 D Sensor::BLE.ConcreteBLETransmitter: didUpdateState (state=poweredOn)
12-18 16:42:32.477 19923 19923 D Sensor::BLE.ConcreteBLETransmitter: bluetoothLeAdvertiser, LE advertiser present (multiSupported=true)

However, with the new commits added to develop, I no longer see duplicate service registrations (i.e. service discovery), but I do see duplicate RPAs still (i.e. concurrent advertisers).

So...

The two commits added to develop do two things:

Make isSupported() always return true, and bluetoothLeAdvertiser() now always returns an advertiser. On my Pixel 2, this previously only returned true when the adaptor was powered on.
When advertising is (re)started it clears services on a now-shared bluetoothGattServer instance.

It's very hard to follow this -- bluetoothGattServer is both a member of the AdvertLoopTask class and a local to startAdvert (now a member of the ConcreteBLETransmitter class). @alwentiu 's fix in https://github.com/AU-COVIDSafe/mobile-android/pull/36 works because in bleTimer() it clearServices() and stops() the bluetoothGattServer before null'ing it. I haven't tested this to see if it also solves the RPA issue, but I suspect that it might also need to stopAdvertising.

However, I would suggest a simpler fix, now that isSupported() is doing the right thing.

in bleTimer(), remove the power off check at the start -- just if (!supported) { return; }. Then in the "started" state, as well as checking if the timer has expired, also check whether the adaptor is powered off. That way the complete stopAdvert will be called, which will both clear service registrations and stop advertising.

I have confirmed that this solves both issues (duplicate RPAs and duplicate service registration) on my Pixel 2.

adamfowleruk commented 3 years ago

@alwentiu Awesome, thanks for checking.

@jimmo I'll review the bleTimer method. The best approach here may be an inactivity flag set such that the timer does not operate during a deliberate stop/start advertising loop.