nccgroup / Sniffle

A sniffer for Bluetooth 5 and 4.x LE
https://www.nccgroup.trust/us/our-research/sniffle-a-sniffer-for-bluetooth-5/?research=Public+tools
GNU General Public License v3.0

Is connection following mutually exclusive with -m MAC filtering? #65

Closed jsmif closed 7 months ago

jsmif commented 7 months ago

On the front page it says:
"Support for capturing advertisements from a target MAC on all three primary advertising channels using a single sniffer. This makes connection detection nearly 3x more reliable than most other sniffers that only sniff one advertising channel." (I added the emphasis)

I'm noticing two things right now:

1) Sniffle seems to be missing the connection packet between two test devices with very high probability right now. I thought it was completely broken until I added the -c option and sniffed on only one channel and then disconnected and reconnected a bunch of times until Sniffle eventually caught and followed the connection. So I must have just been getting unlucky before.

2) In the above statement, it says capturing advertisements on all 3 primary advertisement channels can be achieved somehow in parallel with this hardware. Is that limited to advertisements only, and excluding CONNECT_IND? Because when I use the -m option, I do certainly see advertisements from the target peripheral, but then when I connect to it, there is no connection following going on (also confirmed by examining the pcap.)

So I'm wondering, to reliably catch and follow connections, do I still actually need 3 pieces of Sniffle-compatible hardware, each of which is set to one of the advertising channels?

sultanqasim commented 7 months ago

What it does with the MAC filtering is that it listens for advertisements on channel 37 from the target MAC, and if no CONNECT_IND is observed on channel 37 right after an advertisement from the target MAC, it hops to channel 38 and waits a moment, and then hops to channel 39. In quiet RF environments, it can detect connection establishment fairly reliably. In environments with lots of BLE scanners, the hop timing for advertisers changes and it can miss connection establishment, so this is probably what was happening for you. At some point I want to look into ways to improve this to better handle the presence of many scanners, though it's tricky because of the very tight timing and latencies I have to work around.

sultanqasim commented 7 months ago

Let me know if the latest firmware (with 15ba641184c59abc4f32a941cdc2096b77ebba6e) improves connection detection reliability at all for you (when in the MAC filter mode with hopping between the three primary advertising channels). At least for me, the latest firmware is around 80-90% reliable at connection detection, where connection establishment can happen on any of the three primary advertising channels.

sultanqasim commented 7 months ago

I made some further improvements to connection detection reliability in advertiser hop mode, particularly on channel 38. In my limited testing with a few devices, I got greater than 90% connection detection reliability. Let me know how the latest firmware does for you.

jsmif commented 7 months ago

Thanks. Can you post a release so I don't need to compile the firmware?

sultanqasim commented 7 months ago

Here you go: https://github.com/nccgroup/Sniffle/releases/tag/dev-2024-04-07

EDIT: actually that build is broken, the messenger timeout changes I made last night/this morning broke the MAC filter, let me investigate and fix it, then I'll post another build

EDIT 2: I fixed it, now that same development tag linked above should work.

sultanqasim commented 7 months ago

You can try now, I fixed the timeout related bug (saying it again so you get notified)

sultanqasim commented 7 months ago

Fresh build of the latest test firmware: https://github.com/nccgroup/Sniffle/releases/tag/dev-2024-04-09

jsmif commented 7 months ago

Hello, sorry for the delay, I was traveling and didn't have access to my hardware. When I click on either of those tag links they both give an error 404.

jsmif commented 7 months ago

If the changes were incorporated into the recent v1.8 release though, here's what I can confirm:

1) It doesn't seem to be any more reliable at capturing the connection (I still need to explicitly tell it a specific channel to listen on, and only then does it capture it with high probability)

2) There's possibly a regression in the latest version in that I only ever see CLI output indicating it's sniffing on channel 37 by default? I am running sudo -E python3 sniff_receiver.py -o sniffle_test.pcap -s /dev/ttyACM1 -m 11:22:33:44:55:66, and I know that my peripheral is advertising on all channels 37, 38, 39 (which can be confirmed by running instances with -c 38, and -c 39). But all the output only ever says it's seeing packets on channel 37 like the below, unless I force it over to other channels with -c:

Timestamp: 2.185487 Length: 21  RSSI: -33   Channel: 37 PHY: 1M
Ad Type: ADV_IND
ChSel: 1 TxAdd: 0 RxAdd: 0 Ad Length: 19
AdvA: 11:22:33:44:55:66 (Public)
0x0000:  20 13 66 55 44 33 22 11  02 01 06 09 09 4f 54 53   .fUD3"......OTS
0x0010:  65 72 76 65 72                                    erver

This happens whether I include the -m filter or not. Thus, because I know my Central only connects to the peripheral on channel 39 now (I hardcoded it), I only ever capture the connection if I have a -c 39 option.

Also if I look at the pcap in wireshark and add a !(btle_rf.channel == 0) filter, it shows no packets for the pcap with no -c option specified (but it does show packets if I look at a pcap with -c 38 specified)

sultanqasim commented 7 months ago

Thanks for testing. I deleted that development tag as the v1.8 release is newer and better.

There's no such regression regarding receiving only on 37, but what is happening is that your peripheral is behaving a bit differently than other peripherals I've used Sniffle with. Almost all peripherals I've encountered transmit an advertisement on channel 37 (and then listen for a CONNECT_IND or SCAN_REQ), then on 38 after a 300-500 us gap, then on 39 after a similar gap. Sniffle exploits this behaviour to capture connection requests on any of the three advertising channels by waiting for an advertisement on channel 37 targeting the right MAC, waiting a little more for CONNECT_IND on 37 if it comes, then hopping to 38, waiting a bit for CONNECT_IND, and then hopping to 39 for a few more milliseconds.
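The hop-following procedure described above can be sketched as a listening schedule for one advertising event. This is only an illustration of the idea, not the Sniffle firmware logic: `hop_schedule`, `channel_at`, and the dwell times are hypothetical names and placeholder values chosen for the example.

```python
# Sketch of "wait on 37, then 38, then 39" as a per-event listening schedule.
# Dwell times are placeholder assumptions, not values taken from Sniffle.

def hop_schedule(t_adv37_us, dwell_us=(400.0, 400.0, 3000.0)):
    """Return (channel, start_us, end_us) windows, starting when the ad on 37 is seen."""
    windows = []
    start = t_adv37_us
    for channel, dwell in zip((37, 38, 39), dwell_us):
        windows.append((channel, start, start + dwell))
        start += dwell
    return windows


def channel_at(windows, t_us):
    """Channel the sniffer is tuned to at time t_us."""
    for channel, start, end in windows:
        if start <= t_us < end:
            return channel
    return 37  # outside the event: back on 37, waiting for the next ad


windows = hop_schedule(0.0)
for t in (100.0, 500.0, 1000.0, 5000.0):
    print(f"t={t:7.1f} us -> listening on channel {channel_at(windows, t)}")
```

A CONNECT_IND is only captured if it arrives on the channel the sniffer is tuned to at that moment, which is why the approach depends on the advertiser's hop order and timing matching what the sniffer expects.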

According to the Bluetooth Core Specification, the time between advertising on one primary advertising channel (like 37) and the next (like 38) within an advertising event must be less than 10 ms, but it is no more prescriptive than that, so in theory there can be much longer waits between channels. However, I've never seen such devices; all devices I've examined switch to the next advertising channel within a few hundred microseconds. Thus, Sniffle also assumes that the time between advertising on 37, 38, and 39 is short (a few hundred microseconds).

Bluetooth Core Specification 5.1 made a somewhat frustrating change (for sniffers) where the advertising hop sequence does not need to be strictly 37, 38, 39, and can instead be any random sequence that can also change every advertising event. See page 9 of https://davidhoglund.typepad.com/files/1901_feature_overview_brief_final.pdf. This completely breaks Sniffle's expectations regarding advertiser channel hop sequence, and thus breaks receiving connection requests on channels other than 37 in hopping mode. Given the limitations of the CC26x2 hardware (and similar low cost devices) that can only sniff one channel at a time, if the advertising hop sequence is truly random, one can never reliably capture connection requests on all channels by tuning into just one at a time. However, if it were truly random, you would at least see scan requests on other channels sometimes. Fortunately, in practice, most Bluetooth 5.1/5.2/5.3 devices I've encountered still follow the old 37->38->39 sequence, and for extended advertising this doesn't matter since the connection request happens on an auxiliary (secondary advertising) channel.

I can imagine two possibilities for why the advertising hop mode is not working with your peripheral:

  1. Your peripheral is waiting a long time (several milliseconds) between advertising on successive channels (like 37->38->39).
  2. Your peripheral is taking the Bluetooth 5.1+ liberty of not following a 37->38->39 advertising hop sequence, and instead using some other (possibly fixed) hop sequence, such as 38->37->39 or 37->39->38.

Either of these situations could be causing the behaviour you described, though I'm not sure which of the two it is. When I have time, I could make you a firmware build that could help figure out which of the two (or both) situations is happening.

jsmif commented 7 months ago

I thought I had already mentioned this but apparently not. The test central & peripheral are just the following Zephyr code running on nRF52840 dongles with only the advertised name changed: https://docs.zephyrproject.org/latest/samples/bluetooth/central_otc/README.html https://docs.zephyrproject.org/latest/samples/bluetooth/peripheral_ots/README.html

One of the CLI arguments for building & bundling is --hw-version 52 so I wouldn't be surprised if they're behaving in a BT 5.2 fashion, since Zephyr's reasonably up to date with the specs usually.

I had already expected that in practice I'd need to use 3 sniffers on all 3 channels, so I may be able to figure out the advertising sequence by just capturing 3 pcaps, if Wireshark properly shows the timestamps. But I'll have to wait until I get the other dev boards back from my colleague. Mostly though I just filed this ticket because I wanted to understand the observed vs. expected behavior, and you've already clarified that sufficiently that you could close this if you want.

sultanqasim commented 7 months ago

Thanks for the additional information. Looking at the code for that Zephyr example and what it calls into, it just asks the HCI to do legacy advertising. The advertising channel sequence and timing of transmission between the three advertising channels is left up to the Nordic controller. I'm curious what the Nordic controller is doing, maybe I'll try to test that myself some day when I have time.

My guess is that it's hopping consistently in an unchanging sequence that's not 37->38->39. This guess is based on you never seeing any packets on channels 38 or 39 in advertising hop mode, since if it were truly random then 37->38->39 would at least be correct sometimes. If that's indeed the case, then I could just add an option to Sniffle to change the channel hop sequence in which it sniffs for advertisements.

However, if it's randomized for every advertising event (technically permissible with Bluetooth 5.1 and up), then unfortunately there's nothing I can do.
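To make the reasoning about hop sequences concrete, here is a rough sketch that enumerates all six possible orders of the three primary advertising channels and shows where a sniffer stepping through 37, 38, 39 would coincide with the advertiser. It assumes a simplified lockstep model (the sniffer waits on 37 until it hears the advertiser there, then advances one channel per advertiser hop); `covered` is a hypothetical helper, not part of Sniffle.

```python
from itertools import permutations

SNIFF_ORDER = (37, 38, 39)


def covered(adv_order, sniff_order=SNIFF_ORDER):
    """Channels where sniffer and advertiser coincide within one advertising event.

    Assumption: the sniffer starts stepping through sniff_order at the moment
    the advertiser transmits on sniff_order[0], and hops in lockstep afterwards.
    """
    start = adv_order.index(sniff_order[0])
    remaining = adv_order[start:]
    return [s for s, a in zip(sniff_order, remaining) if s == a]


for order in permutations(SNIFF_ORDER):
    print(order, "->", covered(order))
```

Under this model only one of the six orders (37, 38, 39) gives coverage of all three channels. So a sequence re-randomized every advertising event would still line up with the sniffer about one event in six, while a fixed non-standard order would never show traffic on all three.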

I'll leave this issue open so that we can follow up in case either of us has a chance to figure out what the Nordic controller is doing.

sultanqasim commented 7 months ago

One other thing I'll add: If it turns out that Nordic's controller is choosing a particular randomized advertising hop sequence once (at boot? or based on some static identifier?), and then sticking with the same sequence for every advertising event, then I could implement a feature to figure out this hop sequence and then automatically use it.

jsmif commented 7 months ago

To be clear, I don't think it's Nordic's controller that's setting the advertisement sequence in this case. Zephyr is providing a software-defined controller implementation and the Nordic chip is just using that. (https://developer.nordicsemi.com/nRF_Connect_SDK/doc/2.2.0/zephyr/connectivity/bluetooth/bluetooth-arch.html figure 27. I confirmed I added the "CONFIG_BT_LL_SW_SPLIT=y" option which isn't default, to make it use the (customizable) open source implementation of the link layer.)

So after setting up 3 sniffers, I can confirm it seems to be randomizing the advertisement order on the peripheral between runs, but not intra-run AFAICT.

E.g. in the first run it had an order of 38, 37, 39. [image] Though here there seems to be a major time skew occurring at packet 10 for channels 38 and 39 compared to 37. I'm not sure if that's indicative of packet loss or it switching things up (since I didn't save the captures.)

In the second run it was 39, 37, 38. [image] And again, after the 10th packet sequence, if we look at packets on line 11 it seems to change up from being 39, 37, 38 to being 39, 38, 37, with a much longer time difference switching from 38 to 37 than from 39 to 38. But then I realized that if we look at channel 39 packet 12, it's actually sent before channel 37 packet 11. So it's not actually changing the sequence, but just lost some packets or something.

After doing some more tests I came to the conclusion that it's basically randomized once at launch, and doesn't seem to be re-randomized again. (But still, it's randomized in the sense that I see plenty of non-37-38-39 sequences after restarting it multiple times.)
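A minimal sketch of this kind of three-capture comparison, assuming each sniffer reports the timestamp of the same advertising event on its own channel. The result is only meaningful if the sniffers' clocks agree to within the gaps being compared, so the hypothetical `sync_uncertainty_us` parameter makes that assumption explicit and the function refuses to answer when the gaps are too small to trust.

```python
def infer_order(first_seen_us, sync_uncertainty_us):
    """Infer the advertising channel order from per-channel timestamps.

    first_seen_us: dict mapping channel -> timestamp (us) of the same
                   advertising event as seen by the sniffer on that channel.
    sync_uncertainty_us: assumed worst-case clock offset between sniffers.
    Returns a tuple of channels in time order, or None if any gap is within
    the uncertainty and the order therefore cannot be trusted.
    """
    ranked = sorted(first_seen_us.items(), key=lambda item: item[1])
    for (_, earlier), (_, later) in zip(ranked, ranked[1:]):
        if later - earlier <= sync_uncertainty_us:
            return None
    return tuple(channel for channel, _ in ranked)


# Illustrative timestamps only, not taken from the captures in this thread.
sample = {37: 500.0, 38: 0.0, 39: 1000.0}
print(infer_order(sample, sync_uncertainty_us=100.0))
print(infer_order(sample, sync_uncertainty_us=1000.0))
```

With a 100 us uncertainty the sample resolves to an order, while with a 1000 us uncertainty it returns None because the gaps are no larger than the possible clock error.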

sultanqasim commented 7 months ago

Thanks for testing that. This matches the behaviour you were reporting, and it's good they're not changing the order between advertising events. In that case, I will implement a feature to choose the advertising hop sequence, and an auto-detect feature to figure out the hop sequence.

sultanqasim commented 7 months ago

One thing to be aware of though is that time synchronization between multiple Sniffle sniffers is not microsecond level precise. They could be milliseconds off, so comparing microsecond level timestamps like that isn't reliable. I was looking at the Zephyr LE controller implementation, and it looked like it's just hopping in sequence 37->38->39 (just like every other controller I've seen in the past). See https://github.com/zephyrproject-rtos/zephyr/blob/5841c11cae6a569678a9533e2b96b51ec140008a/subsys/bluetooth/controller/ll_sw/nordic/lll/lll_adv.c#L1460

If that's indeed the case, then the other possibility is hop timing being different from what I expect, for example maybe it hops from 37 to 38 quicker than I'm expecting.

I've ordered an nRF52840 USB dongle, I'll do some experimentation to figure out what's going on, and devise a plan accordingly.

I know that Broadcom and TI both decided not to implement Randomized Advertising Channel Indexing, and at least last time I checked, Nordic also wasn't implementing it in their controller (see https://devzone.nordicsemi.com/f/nordic-q-a/92540/randomized-advertising-channel-indexing and https://devzone.nordicsemi.com/f/nordic-q-a/66621/advertising-channel-sequence).

Maybe Zephyr is doing something different from what I'm seeing right now in the code, or maybe it's just a matter of hop timing. I'll see when the Nordic dongles get delivered and I have some time to experiment.

Thanks again @jsmif for your time investigating this.

sultanqasim commented 7 months ago

Let me know if v1.9 can detect your Zephyr device connection establishments on all three channels with a single sniffer. Try sniff_receiver.py and let me know what it reports for advertising hop interval. In case it hops 37->38->39 as the code suggests but just hops between channels quicker or slower than I was previously expecting, v1.9 should handle that.

jsmif commented 7 months ago

v1.9 seems much more reliable! This is a great improvement! Though still not quite 100%, as in a quick test it missed 2 out of 10 tries.

It says: Measured Advertising Hop: 502 us. But I notice in the sequence that precedes that, it never seems to see any channel 38 traffic, even though I can see advertising happening on channel 38 if I run with -c 38:

Timestamp: 0.097330 Length: 19  RSSI: -58   Channel: 37 PHY: 1M
Ad Type: ADV_IND
ChSel: 1 TxAdd: 0 RxAdd: 0 Ad Length: 17
AdvA: BB:BB:BB:BB:BB:BB (Public)
0x0000:  20 11 bb bb bb bb bb bb  02 01 06 07 03 0d 18 0f   ...............
0x0010:  18 0a 18                                          ...

Timestamp: 0.098335 Length: 19  RSSI: -60   Channel: 39 PHY: 1M
Ad Type: ADV_IND
ChSel: 1 TxAdd: 0 RxAdd: 0 Ad Length: 17
AdvA: BB:BB:BB:BB:BB:BB (Public)
0x0000:  20 11 bb bb bb bb bb bb  02 01 06 07 03 0d 18 0f   ...............
0x0010:  18 0a 18                                          ...

Timestamp: 0.206183 Length: 19  RSSI: -57   Channel: 37 PHY: 1M
Ad Type: ADV_IND
ChSel: 1 TxAdd: 0 RxAdd: 0 Ad Length: 17
AdvA: BB:BB:BB:BB:BB:BB (Public)
0x0000:  20 11 bb bb bb bb bb bb  02 01 06 07 03 0d 18 0f   ...............
0x0010:  18 0a 18                                          ...

Timestamp: 0.207189 Length: 19  RSSI: -60   Channel: 39 PHY: 1M
Ad Type: ADV_IND
ChSel: 1 TxAdd: 0 RxAdd: 0 Ad Length: 17
AdvA: BB:BB:BB:BB:BB:BB (Public)
0x0000:  20 11 bb bb bb bb bb bb  02 01 06 07 03 0d 18 0f   ...............
0x0010:  18 0a 18                                          ...

Timestamp: 0.312474 Length: 19  RSSI: -57   Channel: 37 PHY: 1M
Ad Type: ADV_IND
ChSel: 1 TxAdd: 0 RxAdd: 0 Ad Length: 17
AdvA: BB:BB:BB:BB:BB:BB (Public)
0x0000:  20 11 bb bb bb bb bb bb  02 01 06 07 03 0d 18 0f   ...............
0x0010:  18 0a 18                                          ...

Timestamp: 0.313479 Length: 19  RSSI: -60   Channel: 39 PHY: 1M
Ad Type: ADV_IND
ChSel: 1 TxAdd: 0 RxAdd: 0 Ad Length: 17
AdvA: BB:BB:BB:BB:BB:BB (Public)
0x0000:  20 11 bb bb bb bb bb bb  02 01 06 07 03 0d 18 0f   ...............
0x0010:  18 0a 18                                          ...

Timestamp: 0.414125 Length: 19  RSSI: -57   Channel: 37 PHY: 1M
Ad Type: ADV_IND
ChSel: 1 TxAdd: 0 RxAdd: 0 Ad Length: 17
AdvA: BB:BB:BB:BB:BB:BB (Public)
0x0000:  20 11 bb bb bb bb bb bb  02 01 06 07 03 0d 18 0f   ...............
0x0010:  18 0a 18                                          ...

Timestamp: 0.415130 Length: 19  RSSI: -60   Channel: 39 PHY: 1M
Ad Type: ADV_IND
ChSel: 1 TxAdd: 0 RxAdd: 0 Ad Length: 17
AdvA: BB:BB:BB:BB:BB:BB (Public)
0x0000:  20 11 bb bb bb bb bb bb  02 01 06 07 03 0d 18 0f   ...............
0x0010:  18 0a 18                                          ...

Timestamp: 0.522032 Length: 19  RSSI: -57   Channel: 37 PHY: 1M
Ad Type: ADV_IND
ChSel: 1 TxAdd: 0 RxAdd: 0 Ad Length: 17
AdvA: BB:BB:BB:BB:BB:BB (Public)
0x0000:  20 11 bb bb bb bb bb bb  02 01 06 07 03 0d 18 0f   ...............
0x0010:  18 0a 18                                          ...

Timestamp: 0.523037 Length: 19  RSSI: -60   Channel: 39 PHY: 1M
Ad Type: ADV_IND
ChSel: 1 TxAdd: 0 RxAdd: 0 Ad Length: 17
AdvA: BB:BB:BB:BB:BB:BB (Public)
0x0000:  20 11 bb bb bb bb bb bb  02 01 06 07 03 0d 18 0f   ...............
0x0010:  18 0a 18                                          ...

Measured Advertising Hop: 502 us

TRANSITION: ADVERT_HOP from STATIC

sultanqasim commented 7 months ago

Great, so indeed your device hops between advertising channels faster than any of the devices I had previously tested with. If you want to see the actual ads on 38 and 39, using the -a (advertising only) option might work, though then it would miss connection attempts. This limitation is due to some inherent hardware latencies and retuning time.
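As a rough cross-check of the reported hop interval, the 37 and 39 timestamps in the output above can be compared directly. The sketch below assumes a 37->38->39 order, so that the time from an ad on 37 to the following ad on 39 spans two hops. It is not necessarily how the firmware measures the interval, and `hop_intervals_us` is a hypothetical helper that parses only the "Timestamp ... Channel" lines of the CLI output.

```python
import re

# Matches e.g. "Timestamp: 0.097330 Length: 19  RSSI: -58   Channel: 37 PHY: 1M"
LINE_RE = re.compile(r"Timestamp:\s*([0-9.]+).*?Channel:\s*(\d+)")


def hop_intervals_us(log_text):
    """Estimate the per-hop interval (us) from each 37 -> 39 packet pair."""
    packets = [(float(t), int(ch)) for t, ch in LINE_RE.findall(log_text)]
    hops = []
    for (t0, ch0), (t1, ch1) in zip(packets, packets[1:]):
        if ch0 == 37 and ch1 == 39:
            # Two hops (37 -> 38 -> 39) separate these packets under the assumed order.
            hops.append((t1 - t0) * 1e6 / 2)
    return hops


SAMPLE = """\
Timestamp: 0.097330 Length: 19  RSSI: -58   Channel: 37 PHY: 1M
Timestamp: 0.098335 Length: 19  RSSI: -60   Channel: 39 PHY: 1M
Timestamp: 0.206183 Length: 19  RSSI: -57   Channel: 37 PHY: 1M
Timestamp: 0.207189 Length: 19  RSSI: -60   Channel: 39 PHY: 1M
"""

print([round(h, 1) for h in hop_intervals_us(SAMPLE)])
```

For the first two packet pairs from the thread this gives roughly 502 to 503 us per hop, in line with the 502 us that Sniffle reported.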

EDIT: it might fail with the -a option too, because the hopping is so fast relative to the inherent latencies. Specifically, the Sniffle logic waits for an advertisement on 37, then hops to 38, but the latency in the advertisement getting to the firmware is around 130-150 us, and the latency in retuning to the next channel and being ready to listen is around 270-300 us. Given that your device only spends ~292 us between the end of the ad on 37 and the start of the ad on 38, it might be too late to catch the ad on 38 even with the -a option. Anyway, at least the connection detection is working on all three channels. With regards to the 20% miss rate, that's probably because of scan requests altering hop timing. Unfortunately, with such a fast advertising hop interval and inherent latencies in the CC26xx hardware, it's too fast to invoke Sniffle's hop postponement mechanism, so some misses don't surprise me.
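The timing argument can be written out as a simple budget check. This sketch assumes the two latencies quoted above simply add up, and that both are counted from the end of the ad on 37; the real firmware timing may overlap or differ. `ready_in_time` is a hypothetical helper for illustration.

```python
def ready_in_time(gap_us, delivery_latency_us, retune_latency_us):
    """True if the sniffer could be listening on the next channel before the next ad starts.

    Assumption: delivery and retune latencies are sequential and additive.
    """
    return delivery_latency_us + retune_latency_us <= gap_us


GAP_US = 292  # gap between end of ad on 37 and start of ad on 38, as quoted above

# Best case and worst case of the quoted latency ranges.
for delivery, retune in ((130, 270), (150, 300)):
    total = delivery + retune
    verdict = ready_in_time(GAP_US, delivery, retune)
    print(f"latency {total} us vs gap {GAP_US} us -> ready in time: {verdict}")
```

Even the best case of the quoted ranges (130 + 270 = 400 us) exceeds the ~292 us gap, which is consistent with the ad on 38 being missed.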

jsmif commented 7 months ago

Currently with -m I don't see any difference with -a vs. without. Basically it sees advertisements on channels 37 and 39, and then after the "TRANSITION: ADVERT_HOP from STATIC" it only prints out advertisements on 37.

And actually once I removed the -m filter, I only see channel 37 packets printed out (with or without -a). Is that expected? I would expect without the -m that it'd just go back to the cycling through channels?

sultanqasim commented 7 months ago

Ah, yes that doesn't surprise me given your fast hop rate. With the -a option, it immediately triggers a hop to 38 after the firmware receives an ad on 37. However, the firmware gets it 130-150 us after the radio, and it takes another 200+ us to tune the radio to a new channel and start listening. Since your device advertises on 38 just 292 us after the end of the ad on 37 (faster than any other Bluetooth controller I've ever encountered), the Sniffle hardware is just not fast enough to catch the ad on 38. However, it will still catch CONNECT_IND PDUs on 38 and 39.

With regards to the -m option, the way it works is that without the option, it just sits on one channel. It only hops if a MAC filter is specified to say which target MAC to hop along with. With the -m option but no -a option, it will catch CONNECT_IND and SCAN_REQ/SCAN_RSP PDUs on 38 and 39. The -a option is intended to show the ads on 38 and 39, and works with every controller implementation used by devices I have (where there is a 370+ us gap between the end of the ad on 37 and the start of the ad on 38), though it won't work with your device because of the fast hopping.

sultanqasim commented 7 months ago

Closing this for now, as it’s behaving as expected. Getting 100% reliability with the very fast advertising hop speed of your device would be difficult because of scan responses altering timing, and the latencies of the CC26xx hardware preventing it from reacting quickly enough. For more typical advertising hop speeds I’ve seen with most other controller implementations, it should be very reliable at connection detection.