wiglenet / wigle-wifi-wardriving

Nethugging client for Android, from wigle.net
https://wigle.net
BSD 3-Clause "New" or "Revised" License
649 stars 199 forks

Record Bluetooth Manufacturer-specific data #615

XenoKovah closed 7 months ago

XenoKovah commented 10 months ago

This is a request to start collecting additional information from Bluetooth Low Energy Advertisements & Bluetooth Classic Extended Inquiry Responses, in order to better determine which company a device is associated with.

One of the types of information that can be advertised by a device is "Manufacturer-specific" data (type 0xff, https://developer.android.com/reference/android/bluetooth/le/ScanRecord#DATA_TYPE_MANUFACTURER_SPECIFIC_DATA), which is supposed to be a 16-bit company ID followed by arbitrary-length manufacturer-specific data.
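As a rough illustration of the layout described above, here is a minimal Python sketch that walks the length-prefixed structures in a raw advertisement payload and pulls out the first type 0xff entry. The function names and the example payload are hypothetical, and the company ID is read as little-endian, which is an assumption (it matches the iBeacon layout discussed later in this thread).

```python
import struct
from typing import Iterator, Optional, Tuple

MANUFACTURER_SPECIFIC = 0xFF  # AD type for manufacturer-specific data


def iter_ad_structures(payload: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (ad_type, data) for each length-prefixed structure in the payload."""
    i = 0
    while i < len(payload):
        length = payload[i]  # length covers the type byte plus the data
        if length == 0 or i + 1 + length > len(payload):
            break  # zero length is padding; a truncated structure ends parsing
        yield payload[i + 1], payload[i + 2 : i + 1 + length]
        i += 1 + length


def manufacturer_data(payload: bytes) -> Optional[Tuple[int, bytes]]:
    """Return (company_id, remaining_bytes) from the first 0xFF structure, if any."""
    for ad_type, data in iter_ad_structures(payload):
        if ad_type == MANUFACTURER_SPECIFIC and len(data) >= 2:
            # Assumption: the 16-bit company ID is little-endian.
            (company_id,) = struct.unpack_from("<H", data)
            return company_id, data[2:]
    return None


# Hypothetical payload: a Flags structure, then manufacturer-specific data.
example = bytes.fromhex("020106" "05ff4c001005")
result = manufacturer_data(example)
```

Here `result` holds the company ID as an integer plus whatever bytes followed it, so both can be stored or displayed separately.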

The company IDs can be looked up here: https://bitbucket.org/bluetooth-SIG/public/src/main/assigned_numbers/company_identifiers/company_identifiers.yaml (Note: this is not the same as the UUID16 company IDs from ticket #614 )

Within my data, this data type is by far the most common type of advertised data. However, because the company IDs extractable from this field are less accurate than those from the other UUID16 entries (like type 0x2 and 0x3 mentioned in ticket #614 ), it should perhaps be done after data extraction for those types.

Example with Company ID:

Pasted Graphic 16

NOTE: This type of information can also appear in Bluetooth Classic Extended Inquiry Responses.

Example:

Pasted Graphic 15

Note that sometimes the vendor IDs may be endian-swapped. For instance in the below data, it says it's an iPhone but the company ID is 0x4C00. The correct Apple ID (from here) is 0x004C. (In reality I wasn't able to find a location in the spec that specifically said that the 16-bit company ID should be little-endian, so vendors may have changed their endianness over time.)

Pasted Graphic 13

And sometimes they just don't seem to correspond to anything, and my suspicion is that vendors are just skipping the company ID and putting arbitrary data into the advertised type.

Pasted Graphic 12

However, the request would be to still record and expose the 16-bit company ID, so that patterns can be searched for (e.g. if a vendor always uses the same, but unassigned, value across all their devices.)

XenoKovah commented 10 months ago

The format of the Manufacturer-specific data is mentioned in the Core Specification Supplement section 1.4 (because why would they just describe it in the core spec, right?) It just says "The first two data octets shall contain a company identifier from Assigned Numbers. The interpretation of any other octets within the data shall be defined by the manufacturer specified by the company identifier." So it doesn't actually lay out what endianness those first two octets are supposed to be, nor any sort of endianness bit, which is why I think some devs occasionally flip them around.

Flipping the bytes and checking again when there's no match seems like a good heuristic. In practice I just print both interpretations so I can decide for myself which one I think is more likely.
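A minimal sketch of that flip-and-check heuristic in Python might look like the following. The lookup table is a tiny illustrative excerpt (a real one would be loaded from the company_identifiers.yaml file linked in the ticket), and the function names are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Illustrative excerpt only; not the full assigned-numbers list.
KNOWN_COMPANIES: Dict[int, str] = {
    0x004C: "Apple, Inc.",
}


def swap16(value: int) -> int:
    """Swap the two bytes of a 16-bit value."""
    return ((value & 0xFF) << 8) | ((value >> 8) & 0xFF)


def both_interpretations(company_id: int) -> Tuple[Optional[str], Optional[str]]:
    """Return (name as read, name with bytes swapped) so a human can compare."""
    return KNOWN_COMPANIES.get(company_id), KNOWN_COMPANIES.get(swap16(company_id))


def best_guess(company_id: int) -> Optional[str]:
    """Prefer the ID as read; fall back to the swapped ID only if there's no match."""
    direct, swapped = both_interpretations(company_id)
    if direct is not None:
        return direct
    if swapped is not None:
        return swapped + " (byte-swapped)"
    return None
```

Marking the fallback result as byte-swapped keeps the lower-confidence guess visible instead of silently rewriting the ID.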

XenoKovah commented 10 months ago

Also just as an FYI, this data type is also where things like Apple's "iBeacon" data are found. (See the attached Proximity Beacon Specification R1.pdf.)

In that doc, on page 6, you'll see byte 4 is 0xFF. That's this manufacturer-specific data advertisement type. Bytes 5 and 6 are then little-endian 0x4C, 0x00, which is Apple's ID. And then bytes 7 and 8 are little endian 0x02, 0x15. Those two bytes specifically are what differentiate an Apple iBeacon from other beacons (like Google's (discontinued?) Eddystone which I don't know anything about currently).
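The check described above can be sketched in a few lines of Python. This assumes the input is the raw manufacturer-specific data with the two company ID bytes still at the front, read as little-endian; `label_for` and its return strings are hypothetical names for illustration.

```python
APPLE_COMPANY_ID = 0x004C
IBEACON_PREFIX = bytes([0x02, 0x15])  # the two bytes that follow Apple's ID in an iBeacon


def label_for(mfg_data: bytes) -> str:
    """Return a display label for raw manufacturer-specific data (company ID included)."""
    if len(mfg_data) < 2:
        return "unknown"
    # Assumption: the company ID is stored little-endian (0x4C, 0x00 for Apple).
    company_id = int.from_bytes(mfg_data[:2], "little")
    if company_id == APPLE_COMPANY_ID:
        if mfg_data[2:4] == IBEACON_PREFIX:
            return "iBeacon"
        return "Apple"
    return f"company 0x{company_id:04X}"
```

A payload starting with 4c 00 02 15 would be labeled "iBeacon", while other Apple manufacturer data would still be labeled "Apple".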

Which is why, for instance, on a Tesla and other devices you'll see it advertising manufacturer-specific data, but the ID will say Apple. I'm not sure where you're planning to show company ID, but for instance if you were going to show it in the Basic Search UI, then it might be worth just checking those two additional bytes and saying "iBeacon" rather than "Apple" in that case. But the important thing is just to allow the overall data to be queried so that if someone (like me!) wants to grab and parse and try to make sense of it, they can. E.g. I was just looking at some data from a Korean vacation yesterday, and found that some Korean bank point-of-sale devices will reply with a name of "KFTC BANKPOS" if you get lucky and catch the scan response. But even if they don't, they all seem to always include in their advertisement the exact same manufacturer-specific data of "0215585cde931b0142cc9a1325009bedc65e00010002c5". So it's possible to tell that even no-named devices are actually "KFTC BANKPOS" devices based on this iBeacon, which is sort of misbehaving (in normal usage they're supposed to change the embedded UUID128 and/or major/minor ID from device to device.)
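To make that concrete, here is a small Python sketch that splits the payload quoted above into iBeacon fields and uses the exact bytes as a lookup key for otherwise unnamed devices. The field layout (16-byte UUID, big-endian major and minor, signed power byte) is my reading of the iBeacon format, and `KNOWN_PAYLOADS` is a hypothetical local table, not anything the app ships.

```python
import struct
import uuid
from typing import NamedTuple, Optional


class IBeacon(NamedTuple):
    proximity_uuid: uuid.UUID
    major: int
    minor: int
    measured_power: int  # signed byte


def parse_ibeacon(body: bytes) -> Optional[IBeacon]:
    """Parse the bytes that follow the company ID: 02 15, UUID, major, minor, power."""
    if len(body) != 23 or body[:2] != b"\x02\x15":
        return None
    # Assumption: major and minor are big-endian, power is a signed 8-bit value.
    major, minor, power = struct.unpack(">HHb", body[18:23])
    return IBeacon(uuid.UUID(bytes=body[2:18]), major, minor, power)


# Payload quoted in the comment above, mapped to the name seen in scan responses.
KNOWN_PAYLOADS = {
    bytes.fromhex("0215585cde931b0142cc9a1325009bedc65e00010002c5"): "KFTC BANKPOS",
}


def label_unnamed_device(body: bytes) -> Optional[str]:
    """Return a label when the payload exactly matches a known fixed payload."""
    return KNOWN_PAYLOADS.get(body)


beacon = parse_ibeacon(
    bytes.fromhex("0215585cde931b0142cc9a1325009bedc65e00010002c5")
)
```

An exact-match table like this only works because these devices reuse one payload; beacons that vary their UUID or major/minor per device would need a looser match.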

rksh commented 10 months ago

In testing, this is where the preponderance of mfgr data shows up, rather than #614

XenoKovah commented 10 months ago

Yeah, that's why I mentioned in the original ticket (new emphasis added) "Within my data, this data type is by far the most common type of advertised data. However, because the company IDs extractable from this field are less accurate than those from the other UUID16 entries (like type 0x2 and 0x3 mentioned in ticket https://github.com/wiglenet/wigle-wifi-wardriving/issues/614 ), it should perhaps be done after data extraction for those types."

The combination of vendors messing up their endianness and the proliferation of beacons, which indicate a company other than the device manufacturer, means that the data is valuable but less accurate per-datapoint than the #614 data for the purposes of company identification. (But then this data has more possible purposes than only that.)

bobzilladev commented 10 months ago

This has been a high bang-for-the-buck change, thank you for bringing this up! Local display of the manufacturer will be in version 2.80 of the app, and going forward the project would like to locally store and centrally aggregate the manufacturer id. Data payloads beyond the manufacturer id stray into a fingerprinting/privacy area, and the project is not going to pursue centrally aggregating them.

rksh commented 7 months ago

feature implemented in release 2.81