python / cpython

The Python programming language
https://www.python.org/

uuid.uuid1() on certain Macs does not generate unique IDs #85724

Open e1717a5f-f0f9-4d66-935d-7a20dee24804 opened 3 years ago

e1717a5f-f0f9-4d66-935d-7a20dee24804 commented 3 years ago
BPO 41552
Nosy @ronaldoussoren, @ned-deily, @vedgar, @remilapeyre, @websurfer5
Files
  • dumpaddr.c
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:
    ```python
    assignee = None
    closed_at = None
    created_at =
    labels = ['OS-mac', 'type-bug', '3.8', '3.9', '3.10']
    title = 'uuid.uuid1() on certain Macs does not generate unique IDs'
    updated_at =
    user = 'https://bugs.python.org/terrygreeniaus'
    ```
    bugs.python.org fields:
    ```python
    activity =
    actor = 'ronaldoussoren'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['macOS']
    creation =
    creator = 'terrygreeniaus'
    dependencies = []
    files = ['49794']
    hgrepos = []
    issue_num = 41552
    keywords = []
    message_count = 13.0
    messages = ['375387', '375394', '375398', '375403', '375405', '375415', '375441', '375460', '380065', '380068', '380929', '386167', '386560']
    nosy_count = 6.0
    nosy_names = ['ronaldoussoren', 'ned.deily', 'veky', 'remi.lapeyre', 'Jeffrey.Kintscher', 'terrygreeniaus']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = 'needs patch'
    status = 'open'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue41552'
    versions = ['Python 3.8', 'Python 3.9', 'Python 3.10']
    ```


    e1717a5f-f0f9-4d66-935d-7a20dee24804 commented 3 years ago

    I'm using Python 3.8.5 on a 2016 MacBook Pro running macOS Catalina 10.15.3. This model has a touch bar and macOS communicates with the touch bar via a dedicated "iBridge" network interface. The iBridge network interface uses a fixed MAC address that is common across all MacBook Pro models (ac:de:48:00:11:22).

    Normally uuid.uuid1() picks up my WiFi MAC address (which is obviously unique), but this evening I noticed it was generating UUIDs based on the iBridge MAC address. Since the iBridge MAC is shared across all MacBook Pro laptops, there's no way to guarantee that the UUIDs are now universally unique. I'm not sure what triggered uuid.uuid1() to start using my iBridge interface although there was an Internet outage here at some point so maybe the network interfaces got reordered. The iBridge interface (en5) does appear before my WiFi interface (en0) in the output of ifconfig now.

    Here's a quick example of the problem:

    greent7@avocado:~$ python3
    Python 3.8.5 (default, Jul 21 2020, 10:48:26)
    [Clang 11.0.3 (clang-1103.0.32.62)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import uuid
    >>> uuid.uuid1()
    UUID('32bbad32-de12-11ea-a0ee-acde48001122')
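The last 12 hex digits of a version 1 UUID are the node field, so the MAC address can be read straight back out of the value above. A minimal sketch using only the standard `uuid` module (the helper name `node_as_mac` is made up for illustration):

```python
import uuid


def node_as_mac(u: uuid.UUID) -> str:
    """Format the 48-bit node field of a UUID as a colon-separated MAC address."""
    raw = u.node.to_bytes(6, "big")
    return ":".join(f"{b:02x}" for b in raw)


# The UUID from the interactive session in this report.
example = uuid.UUID("32bbad32-de12-11ea-a0ee-acde48001122")

print(example.version)       # 1
print(node_as_mac(example))  # ac:de:48:00:11:22
```

The printed address matches the `ether` value of the iBridge interface (en5), not the WiFi interface (en0).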

    And here's the output from ifconfig:

    greent7@avocado:~$ ifconfig
    lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 16384
        options=1203<RXCSUM,TXCSUM,TXSTATUS,SW_TIMESTAMP>
        inet 127.0.0.1 netmask 0xff000000
        inet6 ::1 prefixlen 128
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1
        nd6 options=201<PERFORMNUD,DAD>
    gif0: flags=8010<POINTOPOINT,MULTICAST> mtu 1280
    stf0: flags=0<> mtu 1280
    en5: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        ether ac:de:48:00:11:22
        inet6 fe80::aede:48ff:fe00:1122%en5 prefixlen 64 scopeid 0x4
        nd6 options=201<PERFORMNUD,DAD>
        media: autoselect
        status: active
    en0: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        options=400<CHANNEL_IO>
        ether 78:4f:43:5e:b9:86
        inet6 fe80::1c4b:d303:b374:c2f3%en0 prefixlen 64 secured scopeid 0x5
        inet6 fd00:1cab:c0ac:fc82:80e:f701:8302:6287 prefixlen 64 autoconf secured
        inet6 fd00:1cab:c0ac:fc82:1c38:9f17:2073:8eb prefixlen 64 autoconf temporary
        inet 192.168.0.11 netmask 0xffffff00 broadcast 192.168.0.255
        nd6 options=201<PERFORMNUD,DAD>
        media: autoselect
        status: active
    en3: flags=8963<UP,BROADCAST,SMART,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
        options=460<TSO4,TSO6,CHANNEL_IO>
        ether 82:46:1a:46:5c:01
        media: autoselect <full-duplex>
        status: inactive
    en1: flags=8963<UP,BROADCAST,SMART,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
        options=460<TSO4,TSO6,CHANNEL_IO>
        ether 82:46:1a:46:5c:00
        media: autoselect <full-duplex>
        status: inactive
    en4: flags=8963<UP,BROADCAST,SMART,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
        options=460<TSO4,TSO6,CHANNEL_IO>
        ether 82:46:1a:46:5c:05
        media: autoselect <full-duplex>
        status: inactive
    en2: flags=8963<UP,BROADCAST,SMART,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
        options=460<TSO4,TSO6,CHANNEL_IO>
        ether 82:46:1a:46:5c:04
        media: autoselect <full-duplex>
        status: inactive
    bridge0: flags=8822<BROADCAST,SMART,SIMPLEX,MULTICAST> mtu 1500
        options=63<RXCSUM,TXCSUM,TSO4,TSO6>
        ether 82:46:1a:46:5c:00
        Configuration:
            id 0:0:0:0:0:0 priority 0 hellotime 0 fwddelay 0
            maxage 0 holdcnt 0 proto stp maxaddr 100 timeout 1200
            root id 0:0:0:0:0:0 priority 0 ifcost 0 port 0
            ipfilter disabled flags 0x2
        member: en1 flags=3<LEARNING,DISCOVER>
            ifmaxaddr 0 port 7 priority 0 path cost 0
        member: en2 flags=3<LEARNING,DISCOVER>
            ifmaxaddr 0 port 9 priority 0 path cost 0
        member: en3 flags=3<LEARNING,DISCOVER>
            ifmaxaddr 0 port 6 priority 0 path cost 0
        member: en4 flags=3<LEARNING,DISCOVER>
            ifmaxaddr 0 port 8 priority 0 path cost 0
        media: <unknown type>
        status: inactive
    p2p0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 2304
        options=400<CHANNEL_IO>
        ether 0a:4f:43:5e:b9:86
        media: autoselect
        status: inactive
    awdl0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1484
        options=400<CHANNEL_IO>
        ether f6:38:1e:e0:6c:3f
        inet6 fe80::f438:1eff:fee0:6c3f%awdl0 prefixlen 64 scopeid 0xc
        nd6 options=201<PERFORMNUD,DAD>
        media: autoselect
        status: active
    llw0: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        options=400<CHANNEL_IO>
        ether f6:38:1e:e0:6c:3f
        inet6 fe80::f438:1eff:fee0:6c3f%llw0 prefixlen 64 scopeid 0xd
        nd6 options=201<PERFORMNUD,DAD>
        media: autoselect
        status: active
    utun0: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> mtu 1380
        inet6 fe80::afc9:f21a:4d82:2c8d%utun0 prefixlen 64 scopeid 0xe
        nd6 options=201<PERFORMNUD,DAD>
    utun1: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> mtu 2000
        inet6 fe80::4b52:18b4:5f46:4edf%utun1 prefixlen 64 scopeid 0xf
        nd6 options=201<PERFORMNUD,DAD>

    ned-deily commented 3 years ago

    FWIW, I see similar behavior on a 2017 MBP running with 10.15.6 or 11.0 (Big Sur) beta 4. That's ... odd that there is a non-unique MAC address. (Not surprisingly, there is no such problem on an iMac that doesn't have the touchbar subsystem.) That particular interface doesn't show up in the user-visible Network panel of System Preferences so it can't even be easily reordered there. I suppose there is a good reason why it appears at the top of the interface list but, yes, we don't want to be using a non-unique MAC address to generate UUIDs. The question then is: is there any way for the uuid module to recognize and ignore such interfaces other than by the hardcoded MAC address?

    fe5a23f9-4d47-49f8-9fb5-d6fbad5d9e38 commented 3 years ago

    I'd like to point out that _even_ if you do reuse a MAC address:

    1. node IDs don't have to be derived from MAC addresses only (though in practice they usually are - I'm just saying the RFC gives you permission to include other information in it).

    2. The time resolution is 100ns. As long as your UUID generations are more than 0.2μs apart, you're safe from collisions.

    3. There is still a clock sequence, which for these purposes can be viewed as random. Even if you _do_ generate two or more UUIDs within the same 100 ns interval, on different machines that share a MAC address and use a naive node-ID-deriving algorithm, the probability of a collision is still only 1/16384 (about 61 ppm).
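Points 2 and 3 can be checked against the fields of the UUID from the original report. This is a rough sketch using the standard `uuid` module. It assumes, as the comment does, that clock sequences on different machines are independent and uniformly random.

```python
import datetime
import uuid

# Offset between the UUID epoch (1582-10-15) and the Unix epoch, in 100 ns ticks.
UUID_EPOCH_OFFSET = 0x01B21DD213814000

u = uuid.UUID("32bbad32-de12-11ea-a0ee-acde48001122")

# The 60-bit timestamp counts 100 ns ticks, so divide by 10**7 to get seconds.
unix_seconds = (u.time - UUID_EPOCH_OFFSET) / 10_000_000
created = datetime.datetime.fromtimestamp(unix_seconds, tz=datetime.timezone.utc)

# The clock sequence is 14 bits wide, giving 2**14 == 16384 possible values.
clock_seq_values = 2 ** 14
collision_chance = 1 / clock_seq_values

print(created.year)
print(hex(u.clock_seq))
print(f"{collision_chance * 1e6:.1f} ppm")
```

With the node and timestamp both equal, the 14-bit clock sequence is the only remaining distinguishing field, which is where the 1/16384 figure (roughly 61 ppm) comes from.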

    In short, it's probably not a problem, though if there is an easy fix, of course it should be applied. Currently, there are two ways to indicate "this is not a real unique MAC address" that UUID recognizes:

    # Virtual interfaces, such as those provided by
    # VPNs, do not have a colon-delimited MAC address
    # as expected, but a 16-byte HWAddr separated by
    # dashes. These should be ignored in favor of a
    # real MAC address

    and the 41st bit test (more details at bpo-32107). Maybe there is a third way, but if the above address doesn't play by these rules, maybe hardcoding it isn't such a bad idea.
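Assuming the "41st bit test" refers to the universally/locally administered bit of the first octet (bit 41 of the 48-bit value, counting from zero), the sketch below shows why it does not help here. The helper names are made up for illustration, and the addresses come from the `ifconfig` output in the original report.

```python
def parse_mac(text: str) -> int:
    """Convert a colon-separated MAC address into a 48-bit integer."""
    return int(text.replace(":", ""), 16)


def is_universally_administered(mac: int) -> bool:
    """True when the locally-administered bit (0x02 of the first octet) is clear."""
    return not (mac & (1 << 41))


interfaces = [
    ("en5 (iBridge)", "ac:de:48:00:11:22"),
    ("en0 (WiFi)", "78:4f:43:5e:b9:86"),
    ("en3", "82:46:1a:46:5c:01"),
]

for label, text in interfaces:
    print(label, is_universally_administered(parse_mac(text)))
```

The iBridge address looks like an ordinary universally administered address, so a bit test of this kind cannot tell it apart from the WiFi address.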

    23982c60-ed6c-47d1-96c2-69d417bd81b3 commented 3 years ago

    > The question then is: is there any way for the uuid module to recognize and ignore such interfaces other than by the hardcoded MAC address?

    Could uuid1 XOR all MAC addresses on macOS? The result would be deterministic and unique as long as there is at least one MAC address that is unique.

    ronaldoussoren commented 3 years ago

    Note that recent commits to the trunk and 3.8 and 3.9 branches have added a _uuid module using libuuid (and the comparable Windows API). I'd expect that this extension will also be used on macOS.

    I'd advise checking whether this issue is still present when using that extension before spending too much time on a fix.

    fe5a23f9-4d47-49f8-9fb5-d6fbad5d9e38 commented 3 years ago

    +1 on xoring all MAC addresses to get the node ID. Since it is only done once at import time, it shouldn't be too expensive (many other things, including OS calls, are done at initialization). But yes, if the problem goes away with the new version of _uuid, then the fix isn't needed.

    e1717a5f-f0f9-4d66-935d-7a20dee24804 commented 3 years ago

    xoring does not guarantee uniqueness and has a good chance of discarding it, so it seems like a bad idea to me.

    Suppose I have exactly two adapters with MAC addresses 0 and 3. Suppose you have exactly two adapters with MAC addresses 1 and 2.

    We'll both xor all our addresses and both get 0 ^ 3 == 1 ^ 2. This trivially extends to 48 bits.

    Suppose I have exactly two adapters from the same manufacturer. The xor will throw away all of the "uniqueness" guaranteed by the manufacturer OUI and replace it with 0.

    Suppose you have exactly two adapters from a different manufacturer (and nothing else). The xor will throw away all of your "uniqueness" guaranteed by the manufacturer OUI and replace it with 0.

    Now the only uniqueness between your UUIDs and my UUIDs will be the timestamp and the low-order bits of the xor'd MAC, whereas without the xor your UUIDs and my UUIDs would have absolutely been guaranteed to be unique since they are from different manufacturers with different OUIs.
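Both failure modes described above are easy to reproduce. In this small sketch, `xor_node` is a hypothetical stand-in for the proposed XOR scheme, and the two same-vendor addresses are invented for the example.

```python
from functools import reduce
from operator import xor


def xor_node(macs):
    """Combine a list of 48-bit MAC addresses by XOR-ing them together."""
    return reduce(xor, macs, 0)


# Two different machines, two different sets of adapters, same result.
mine = [0x0, 0x3]
yours = [0x1, 0x2]
print(xor_node(mine) == xor_node(yours))  # True

# Two adapters sharing an OUI (top 24 bits): the XOR cancels the OUI entirely.
same_vendor = [0x784F43000001, 0x784F43000002]
combined = xor_node(same_vendor)
print(f"{combined:012x}")  # 000000000003
```

The second case leaves only the low-order bits, so any other machine with two same-vendor adapters draws its node ID from the same small space.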

    I realize that the documentation for uuid1() states that it isn't guaranteed to give unique addresses if the time synchronization necessary isn't supported by the platform, so I suppose this could even be a documentation fix if no real solution can be found, but that would be really undesirable.

    fe5a23f9-4d47-49f8-9fb5-d6fbad5d9e38 commented 3 years ago

    Yes, you're right. Xoring can be replaced by any key-derivation function, though of course that's probably overkill.
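One way to read this suggestion is to hash the full set of addresses rather than XOR them. The sketch below swaps in a plain SHA-256 hash instead of a real key-derivation function and is not something the `uuid` module does. The name `derive_node` is hypothetical. Setting the multicast bit follows the RFC 4122 recommendation for node IDs that are not real IEEE 802 addresses.

```python
import hashlib


def derive_node(macs):
    """Derive a 48-bit node ID from a collection of MAC addresses.

    The addresses are sorted first so that interface ordering does not
    change the result.
    """
    material = b"".join(m.to_bytes(6, "big") for m in sorted(macs))
    digest = hashlib.sha256(material).digest()
    node = int.from_bytes(digest[:6], "big")
    # Set the multicast bit so the value cannot clash with a real unicast MAC.
    return node | (1 << 40)


print(f"{derive_node([0x0, 0x3]):012x}")
print(f"{derive_node([0x1, 0x2]):012x}")
```

Unlike XOR, the two adapter sets from the earlier counterexample no longer map to the same value. A shared address such as the iBridge one still contributes no uniqueness of its own.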

    ronaldoussoren commented 3 years ago

    I've verified that python 3.8 and 3.9 use the system uuid functions (part of libsystem). This means this issue might not be fixable without dropping the use of the _uuid extension.

    @terrygreeniaus: Can you still reproduce this issue? If so, does "import _uuid" work on your system?

    I do have a system with an iBridge interface, but that interface is below the active network interface and is never used. That system is headless, so I cannot easily try to reproduce the issue by messing with its network interfaces.

    I wonder how useful it is to try to fix this issue. I'd personally prefer to use uuid4() because that doesn't leak information about the host.

    ronaldoussoren commented 3 years ago

    An option is to use the host UUID instead of libuuid (as used by the _uuid extension). This has two problems though: (1) the RFC prescribes that the node id is an IEEE 802 MAC address, and (2) the host UUID is a full UUID and would have to be post-processed. Because of this I don't think this is a usable alternative.

    The relevant API is gethostuuid().

    Related stackoverflow: https://stackoverflow.com/questions/933460/unique-hardware-id-in-mac-os-x

    Related elastic issue (where I found gethostuuid): https://github.com/elastic/beats/issues/14439

    ronaldoussoren commented 3 years ago

    The most recent UUID implementation on opensource.apple.com: https://opensource.apple.com/source/Libc/Libc-1353.100.2/uuid/uuidsrc/gen_uuid.c.auto.html

    The implementation of get_node_id() doesn't ignore the iBridge interface, which means uuid_generate_time(3) could run into this issue (and because of that, so could Python's uuid module).

    I've filed an issue with Apple about this: FB8895555

    Note that switching to libuuid from util-linux wouldn't help here, because that also doesn't ignore the iBridge interface.

    I'm tempted to close this issue as "third party" because this is a bug in the system implementation of uuid_generate_time.

    ronaldoussoren commented 3 years ago

    I got feedback on FB8895555: Apple says they have fixed the issue (they don't mention in what version, but I expect 11.2). I haven't checked this yet.

    ronaldoussoren commented 3 years ago

    @terrygreeniaus: Are you running macOS 11 on your MacBook Pro?

    If so, could you verify the hardware address of the iBridge interface?

    I've checked the libc sources on opensource.apple.com and those don't seem to contain code to treat the iBridge interface specially.

    I've also attached a small program that dumps the MAC addresses of interfaces using the same mechanism as used by the UUID code in libc. If the MAC address is unchanged they may have done something that affects that code.

    ronaldoussoren commented 1 year ago

    I now have access to an Intel MBP with Touch Bar, and this issue is still present in macOS 13 with Python 3.11 (Python.org installer). Said system also has the bridge interface before the regular interface in the output of ifconfig (as in the original message of this issue).

    Both libuuid (as used by the _uuid extension) and uuid.getnode() use the bridge interface to calculate the node id of the system. A possible workaround is to change the uuid module to:

    1. Check if _uuid.generate_time_safe returns a value that indicates that the MAC address of the bridge interface is used, and if so set uuid._generate_time_safe to None
    2. Update uuid.getnode to ignore interfaces with the specific MAC of the bridge interface

    BTW. An M1 MBP with Touch Bar does not have a network interface with this particular MAC address, and doesn't seem to have a network interface related to the Touch Bar at all.

    @ned-deily, what's your opinion on the fix I mentioned above? If this looks fruitful I can create a PR for this with tests and can test the result on a system that's affected by this.