hw-probe uses a hardcoded salt (!!!) and the user's MAC address to create a static identifier for each computer. It then truncates the hash before uploading it with the report.
This is scary, as the ONLY unknown input is the MAC address, which is visible to everyone and their mother on the same network. Anyone who can see a MAC address can hash it with the same salt and match the result against a report, so it's very easy to find out who owns which computer on a network. In addition, truncating the hash doesn't help: an attacker can truncate their own hashes the same way, and since hash output has a (relatively) even distribution, the shortened hash doesn't hide anything more about the data it represents.
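A rough sketch of that matching attack, under loud assumptions: `PUBLIC_SALT`, `truncated_id`, and `matching_macs` are made-up names, the salt value is invented, and `DefaultHasher` is only a std-only stand-in for whatever hash the real tool uses (it is NOT cryptographic). The point is just that a known salt plus a visible MAC is enough to recompute the identifier.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Hypothetical salt, standing in for one hardcoded in an open-source tool.
const PUBLIC_SALT: &[u8] = b"example-hardcoded-salt";

/// Stand-in for "hash(salt + MAC), then truncate": keeps the low 32 bits.
/// DefaultHasher is not a cryptographic hash; it only keeps the sketch std-only.
fn truncated_id(mac: [u8; 6]) -> u32 {
    let mut hasher = DefaultHasher::new();
    hasher.write(PUBLIC_SALT);
    hasher.write(&mac);
    hasher.finish() as u32
}

/// Returns every observed MAC whose recomputed ID equals the one in a report.
fn matching_macs(report_id: u32, observed: &[[u8; 6]]) -> Vec<[u8; 6]> {
    observed
        .iter()
        .copied()
        .filter(|mac| truncated_id(*mac) == report_id)
        .collect()
}

fn main() {
    // MAC addresses an attacker could see on a local network (made up).
    let observed = [
        [0x00, 0x1a, 0x2b, 0x3c, 0x4d, 0x5e],
        [0x00, 0x1a, 0x2b, 0x3c, 0x4d, 0x5f],
    ];
    // The identifier found in some uploaded report.
    let report_id = truncated_id(observed[1]);
    println!("{:02x?}", matching_macs(report_id, &observed));
}
```

Truncating changes nothing here, because the attacker truncates the same way before comparing.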
I have some ideas on how to do better. For one, the computer identifier is optional, so you can opt out of sending that info if you'd like. (If you do so, a completely random value is generated instead.)
As for those who provide this info, here are my ideas to preserve privacy and security:
Hash Data
We'll still be using the MAC address of the user's computer. Unsurprisingly, you can get this using the mac_address crate. However, I want to flip some of its bits and force a couple of others to fixed values, mostly to lower the uniqueness of the values a bit.
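A minimal sketch of that bit-twiddling on a raw 6-byte MAC. The three masks (`FLIP`, `KEEP`, `FORCE_SET`) and the function name `coarsen_mac` are placeholders invented for illustration; which bits actually get touched is still undecided.

```rust
/// Hypothetical bits to invert before anything else happens.
const FLIP: [u8; 6] = [0x00, 0x00, 0x00, 0x0f, 0x00, 0x00];
/// Hypothetical keep-mask: bits that are 0 here are forced to 0.
const KEEP: [u8; 6] = [0xff, 0xff, 0xff, 0xff, 0xf0, 0xff];
/// Hypothetical bits that are always forced to 1.
const FORCE_SET: [u8; 6] = [0x00, 0x00, 0x00, 0x00, 0x00, 0x01];

/// Flips, clears, then sets bits, byte by byte.
fn coarsen_mac(mac: [u8; 6]) -> [u8; 6] {
    let mut out = [0u8; 6];
    for i in 0..6 {
        out[i] = ((mac[i] ^ FLIP[i]) & KEEP[i]) | FORCE_SET[i];
    }
    out
}

fn main() {
    let mac = [0x00, 0x1a, 0x2b, 0x3c, 0x4d, 0x5e];
    println!("{:02x?}", coarsen_mac(mac));
}
```

Worth noting: flipping alone is reversible and removes no uniqueness. Only the bits forced to a fixed value make several MACs map to the same result.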
Salt Data
To prevent people from identifying computers from their MAC address immediately, I'll add a few other variables to salt with. However, it's important to note that these MUST be static: the same computer has to produce the same hash every time, or the identifier stops being useful. Static salts are generally discouraged, but this does significantly increase the privacy for computers represented in the population of reports.
Currently, I want these to stem from the SMBIOS on desktop computers. Android devices may need other sources.
I may also extend these to include more specific identifiers when they're available, such as non-swappable component identifiers. However, it's important that these stick to large, normal ranges.
For potential data sources from SMBIOS, see page 29 of its specification. Windows also offers an easy API for grabbing these values. It may be worth creating a quick and dirty cross-platform library for this task.
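One way the salt could be assembled once the values are in hand, as a sketch only. The `SmbiosFields` struct, its three fields, and `build_salt` are assumptions for illustration; actually reading SMBIOS is left out, and the final field selection is undecided. Each part gets a length prefix so that different field combinations can't collapse into the same byte string.

```rust
/// Hypothetical selection of SMBIOS strings; the field choice is an assumption.
struct SmbiosFields<'a> {
    system_manufacturer: &'a str,
    system_product_name: &'a str,
    baseboard_product: &'a str,
}

/// Concatenates the fields, each prefixed with its length as a
/// little-endian u32, so ("ab", "c") and ("a", "bc") give different salts.
fn build_salt(fields: &SmbiosFields<'_>) -> Vec<u8> {
    let mut salt = Vec::new();
    for part in [
        fields.system_manufacturer,
        fields.system_product_name,
        fields.baseboard_product,
    ] {
        // Trim so stray padding from firmware strings doesn't change the salt.
        let bytes = part.trim().as_bytes();
        salt.extend_from_slice(&(bytes.len() as u32).to_le_bytes());
        salt.extend_from_slice(bytes);
    }
    salt
}

fn main() {
    // Made-up example values.
    let fields = SmbiosFields {
        system_manufacturer: "ExampleCorp",
        system_product_name: "Model X1",
        baseboard_product: "EX-B100",
    };
    println!("{:02x?}", build_salt(&fields));
}
```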
Hashing
I'll use the argon2 crate (see the Argon2 article on Wikipedia) to hash the user's MAC address, salted as described above. I still need to think more about the strategy for choosing which NIC's MAC address to use. Preferably, it'd be the motherboard's NIC.
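One possible starting point for the NIC question, sketched with std only. The `Nic` struct, `is_hardware_like`, and `pick_nic` are hypothetical, and enumerating interfaces is left out. The heuristic relies on two standard bits in the first octet of a MAC: bit 0 marks multicast, and bit 1 marks a locally administered address (typical of randomized or virtual interfaces). It can't tell whether a NIC is actually on the motherboard; it only weeds out addresses that clearly aren't burned-in.

```rust
/// Hypothetical description of one network interface.
struct Nic {
    name: String,
    mac: [u8; 6],
}

/// True if the MAC looks like a burned-in, universally administered address.
fn is_hardware_like(mac: &[u8; 6]) -> bool {
    let all_zero = mac.iter().all(|b| *b == 0);
    let multicast = mac[0] & 0x01 != 0;
    let locally_administered = mac[0] & 0x02 != 0;
    !all_zero && !multicast && !locally_administered
}

/// Picks the hardware-like NIC with the smallest name, so the choice
/// doesn't depend on enumeration order.
fn pick_nic(nics: &[Nic]) -> Option<&Nic> {
    nics.iter()
        .filter(|nic| is_hardware_like(&nic.mac))
        .min_by(|a, b| a.name.cmp(&b.name))
}

fn main() {
    // Made-up interfaces; wlan0 carries a randomized (locally administered) MAC.
    let nics = vec![
        Nic { name: "wlan0".to_string(), mac: [0x02, 0x11, 0x22, 0x33, 0x44, 0x55] },
        Nic { name: "eth1".to_string(), mac: [0x00, 0x1a, 0x2b, 0x3c, 0x4d, 0x5e] },
        Nic { name: "eth0".to_string(), mac: [0x00, 0x1b, 0x2c, 0x3d, 0x4e, 0x5f] },
    ];
    if let Some(nic) = pick_nic(&nics) {
        println!("{} {:02x?}", nic.name, nic.mac);
    }
}
```

Sorting by name is only there for determinism; interface names can change between boots or operating systems, so this alone may not give a stable identifier.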
Upload
For now, I plan to upload the full hash, since truncating it doesn't appear to gain anything (truncation doesn't really affect security, but does increase the chance of collisions).
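To put a number on the collision part, here's a small sketch using the standard birthday approximation, p ≈ 1 - exp(-n(n-1) / (2 * 2^bits)). The function name and the example figures (100,000 reports, 32 vs. 256 bits) are picked purely for illustration and aren't taken from any real tool.

```rust
/// Approximate probability that at least two of `reports` evenly
/// distributed hashes collide when only `bits` bits are kept.
fn collision_probability(reports: u64, bits: u32) -> f64 {
    let n = reports as f64;
    let space = 2f64.powi(bits as i32);
    let x = n * (n - 1.0) / (2.0 * space);
    // 1 - e^(-x), computed via exp_m1 to stay accurate for tiny x.
    -(-x).exp_m1()
}

fn main() {
    for bits in [32, 64, 256] {
        println!("{} bits: {:e}", bits, collision_probability(100_000, bits));
    }
}
```

With these illustrative numbers, 32 bits gives roughly a 0.69 chance of at least one collision, while at 256 bits the chance is negligible.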