rpm-software-management / deltarpm

Other
21 stars 12 forks source link

Replace MD5 with xxhash #16

Open py0xc3 opened 2 years ago

py0xc3 commented 2 years ago

DNF still uses MD5 for error correction after downloading. I suggest to replace MD5 with xxhash or another comparable algorithm. This will improve the performance. Given that there is a OpenPGP-based cryptographic authentication/verification later anyway, it will be sufficient to stick with xxhash 64 for error correction. However, even with gpgcheck=0 I expect xxhash 64 sufficient for error correction.

On 64 bit architectures, xxhash 64 already increases the performance in a noteworthy manner in BTRFS's checksumming compared to CRC32C (which is itself generally faster than MD5), where xxhash 64 was standardized along with three other algorithms. xxhash is specifically designed for modern architectures, unlike CRC32 or MD5. As MD5 used to have a cryptographic purpose (which it no longer fulfills anyway), it is unlikely that it will have a performance advantage against xxhash on any architecture that is in use today.

If it is easier in development to stick with 128 bit length, there is also xxhash 128.

Further, the md5 use becomes obvious to users through the "md5 mismatch of result" when packages have a mismatch of result after downloading, which makes dnf to download the whole package instead of only the delta. However, users instinctively link md5 to "broken crypto" which decreases trust in dnf, even if md5 is used in a non-crypto function like here. It was opened a topic about that some days ago on discussion.fedoraproject.org (just one example) by a user who was also a bit misled by the "md5" in his dnf: md5 and its reputation can create confusion and misinterpretation when users see that it is in use.

I found it in this repo with git grep --text "md5 mismatch"

This issue is not critical, but might be considered in future developments.

crrodriguez commented 1 year ago

Unfortunately what is been checked is the md5 signature embeeded on the RPM files. changing the algorithm will break with existent packages.. the whole design of this thing is bonkers. details of the particular checksum algorithm used by the implementation should have never been transparent to the callers..But one needs to deal with it 8-( Today the algorithm selected is wrong, slow and broken.

mlschroe commented 1 year ago

It's not wrong at all. The md5 sums are used as better crc function, not in a cryptographic sensitive way.

mlschroe commented 1 year ago

(But yes, something like xxhash would have been a better choice. But it didn't exist at that time.)

mlschroe commented 1 year ago

(And please do not forget that deltarpm was written in 2005...)

py0xc3 commented 1 year ago

not in a cryptographic sensitive way.

Agreed. The problem is that users are not aware of that: many people link md5 intuitively with broken crypto, even if it is not used for crypto. This part is more about psychology than cryptography, and we had already some issues with users who questioned the security of dnf because of that. It creates a "bad taste" that can create uncertainty and at the worst can damage the reputation. I absolutely agree that md5 still fulfills the purpose it is deployed for in this use case from a purely technical point of view.

However, indeed, I have not considered the rising incompatibility with RPM files. That's indeed a show stopper.

Because the purely technical disadvantageous of md5 are not security-critical (finally that's just about performance and efficiency), it might be a compromise for now to change the message, remove "md5 mismatch" and add "hash mismatch" or something like that. Just to avoid the misunderstanding of "md5" = "broken for years" = "application is insecure".

Maybe xxhash can be added to the roadmap, and if at some point rpm gets a major update that breaks compatibility anyway, it can be combined with replacing md5.

On 2/8/23 12:56, Michael Schroeder wrote:

It's not wrong at all. The md5 sums are used as better crc function, not in a cryptographic sensitive way.

mlschroe commented 1 year ago

Yeah, changing the message so that it no longer mentions md5 is certainly a good idea.

Regarding rpm upstream: see https://github.com/rpm-software-management/rpm/issues/1292