jhnphm / boar

Automatically exported from code.google.com/p/boar
Apache License 2.0

Boar should detect md5 hash collisions #16

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Boar uses the 128-bit md5 checksum algorithm. The odds against an accidental 
collision (two different files having the same checksum) are truly astronomical 
(if you have 10 000 000 000 files, the risk of at least one collision is about 
10^-19). However, md5 does have weaknesses that make it possible to 
construct collisions intentionally. This feature is therefore mostly a security 
issue, since accidental collisions are rare enough.
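
The 10^-19 figure can be checked with the usual birthday-bound approximation, p ≈ n(n-1) / 2^(b+1) for n files and a b-bit checksum. The snippet below is only a sanity check of that estimate. The function name is made up for illustration, and it assumes md5 outputs are uniformly distributed, which only holds for accidental collisions, not deliberately constructed ones.

```python
from fractions import Fraction


def collision_probability(num_files: int, hash_bits: int = 128) -> float:
    """Approximate chance of at least one accidental collision.

    Uses the birthday-bound approximation: (number of file pairs) / 2^bits.
    The approximation is only valid while the result is much smaller than 1.
    """
    pairs = num_files * (num_files - 1) // 2
    # Fraction keeps the huge integers exact until the final conversion.
    return float(Fraction(pairs, 2 ** hash_bits))


p = collision_probability(10_000_000_000)
print(f"{p:.2e}")  # on the order of 1e-19
```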

Collisions will cause problems, as boar currently assumes that files with the 
same md5 checksum are always identical. Most likely, one of the files causing 
the collision will be lost. 

Boar should prevent such problems by storing an alternative checksum (maybe 
some variant of SHA) for every stored file, and use this to make sure that 
files with the same md5 checksum are truly identical. There will be no attempt 
at making the boar repository store md5 collisions. If a collision is found 
during an import or checkin, boar will abort the operation and print an error 
message. 
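
A minimal sketch of the proposed check, for illustration only. It is not boar's actual code: the `BlobIndex` class, the `ChecksumCollisionError` exception and the in-memory dict are all hypothetical stand-ins for whatever the repository really stores on disk.

```python
import hashlib


class ChecksumCollisionError(Exception):
    """Raised when two different contents share the same md5 checksum."""


class BlobIndex:
    """Hypothetical index mapping each md5 checksum to a sha256 checksum."""

    def __init__(self):
        self._sha256_by_md5 = {}

    def add(self, data: bytes) -> bool:
        """Register content. Returns True if it was new, False if already known.

        Aborts with ChecksumCollisionError if the md5 is known but the
        sha256 differs, i.e. the content is not actually identical.
        """
        md5 = hashlib.md5(data).hexdigest()
        sha256 = hashlib.sha256(data).hexdigest()
        known = self._sha256_by_md5.get(md5)
        if known is None:
            self._sha256_by_md5[md5] = sha256
            return True
        if known != sha256:
            raise ChecksumCollisionError(f"md5 collision detected for {md5}")
        return False
```

With this shape, a checkin of content that collides with an existing blob fails loudly instead of silently reusing the old blob. The colliding file is still not stored, which matches the "no attempt at storing collisions" decision above.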

Original issue reported on code.google.com by ekb...@gmail.com on 23 Mar 2011 at 8:34

GoogleCodeExporter commented 9 years ago
One possible security impact scenario is the following:

Alice and Bob share a repository. Alice uploads evil.exe, a malicious file. 
Alice has used a vulnerability in the md5 checksum and designed the file so 
that its checksum is identical to the checksum of dostuff.exe, a well-known 
useful program.

Bob uploads his dostuff.exe. However, since it has the same checksum as the 
existing evil.exe, it is not actually uploaded. Boar notices (wrongly) that it 
already has this file in the repo and uses that copy instead. 

Some time later, Bob downloads his dostuff.exe file again. However, instead he 
receives the evil.exe that Alice uploaded earlier. When he executes the 
program, bad things will happen.
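
The mechanics of this scenario can be shown with a toy model. Everything here is an assumption made for illustration: `NaiveStore` is not boar, and `weak_checksum` (sum of bytes modulo 256) is a deliberately trivial stand-in for md5, chosen so that a collision is easy to produce by hand.

```python
def weak_checksum(data: bytes) -> int:
    """Toy checksum standing in for md5: trivially easy to collide."""
    return sum(data) % 256


class NaiveStore:
    """Toy deduplicating store that trusts the checksum alone."""

    def __init__(self):
        self.blobs = {}  # checksum -> content
        self.files = {}  # (owner, filename) -> checksum

    def upload(self, owner: str, name: str, data: bytes) -> None:
        key = weak_checksum(data)
        # If the checksum is already known, the new content is NOT stored.
        self.blobs.setdefault(key, data)
        self.files[(owner, name)] = key

    def download(self, owner: str, name: str) -> bytes:
        return self.blobs[self.files[(owner, name)]]


store = NaiveStore()
store.upload("alice", "evil.exe", b"ba")    # same byte sum as b"ab"
store.upload("bob", "dostuff.exe", b"ab")   # silently deduplicated
received = store.download("bob", "dostuff.exe")
print(received == b"ab")  # False: Bob gets Alice's content back
```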

Original comment by ekb...@gmail.com on 2 Sep 2011 at 8:33

GoogleCodeExporter commented 9 years ago
Fixed in changeset 5518090482c9. All files now have a corresponding sha256 
checksum that ensures that no collisions can go undetected. 

Original comment by ekb...@gmail.com on 25 Sep 2011 at 9:17

GoogleCodeExporter commented 9 years ago
Reopening the issue. As it turns out, the implemented solution is too slow. A 
verification on a repository will take about twice as long with md5 collision 
detection enabled (due to the verification of the sha256 database). I had hoped 
to mitigate this slowdown by using Python's multiprocessing features, but while 
that works well on Linux, I have not succeeded in making it work on Windows. 

Since md5 collision detection is a somewhat niche feature, I'm going to 
disable it for the next release so as not to make boar slower for the 
current boar user base. 
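
For reference, one common way to keep the I/O cost of a second checksum down is to feed both digests from a single read of the data. The sketch below only illustrates that idea. It is not how boar computes its checksums, the function name is made up, and it does nothing about the extra CPU time that sha256 itself needs.

```python
import hashlib
import io


def md5_and_sha256(stream, chunk_size: int = 1 << 20):
    """Compute both checksums while reading the stream only once."""
    md5 = hashlib.md5()
    sha256 = hashlib.sha256()
    # iter() with a sentinel keeps reading until read() returns b"".
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        md5.update(chunk)
        sha256.update(chunk)
    return md5.hexdigest(), sha256.hexdigest()


# Usage with an in-memory stream; a file opened with open(path, "rb")
# works the same way.
md5_hex, sha256_hex = md5_and_sha256(io.BytesIO(b"example content"))
```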

Original comment by ekb...@gmail.com on 8 Nov 2011 at 11:45

GoogleCodeExporter commented 9 years ago
Issue 80 has been merged into this issue.

Original comment by ekb...@gmail.com on 12 Aug 2012 at 7:29

GoogleCodeExporter commented 9 years ago
One way to handle this is to store the first 8 bytes of the file as well and 
check against that in addition to the md5. This makes it nearly impossible to 
have a collision, even on purpose.

Original comment by cyberempires@gmail.com on 21 Sep 2012 at 6:12

GoogleCodeExporter commented 9 years ago
In response to comment 5: Do you have a reference for your statement? I've 
always assumed that md5 is simply inherently unsafe. If that can be mitigated 
with a simple check of the first part of the contents, that would certainly 
make things easier. 

Original comment by ekb...@gmail.com on 22 Sep 2012 at 8:15

GoogleCodeExporter commented 9 years ago
Astronomical or not -- I'm slightly paranoid about it. Does it have to be 
SHA256 to detect md5 collisions or might something fast like the SpookyHash 
SnapRaid uses be an option as well?

Original comment by mlo...@web.de on 25 Dec 2013 at 1:16

GoogleCodeExporter commented 9 years ago
Or maybe this:
https://github.com/SaberParker/xxHash-Python
https://code.google.com/p/xxhash/

Original comment by mlo...@web.de on 25 Dec 2013 at 1:25