quicklisp / quicklisp-client

Quicklisp client.
http://www.quicklisp.org/
MIT License
298 stars 75 forks source link

Integrate a small (130 line) implementation of MD5 #199

Open jesseoff opened 4 years ago

jesseoff commented 4 years ago

Creates a ql-md5 package with exported function md5-file that returns a hex string that can be compared with string-equal to verify against. New condition corrupt-local-archive, potentially signaled from check-local-archive-file

quicklisp commented 4 years ago

This looks good, thank you!

Can you tell me some benchmarks on various implementations? Like, how many MB/sec relative to SBCL on ABCL, CLISP, and others?

jesseoff commented 4 years ago

I will benchmark further. MD5's use a lot of 32-bit ops which on non-64bit archs unfortunately won't be fixnums. I don't expect this to be super fast, as I didn't even declare optimize speed, but to test it I did locally download every release .tgz (~400MByte) and have it md5 the lot of them in about 7 seconds on sbcl on a 6 year old iMac.

(time (dolist (x (ql-dist:provided-releases (car (ql-dist:enabled-dists)))) (ql-dist:check-local-archive-file x)))
Evaluation took:
  7.260 seconds of real time
  6.345531 seconds of total run time (5.771987 user, 0.573544 system)
  [ Run times consist of 0.041 seconds GC time, and 6.305 seconds non-GC time. ]
  87.41% CPU
  28,975,291,831 processor cycles
  522,484,592 bytes consed

One of the functions md5-seq could be used to feed the bytes during a http download as they arrive, which would likely bury any compute time in the process of quickloading a new system, should latency be something worthwhile to optimize here.

jesseoff commented 4 years ago

This is running ql-dist:check-local-archive-file on 1899 package archives, totalling 376M.

CCL v1.12: 20.5 seconds

Clozure Common Lisp Version 1.12  DarwinX8664

For more information about CCL, please see http://ccl.clozure.com.

CCL is free software.  It is distributed under the terms of the Apache
Licence, Version 2.0.
? (time (dolist (x (ql-dist:provided-releases (car (ql-dist:enabled-dists)))) (ql-dist:check-local-archive-file x)))
(DOLIST (X (QL-DIST:PROVIDED-RELEASES (CAR (QL-DIST:ENABLED-DISTS)))) (QL-DIST:CHECK-LOCAL-ARCHIVE-FILE X))
took 20,527,187 microseconds (20.527187 seconds) to run.
         50,954 microseconds ( 0.050954 seconds, 0.25%) of which was spent in GC.
During that period, and with 8 available CPU cores,
     18,785,078 microseconds (18.785078 seconds) were spent in user mode
        908,797 microseconds ( 0.908797 seconds) were spent in system mode
 526,313,584 bytes of memory allocated.
 622 minor page faults, 12 major page faults, 0 swaps.

CLISP v2.49: 487 seconds

[1]> (time (dolist (x (ql-dist:provided-releases (car (ql-dist:enabled-dists)))) (ql-dist:check-local-archive-file x)))
Real time: 487.09634 sec.
Run time: 485.27423 sec.
Space: 2867582144 Bytes
GC: 1538, GC time: 8.831983 sec.
NIL

ABCL v1.6.1 : 493 seconds

Armed Bear Common Lisp 1.6.1
Java 13.0.2 N/A
OpenJDK 64-Bit Server VM
Low-level initialization completed in 0.199 seconds.
Startup completed in 1.09 seconds.
Type ":help" for a list of available commands.
CL-USER(1): (time (dolist (x (ql-dist:provided-releases (car (ql-dist:enabled-dists)))) (ql-dist:check-local-archive-file x)))
493.581 seconds real time
706103756 cons cells
NIL

ECL v20.4.24 : 269 seconds

;;; Loading "/Users/joff/quicklisp/setup.lisp"
;;; Loading #P"/usr/local/Cellar/ecl/20.4.24/lib/ecl-20.4.24/asdf.fas"
ECL (Embeddable Common-Lisp) 20.4.24 (git:UNKNOWN)
Copyright (C) 1984 Taiichi Yuasa and Masami Hagiya
Copyright (C) 1993 Giuseppe Attardi
Copyright (C) 2013 Juan J. Garcia-Ripoll
Copyright (C) 2018 Daniel Kochmanski
Copyright (C) 2020 Daniel Kochmanski and Marius Gerbershagen
ECL is free software, and you are welcome to redistribute it
under certain conditions; see file 'Copyright' for details.
Type :h for Help.  
Top level in: #<process TOP-LEVEL 0x10baedf80>.
> (time (dolist (x (ql-dist:provided-releases (car (ql-dist:enabled-dists)))) (ql-dist:check-local-archive-file x)))

real time : 269.974 secs
run time  : 267.677 secs
gc count  : 123 times
consed    : 592999584 bytes
jesseoff commented 4 years ago

Since adding some declares and setting speed optimization, the time it takes to md5 check the entire quicklisp archive directory in sbcl has gone from 7.2 seconds to 2.7 seconds.

On my iMac, running a (md5-file "test") in sbcl takes .72 seconds whereas running the md5 command line utility takes .19 seconds.

 (time (dolist (x (ql-dist:provided-releases (car (ql-dist:enabled-dists)))) (ql-dist:check-local-archive-file x)))
Evaluation took:
  2.732 seconds of real time
  2.733517 seconds of total run time (2.454918 user, 0.278599 system)
  [ Run times consist of 0.035 seconds GC time, and 2.699 seconds non-GC time. ]
  100.07% CPU
  10,901,781,530 processor cycles
  522,330,080 bytes consed
mohe2015 commented 4 years ago

Just my two bits: I don't really like that quicklisp still uses md5 as it is really insecure.

quicklisp commented 4 years ago

MD5 is better than nothing for detecting accidental file corruption, but I agree that it is not suitable for detecting malicious tampering.

For detecting malicious tampering, I have a branch that uses OpenPGP-signed manifests of sha256 digests. Unfortunately it remains a work in progress.

jesseoff commented 4 years ago

Just my two bits: I don't really like that quicklisp still uses md5 as it is really insecure.

The hash works just fine if the primary purpose is not security, but something a little better than a file size and name check defense against corrupt downloads.

For security, I'd recommend using ed25519 signatures. It hashes with SHA512 then uses elliptic curve crypto to sign the hash. Quicklisp could have a defconstant public key to verify against. With a little massaging, the ironclad ed25519 implementation could be pulled out in isolation into a standalone .lisp.

jesseoff commented 4 years ago

Wanted to note that in the performance optimizations, I declared a 32-bit modulo add operation inline (mod32+). This results in 256 inline expansions of it in one function, which causes a warning on old SBCL versions and seems to trigger miscompilation on abcl. As a result, there is a abcl read-time conditional around the inline decl.