haskell-crypto / cryptonite

lowlevel set of cryptographic primitives for haskell

4-8x sha256 performance deficit compared to sha256sum/openssl (SHA instructions) #361

Closed Atemu closed 2 years ago

Atemu commented 2 years ago

I made a minimal sha256sum rewrite out of cryptonite:

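import System.Environment (getArgs)
import Crypto.Hash (Digest, SHA256, hashlazy)
import qualified Data.ByteString.Lazy as L

main :: IO ()
main = do
  args <- getArgs
  -- The first argument is the file to hash (partial: fails without arguments).
  let file = head args

  -- Lazy read, so the file is streamed in chunks rather than loaded whole.
  content <- L.readFile file

  let digest = hashlazy content :: Digest SHA256

  putStrLn $ show digest ++ " " ++ file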

However, it is 4-8 times slower than the regular sha256sum or openssl sha256 on machines with a SHA accelerator. (The exception is an M1 Pro, where my coreutils' sha256sum also doesn't make use of the SHA extensions but openssl does.)

Celeron J4105:

$ head -c 10G /dev/zero | pv | /tmp/sha256sum /dev/stdin
10.0GiB 0:01:42 [99.7MiB/s] [ <=> ]
732377e7f4a2abdc13ddfa1eb4c9c497fd2a2b294674d056cf51581b47dd586d /dev/stdin

$ head -c 10G /dev/zero | pv | sha256sum /dev/stdin
10.0GiB 0:00:23 [ 443MiB/s] [ <=> ]
732377e7f4a2abdc13ddfa1eb4c9c497fd2a2b294674d056cf51581b47dd586d  /dev/stdin

$ head -c 10G /dev/zero | pv | openssl sha256 /dev/stdin
10.0GiB 0:00:31 [ 324MiB/s] [ <=> ]
SHA256(/dev/stdin)= 732377e7f4a2abdc13ddfa1eb4c9c497fd2a2b294674d056cf51581b47dd586d

M1 Pro:

$ head -c 10G /dev/zero | pv | /tmp/sha256sum /dev/stdin
10.0GiB 0:00:50 [ 202MiB/s] [ <=> ]
732377e7f4a2abdc13ddfa1eb4c9c497fd2a2b294674d056cf51581b47dd586d /dev/stdin

$ head -c 10G /dev/zero | pv | sha256sum /dev/stdin
10.0GiB 0:00:37 [ 271MiB/s] [ <=> ]
732377e7f4a2abdc13ddfa1eb4c9c497fd2a2b294674d056cf51581b47dd586d  /dev/stdin

$ head -c 10G /dev/zero | pv | openssl sha256 /dev/stdin
10.0GiB 0:00:05 [1.74GiB/s] [ <=> ]
SHA256(/dev/stdin)= 732377e7f4a2abdc13ddfa1eb4c9c497fd2a2b294674d056cf51581b47dd586d

5800x:

$ head -c 10G /dev/zero | pv | /tmp/sha256sum /dev/stdin
10.0GiB 0:00:33 [ 301MiB/s] [ <=> ]
732377e7f4a2abdc13ddfa1eb4c9c497fd2a2b294674d056cf51581b47dd586d /dev/stdin

$ head -c 10G /dev/zero | pv | sha256sum /dev/stdin
10.0GiB 0:00:06 [1.65GiB/s] [ <=> ]
732377e7f4a2abdc13ddfa1eb4c9c497fd2a2b294674d056cf51581b47dd586d  /dev/stdin

$ head -c 10G /dev/zero | pv | openssl sha256 /dev/stdin
10.0GiB 0:00:09 [1.05GiB/s] [ <=> ]
SHA256(/dev/stdin)= 732377e7f4a2abdc13ddfa1eb4c9c497fd2a2b294674d056cf51581b47dd586d

(This is a minimal reproducer for a bug in git-annex: https://git-annex.branchable.com/bugs/git-annex_is_slow_at_reading_file_content/)
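
For reference, the "4-8x" figure can be recomputed from the pv throughput numbers above. The sketch below is only arithmetic on those reported values (GiB/s converted at 1 GiB = 1024 MiB); the `Run` type and its field names are made up for illustration. It compares the cryptonite build against the faster of the two native tools on each machine, which works out to roughly 4.4x, 8.8x and 5.6x.

```haskell
import Numeric (showFFloat)

-- Throughput in MiB/s, copied from the pv output in this report.
-- The record and field names are illustrative only.
data Run = Run
  { machine    :: String
  , cryptonite :: Double  -- /tmp/sha256sum built on cryptonite
  , coreutils  :: Double  -- sha256sum
  , openssl    :: Double  -- openssl sha256
  }

-- pv switches to GiB/s for the fastest runs; convert back to MiB/s.
gib :: Double -> Double
gib = (* 1024)

runs :: [Run]
runs =
  [ Run "Celeron J4105" 99.7 443 324
  , Run "M1 Pro" 202 271 (gib 1.74)
  , Run "5800x" 301 (gib 1.65) (gib 1.05)
  ]

-- How many times faster the best native tool is than the cryptonite build.
deficit :: Run -> Double
deficit r = max (coreutils r) (openssl r) / cryptonite r

main :: IO ()
main = mapM_ report runs
  where
    report r =
      putStrLn (machine r ++ ": " ++ showFFloat (Just 1) (deficit r) "x")
```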

vincenthz commented 2 years ago

There's no plan to tap into SHA instructions or SIMD at the moment, so that's an expected slowdown. Someone will have to add the instruction support, plus the layers of compat and fallback, for this to happen.

This is a lot of work all in all. I did this in Rust here: https://github.com/typed-io/cryptoxide/tree/master/src/hashing/sha2/impl256
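
To give a rough idea of what "compat and fallback" involves, here is a minimal sketch of runtime backend selection. It is not how cryptonite or cryptoxide do it: the `Backend` type and function names are hypothetical, and it assumes a Linux system where `/proc/cpuinfo` lists `sha_ni` (x86) or `sha2` (ARM) among the CPU flags. A real implementation would more likely query CPUID or the OS from C, and would still need the accelerated compression function itself.

```haskell
import Control.Exception (IOException, try)

-- Hypothetical backends; these names are illustrative, not cryptonite's API.
data Backend = ShaInstructions | Portable
  deriving (Eq, Show)

-- Collect the CPU flag words from /proc/cpuinfo text. x86 kernels list
-- them on "flags" lines, ARM kernels on "Features" lines.
cpuFlags :: String -> [String]
cpuFlags info =
  concat
    [ words (drop 1 rest)
    | line <- lines info
    , let (key, rest) = break (== ':') line
    , words key `elem` [["flags"], ["Features"]]
    ]

-- Use the accelerated path only when the CPU advertises SHA support;
-- otherwise fall back to the portable implementation.
chooseBackend :: [String] -> Backend
chooseBackend flags
  | any (`elem` flags) ["sha_ni", "sha2"] = ShaInstructions
  | otherwise = Portable

main :: IO ()
main = do
  -- If /proc/cpuinfo cannot be opened (e.g. on macOS), assume no acceleration.
  result <- try (readFile "/proc/cpuinfo") :: IO (Either IOException String)
  let backend = either (const Portable) (chooseBackend . cpuFlags) result
  putStrLn ("selected backend: " ++ show backend)
```

Defaulting to `Portable` whenever detection fails keeps the output correct on every platform, which is the point of the fallback layer.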

Atemu commented 2 years ago

Thanks for the answer.

Couldn't the Rust implementation be used instead of the C one here?

Could cryptonite hook into an existing implementation that already handles all that complexity, like openssl?