OMP scaling

jfoug commented 9 years ago

Here is an example (cygwin64, OMP build)

$ OMP_NUM_THREADS=8 ../run/john -test=5 -form=tc_ripemd160
Will run 8 OpenMP threads
Benchmarking: tc_ripemd160, TrueCrypt RIPEMD160 AES256_XTS [32/64]... (8xOMP) DONE
Raw:    4.8 c/s real, 4.5 c/s virtual

$ OMP_NUM_THREADS=1 ../run/john -test=5 -form=tc_ripemd160
Warning: OpenMP is disabled; a non-OpenMP build may be faster
Benchmarking: tc_ripemd160, TrueCrypt RIPEMD160 AES256_XTS [32/64]... DONE
Raw:    39.4 c/s real, 39.7 c/s virtual

I have fixed one of these: https://github.com/magnumripper/JohnTheRipper/commit/38aa13b3d9d1999be70973117d2f1a893190aaf5 But we need to identify the rest of them, and at least work around them (like this). It would be nice to know WHY, and if we find out why, then try some way to auto-detect it.

It is likely (for the tc_* hashes on cygwin) that the PKCS5_PBKDF2_HMAC() oSSL function was made thread safe by simply putting a mutex around the call, OR worse, that they put a mutex around some internal part of the call (such as the HMAC). The 2nd is more likely. So what happens is that 8 threads start into the PKCS5_PBKDF2_HMAC() call, then all but one block, and that one does an HMAC. That thread is then stopped, and the next thread gets its chance to go. HORRIBLE performance!!

That is just a theory, but from dealing with poorly coded MT code before, it is what it looks like it is doing.
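
To make the theory concrete, here is a minimal sketch in Python (not JtR or OpenSSL code). The `fake_hmac` function and the `lock` argument are hypothetical stand-ins for the library call and for a library-internal mutex. It only shows the serialization: with the mutex in place, no more than one thread is ever inside the call, however many threads are started. It does not model the extra context-switch cost that would make 8 threads slower than 1.

```python
import threading
import time


def run_workers(num_threads, calls_per_thread, lock=None):
    """Run a fake 'HMAC' from several threads and return the peak number
    of threads that were inside the call at the same time."""
    state = {"inside": 0, "peak": 0}
    guard = threading.Lock()  # protects the bookkeeping counters only

    def fake_hmac():
        with guard:
            state["inside"] += 1
            state["peak"] = max(state["peak"], state["inside"])
        time.sleep(0.001)  # stand-in for the real hashing work
        with guard:
            state["inside"] -= 1

    def worker():
        for _ in range(calls_per_thread):
            if lock is None:
                fake_hmac()
            else:
                with lock:  # hypothetical library-internal mutex
                    fake_hmac()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["peak"]


serialized_peak = run_workers(8, 5, lock=threading.Lock())
free_peak = run_workers(8, 5)
print("peak concurrency with a mutex around the call:", serialized_peak)
print("peak concurrency without it:", free_peak)
```

With the mutex, the peak is always 1, so 8 threads can do no more work per second than one thread, and they pay for the lock hand-offs on top.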

I will add some rel-bench runs where I cut out things that did not look quite right, and put them into a follow up post.

jfoug commented 9 years ago

This may not fit here 100%, but it is where I am putting it for now. Here are benchmark tests run, against 3 runs, on cygwin64 and Ubuntu-64 (VirtualBox VM), on my AMD xop dual laptop. There are some MAX_KEYS and OMP_SCALE things we should address here. I want to get the same data from my core-i7 quad HT at work, since it will likely scale differently. Then we can smooth some of these issues out, even before starting on auto-scaling for OMP.

FIXED:

Ratio:  0.15884 real, 0.13887 virtual   openssl-enc, OpenSSL "enc" encryption:Raw
Ratio:  1.00000 real, 0.25000 virtual   openssl-enc, OpenSSL "enc" encryption:Raw
This one was cygwin only.  Worked fine on Ubuntu.  I added a #ifdef __CYGWIN__ to turn off OMP

all tc_* (true crypt) hashes on cygwin

cygwin64 $ ../run/relbench -v omp-1-xop.log omp-0-xop.log
Ratio:  1.85919 real, 1.85881 virtual   Blockchain, My Wallet (x10):Raw
Ratio:  2.18651 real, 2.18615 virtual   EPiServer:Many salts
Ratio:  2.19742 real, 2.21316 virtual   EPiServer:Only one salt
Ratio:  1.34197 real, 1.34105 virtual   HMAC-MD5:Only one salt
Ratio:  1.33636 real, 1.33969 virtual   HMAC-SHA1:Only one salt
Ratio:  1.28210 real, 1.28137 virtual   HMAC-SHA224:Many salts
Ratio:  1.24982 real, 1.24792 virtual   HMAC-SHA512:Many salts
Ratio:  1.15183 real, 1.15139 virtual   HMAC-SHA512:Only one salt
Ratio:  1.35976 real, 1.35325 virtual   LM:Raw
Ratio:  0.89873 real, 0.89687 virtual   LastPass, sniffed sessions:Raw
Ratio:  1.96742 real, 1.96912 virtual   MongoDB, system / network:Raw
Ratio:  1.77983 real, 1.77802 virtual   PST, custom CRC-32:Raw
Ratio:  1.92509 real, 1.91920 virtual   RACF:Many salts
Ratio:  1.64315 real, 1.64180 virtual   RACF:Only one salt
Ratio:  1.31663 real, 1.31514 virtual   RAKP, IPMI 2.0 RAKP (RMCP+):Only one salt
Ratio:  0.87120 real, 0.86849 virtual   Raw-Blake2:Raw
Ratio:  0.80101 real, 0.80025 virtual   Raw-Keccak:Raw
Ratio:  0.86252 real, 0.86252 virtual   Raw-Keccak-256:Raw
Ratio:  1.19374 real, 1.18459 virtual   Raw-MD4:Raw
Ratio:  0.92907 real, 0.92695 virtual   Raw-SHA512:Raw
Ratio:  0.91581 real, 0.91350 virtual   Raw-SHA1-ng, (pwlen <= 15):Raw
Ratio:  1.15896 real, 1.16040 virtual   Raw-SHA512-ng:Raw
Ratio:  0.91991 real, 0.91628 virtual   SIP:Only one salt
Ratio:  0.90118 real, 0.90389 virtual   SSH (one 2048-bit RSA and one 1024-bit DSA key):Raw
Ratio:  4.09753 real, 4.09598 virtual   SSH-ng:Raw
Ratio:  0.94087 real, 0.94382 virtual   Snefru-128:Raw
Ratio:  0.84281 real, 0.84123 virtual   Snefru-256:Raw
Ratio:  0.92757 real, 0.92518 virtual   Sybase-PROP:Many salts
Ratio:  0.92937 real, 0.92280 virtual   Sybase-PROP:Only one salt
Ratio:  0.91833 real, 0.91758 virtual   Tiger:Raw
Ratio:  1.15613 real, 1.15681 virtual   agilekeychain, 1Password Agile Keychain:Raw
Ratio:  1.17803 real, 1.17727 virtual   bsdicrypt, BSDI crypt(3) ("_J9..", 725 iterations):Many salts
Ratio:  2.35986 real, 2.37017 virtual   chap, iSCSI CHAP authentication:Raw
Ratio:  1.18942 real, 1.18884 virtual   dragonfly3-32, DragonFly BSD $3$ w/ bug, 32-bit:Many salts
Ratio:  1.21620 real, 1.21620 virtual   dragonfly3-32, DragonFly BSD $3$ w/ bug, 32-bit:Only one salt
Ratio:  1.30824 real, 1.30745 virtual   dragonfly3-64, DragonFly BSD $3$ w/ bug, 64-bit:Many salts
Ratio:  1.18535 real, 1.18331 virtual   dragonfly3-64, DragonFly BSD $3$ w/ bug, 64-bit:Only one salt
Ratio:  1.16511 real, 1.16313 virtual   dragonfly4-32, DragonFly BSD $4$ w/ bugs, 32-bit:Many salts
Ratio:  1.16561 real, 1.16455 virtual   dragonfly4-64, DragonFly BSD $4$ w/ bugs, 64-bit:Only one salt
Ratio:  1.15637 real, 1.16207 virtual   fde, Android FDE:Raw
Ratio:  1.11659 real, 1.11406 virtual   hdaa, HTTP Digest access authentication:Many salts
Ratio:  1.10031 real, 1.10060 virtual   ipb2, Invision Power Board 2.x:Many salts
Ratio:  1.11996 real, 1.12052 virtual   keychain, Mac OS X Keychain:Raw
Ratio:  1.21872 real, 1.21897 virtual   keystore, Java KeyStore:Raw
Ratio:  1.14208 real, 1.13958 virtual   lotus85, Lotus Notes/Domino 8.5:Raw
Ratio:  1.71362 real, 1.71173 virtual   mscash, MS Cache Hash (DCC):Many salts
Ratio:  1.54250 real, 1.53922 virtual   mscash, MS Cache Hash (DCC):Only one salt
Ratio:  1.31895 real, 1.31614 virtual   mschapv2-naive, MSCHAPv2 C/R:Many salts
Ratio:  1.54376 real, 1.54425 virtual   mssql12, MS SQL 2012/2014:Many salts
Ratio:  1.32595 real, 1.32425 virtual   mssql12, MS SQL 2012/2014:Only one salt
Ratio:  1.59822 real, 1.60298 virtual   mysqlna, MySQL Network Authentication:Raw
Ratio:  1.29011 real, 1.28920 virtual   net-md5, "Keyed MD5" RIPv2, OSPF, BGP, SNMPv2:Many salts
Ratio:  1.22309 real, 1.22330 virtual   net-md5, "Keyed MD5" RIPv2, OSPF, BGP, SNMPv2:Only one salt
Ratio:  1.39862 real, 1.39617 virtual   net-sha1, "Keyed SHA1" BFD:Many salts
Ratio:  1.43371 real, 1.43077 virtual   net-sha1, "Keyed SHA1" BFD:Only one salt
Ratio:  1.16528 real, 1.16641 virtual   nethalflm, HalfLM C/R:Many salts
Ratio:  1.22222 real, 1.22358 virtual   netlm, LM C/R:Many salts
Ratio:  1.18733 real, 1.18456 virtual   netlm, LM C/R:Only one salt
Ratio:  1.30871 real, 1.31416 virtual   netntlm-naive, NTLMv1 C/R:Many salts
Ratio:  1.15936 real, 1.15877 virtual   netntlmv2, NTLMv2 C/R:Many salts
Ratio:  1.37253 real, 1.36914 virtual   nt2, NT:Raw
Ratio:  1.16799 real, 1.16840 virtual   oldoffice, MS Office <= 2003:Many salts
Ratio:  1.18387 real, 1.18222 virtual   openssl-enc, OpenSSL "enc" encryption:Raw
Ratio:  1.70178 real, 1.69890 virtual   postgres, PostgreSQL C/R:Raw
Ratio:  0.91070 real, 0.90869 virtual   ripemd-128, RIPEMD 128:Raw
Ratio:  0.94408 real, 0.94032 virtual   ripemd-160, RIPEMD 160:Raw
Ratio:  0.87103 real, 0.86959 virtual   skein-256, Skein 256:Raw
Ratio:  0.88118 real, 0.87953 virtual   skein-512, Skein 512:Raw
Ratio:  0.90698 real, 0.89862 virtual   tc_ripemd160, TrueCrypt RIPEMD160 AES256_XTS:Raw
Ratio:  0.88230 real, 0.88230 virtual   tc_sha512, TrueCrypt SHA512 AES256_XTS:Raw
Ratio:  0.89068 real, 0.88782 virtual   tc_whirlpool, TrueCrypt WHIRLPOOL AES256_XTS:Raw
Ratio:  0.95394 real, 0.95651 virtual   tcp-md5, TCP MD5 Signatures, BGP:Many salts
Ratio:  0.93251 real, 0.93166 virtual   tcp-md5, TCP MD5 Signatures, BGP:Only one salt
Ratio:  1.15416 real, 1.15516 virtual   tripcode:Raw
Ratio:  0.88678 real, 0.88869 virtual   vtp, "MD5 based authentication" VTP:Many salts
Ratio:  0.90883 real, 0.90864 virtual   vtp, "MD5 based authentication" VTP:Only one salt
Ratio:  0.89586 real, 0.89650 virtual   whirlpool1:Raw
Ratio:  1.34711 real, 1.34591 virtual   xsha, Mac OS X 10.4 - 10.6:Many salts
Ratio:  1.25367 real, 1.25248 virtual   xsha, Mac OS X 10.4 - 10.6:Only one salt
Ratio:  0.91102 real, 0.90596 virtual   xsha512, Mac OS X 10.7:Only one salt

cygwin64 $ ../run/relbench -v omp-4-xop.log omp-0-xop.log
Ratio:  1.11980 real, 2.83895 virtual   Blockchain, My Wallet (x10):Raw
Ratio:  1.50262 real, 3.07782 virtual   EPiServer:Many salts
Ratio:  1.43417 real, 3.09400 virtual   EPiServer:Only one salt
Ratio:  1.03194 real, 1.99459 virtual   HMAC-SHA1:Only one salt
Ratio:  1.07199 real, 2.05080 virtual   LM:Raw
Ratio:  1.15328 real, 2.61449 virtual   MongoDB, system / network:Raw
Ratio:  1.08891 real, 2.30125 virtual   PFX, PKCS12 (.pfx, .p12):Raw
Ratio:  1.28908 real, 1.89985 virtual   PST, custom CRC-32:Raw
Ratio:  1.04591 real, 2.40295 virtual   RACF:Many salts
Ratio:  1.06827 real, 2.23486 virtual   RACF:Only one salt
Ratio:  1.02438 real, 1.02400 virtual   SSH-ng:Raw
Ratio:  1.61978 real, 3.48579 virtual   chap, iSCSI CHAP authentication:Raw
Ratio:  1.26333 real, 2.29408 virtual   cq, ClearQuest:Raw
Ratio:  1.08666 real, 2.48443 virtual   mscash, MS Cache Hash (DCC):Many salts
Ratio:  0.95866 real, 2.10208 virtual   mscash, MS Cache Hash (DCC):Only one salt
Ratio:  0.95244 real, 1.89957 virtual   net-sha1, "Keyed SHA1" BFD:Only one salt
Ratio:  1.02455 real, 2.39711 virtual   postgres, PostgreSQL C/R:Raw

$ ../run/relbench -v omp-1-u64-vm-sse41.log omp-4-u64-vm-sse41.log
Ratio:  0.53856 real, 0.13893 virtual   CRC32:Only one salt
Ratio:  1.01620 real, 0.25885 virtual   Citrix_NS10, Netscaler 10:Only one salt
Ratio:  0.70132 real, 0.25495 virtual   Fortigate, FortiOS:Many salts
Ratio:  0.44795 real, 0.16514 virtual   Fortigate, FortiOS:Only one salt
Ratio:  0.65770 real, 0.24255 virtual   HAVAL-128-4:Raw
Ratio:  0.58542 real, 0.21491 virtual   HAVAL-256-3:Raw
Ratio:  0.74744 real, 0.27421 virtual   HMAC-MD5:Many salts
Ratio:  0.63618 real, 0.24241 virtual   HMAC-MD5:Only one salt
Ratio:  0.72531 real, 0.27220 virtual   HMAC-SHA1:Many salts
Ratio:  0.56471 real, 0.21481 virtual   HMAC-SHA1:Only one salt
Ratio:  0.44187 real, 0.11733 virtual   LM:Raw
Ratio:  0.56230 real, 0.14567 virtual   PST, custom CRC-32:Raw
Ratio:  0.58217 real, 0.14959 virtual   Raw-MD4:Raw
Ratio:  0.72124 real, 0.18656 virtual   Raw-MD5:Raw
Ratio:  1.05441 real, 0.27153 virtual   Raw-SHA1:Raw
Ratio:  0.89824 real, 0.24002 virtual   Raw-SHA256-ng:Raw
Ratio:  0.89948 real, 0.32948 virtual   gost, GOST R 34.11-94:Raw
Ratio:  0.76972 real, 0.28384 virtual   hdaa, HTTP Digest access authentication:Many salts
Ratio:  0.75825 real, 0.28126 virtual   hdaa, HTTP Digest access authentication:Only one salt
Ratio:  1.00578 real, 0.26544 virtual   mschapv2-naive, MSCHAPv2 C/R:Only one salt
Ratio:  1.00074 real, 0.29016 virtual   nethalflm, HalfLM C/R:Only one salt
Ratio:  0.95518 real, 0.37010 virtual   netlm, LM C/R:Only one salt
Ratio:  0.99599 real, 0.25135 virtual   xsha, Mac OS X 10.4 - 10.6:Only one salt

../run/relbench -v omp-1-u64-vm-sse41.log omp-0-u64-vm-sse41.log
Ratio:  0.88462 real, 0.91026 virtual   7z, 7-Zip (512K iterations):Raw
Ratio:  0.76471 real, 0.81176 virtual   Bitcoin:Raw
Ratio:  0.86384 real, 0.90940 virtual   Blockchain, My Wallet (x10):Raw
Ratio:  0.88777 real, 0.92107 virtual   CRC32:Many salts
Ratio:  0.88885 real, 0.90889 virtual   CRC32:Only one salt
Ratio:  0.89803 real, 0.92203 virtual   Clipperz, SRP:Raw
Ratio:  0.97037 real, 0.97778 virtual   EncFS:Raw
Ratio:  0.98929 real, 0.98735 virtual   Fortigate, FortiOS:Many salts
Ratio:  0.97571 real, 0.97976 virtual   Fortigate, FortiOS:Only one salt
Ratio:  0.83792 real, 0.83980 virtual   HAVAL-128-4:Raw
Ratio:  0.87919 real, 0.88427 virtual   HAVAL-256-3:Raw
Ratio:  1.33038 real, 1.33026 virtual   HMAC-MD5:Only one salt
Ratio:  1.14986 real, 1.15024 virtual   HMAC-SHA1:Only one salt
Ratio:  1.16423 real, 1.16890 virtual   HMAC-SHA512:Many salts
Ratio:  1.19083 real, 1.19560 virtual   HMAC-SHA512:Only one salt
Ratio:  0.76804 real, 0.76651 virtual   IKE, PSK:Raw
Ratio:  0.88045 real, 0.92872 virtual   LM:Raw
Ratio:  0.89655 real, 0.89655 virtual   LUKS:Raw
Ratio:  0.81342 real, 0.83502 virtual   LastPass, sniffed sessions:Raw
Ratio:  0.86168 real, 0.85949 virtual   Office, 2007/2010 (SHA-1) / 2013 (SHA-512), with AES:Raw
Ratio:  0.86985 real, 0.90815 virtual   OpenVMS, Purdy:Raw
Ratio:  0.87207 real, 0.91026 virtual   PBKDF2-HMAC-SHA1:Raw
Ratio:  0.72477 real, 0.74429 virtual   PBKDF2-HMAC-SHA256, rounds=12000:Raw
Ratio:  0.84516 real, 0.85806 virtual   PBKDF2-HMAC-SHA512, GRUB2 / OS X 10.8+:Raw
Ratio:  0.87224 real, 0.89544 virtual   PFX, PKCS12 (.pfx, .p12):Raw
Ratio:  0.88536 real, 0.91468 virtual   PKZIP:Only one salt
Ratio:  1.25522 real, 1.32128 virtual   PST, custom CRC-32:Raw
Ratio:  0.85991 real, 0.87929 virtual   Panama:Raw
Ratio:  1.27746 real, 1.35625 virtual   PuTTY, Private Key:Raw
Ratio:  0.85978 real, 0.90130 virtual   RACF:Only one salt
Ratio:  0.87630 real, 0.88879 virtual   RAKP, IPMI 2.0 RAKP (RMCP+):Many salts
Ratio:  0.86953 real, 0.90583 virtual   Raw-Keccak:Raw
Ratio:  0.85778 real, 0.92249 virtual   Raw-Keccak-256:Raw
Ratio:  0.82277 real, 0.88470 virtual   Raw-MD4:Raw
Ratio:  0.74434 real, 0.81807 virtual   Raw-SHA224:Raw
Ratio:  0.75153 real, 0.78938 virtual   Raw-SHA256:Raw
Ratio:  0.76071 real, 0.79268 virtual   Raw-SHA384:Raw
Ratio:  0.86875 real, 0.93477 virtual   Raw-SHA512:Raw
Ratio:  0.78627 real, 0.81560 virtual   Raw-SHA1-ng, (pwlen <= 15):Raw
Ratio:  0.78804 real, 0.85467 virtual   Raw-SHA256-ng:Raw
Ratio:  0.83193 real, 0.87252 virtual   Raw-SHA512-ng:Raw
Ratio:  0.84257 real, 0.88728 virtual   SIP:Many salts
Ratio:  0.84481 real, 0.86964 virtual   SIP:Only one salt
Ratio:  0.66253 real, 0.67333 virtual   SSH (one 2048-bit RSA and one 1024-bit DSA key):Raw
Ratio:  0.68603 real, 0.70730 virtual   SSH-ng:Raw
Ratio:  0.86412 real, 0.90380 virtual   SSHA512, LDAP:Many salts
Ratio:  0.83089 real, 0.87967 virtual   SSHA512, LDAP:Only one salt
Ratio:  0.88188 real, 0.91276 virtual   STRIP, Password Manager:Raw
Ratio:  0.89958 real, 0.93912 virtual   Salted-SHA1:Many salts
Ratio:  0.89833 real, 0.93783 virtual   Salted-SHA1:Only one salt
Ratio:  0.82300 real, 0.83471 virtual   Siemens-S7:Many salts
Ratio:  0.84723 real, 0.86098 virtual   Siemens-S7:Only one salt
Ratio:  0.78742 real, 0.82366 virtual   Snefru-256:Raw
Ratio:  0.86935 real, 0.88899 virtual   Sybase-PROP:Many salts
Ratio:  0.71526 real, 0.77302 virtual   Tiger:Raw
Ratio:  0.86037 real, 0.88894 virtual   WoWSRP, Battlenet:Raw
Ratio:  0.82293 real, 0.89155 virtual   ZIP, WinZip:Raw
Ratio:  0.85227 real, 0.91822 virtual   agilekeychain, 1Password Agile Keychain:Raw
Ratio:  0.79779 real, 0.81061 virtual   aix-smd5, AIX LPA {smd5} (modified crypt-md5):Raw
Ratio:  0.84565 real, 0.92540 virtual   aix-ssha1, AIX LPA {ssha1}:Raw
Ratio:  0.78509 real, 0.83887 virtual   aix-ssha256, AIX LPA {ssha256}:Raw
Ratio:  0.79376 real, 0.85736 virtual   aix-ssha512, AIX LPA {ssha512}:Raw
Ratio:  0.83851 real, 0.87771 virtual   bcrypt ("$2a$05", 32 iterations):Raw
Ratio:  0.83942 real, 0.88749 virtual   blackberry-es10:Raw
Ratio:  0.82927 real, 0.82927 virtual   cloudkeychain, 1Password Cloud Keychain:Raw
Ratio:  0.47893 real, 0.49174 virtual   cq, ClearQuest:Raw
Ratio:  0.88643 real, 0.93901 virtual   crypt, generic crypt(3) DES:Many salts
Ratio:  0.85718 real, 0.92184 virtual   crypt, generic crypt(3) DES:Only one salt
Ratio:  0.77505 real, 0.77550 virtual   hsrp, "MD5 authentication" HSRP, VRRP, GLBP:Only one salt
Ratio:  0.89531 real, 0.89530 virtual   krb5pa-md5, Kerberos 5 AS-REQ Pre-Auth etype 23:Many salts
Ratio:  0.88705 real, 0.88705 virtual   krb5pa-md5, Kerberos 5 AS-REQ Pre-Auth etype 23:Only one salt
Ratio:  0.81316 real, 0.87826 virtual   md5crypt, crypt(3) $1$:Raw
Ratio:  0.65473 real, 0.65602 virtual   mscash, MS Cache Hash (DCC):Many salts
Ratio:  0.80075 real, 0.79918 virtual   mscash, MS Cache Hash (DCC):Only one salt
Ratio:  0.87783 real, 0.87429 virtual   mssql12, MS SQL 2012/2014:Many salts
Ratio:  0.89158 real, 0.89324 virtual   mssql12, MS SQL 2012/2014:Only one salt
Ratio:  1.26422 real, 1.27191 virtual   net-md5, "Keyed MD5" RIPv2, OSPF, BGP, SNMPv2:Many salts
Ratio:  1.19603 real, 1.19560 virtual   net-md5, "Keyed MD5" RIPv2, OSPF, BGP, SNMPv2:Only one salt
Ratio:  1.30800 real, 1.31040 virtual   net-sha1, "Keyed SHA1" BFD:Many salts
Ratio:  1.25947 real, 1.25447 virtual   net-sha1, "Keyed SHA1" BFD:Only one salt
Ratio:  0.84865 real, 0.84696 virtual   netntlmv2, NTLMv2 C/R:Only one salt
Ratio:  0.86876 real, 0.90692 virtual   oldoffice, MS Office <= 2003:Only one salt
Ratio:  0.82383 real, 0.86002 virtual   openssl-enc, OpenSSL "enc" encryption:Raw
Ratio:  0.82195 real, 0.86364 virtual   rar, RAR3 (4 characters):Raw
Ratio:  0.72014 real, 0.74109 virtual   ripemd-128, RIPEMD 128:Raw
Ratio:  0.74238 real, 0.74691 virtual   ripemd-160, RIPEMD 160:Raw
Ratio:  0.71129 real, 0.72288 virtual   rsvp, HMAC-MD5 / HMAC-SHA1, RSVP, IS-IS:Many salts
Ratio:  0.78651 real, 0.79767 virtual   rsvp, HMAC-MD5 / HMAC-SHA1, RSVP, IS-IS:Only one salt
Ratio:  0.88927 real, 0.95034 virtual   sapb, SAP CODVN B (BCODE):Many salts
Ratio:  0.88227 real, 0.94267 virtual   sapb, SAP CODVN B (BCODE):Only one salt
Ratio:  0.88743 real, 0.91268 virtual   sapg, SAP CODVN F/G (PASSCODE):Only one salt
Ratio:  0.80841 real, 0.84077 virtual   skein-256, Skein 256:Raw
Ratio:  0.80699 real, 0.84430 virtual   skein-512, Skein 512:Raw
Ratio:  0.88449 real, 0.90813 virtual   sxc, StarOffice .sxc:Raw
Ratio:  0.84594 real, 0.87571 virtual   sybasease, Sybase ASE:Many salts
Ratio:  0.84086 real, 0.88334 virtual   sybasease, Sybase ASE:Only one salt
Ratio:  0.81169 real, 0.87379 virtual   tc_ripemd160, TrueCrypt RIPEMD160 AES256_XTS:Raw
Ratio:  0.82677 real, 0.89764 virtual   tc_sha512, TrueCrypt SHA512 AES256_XTS:Raw
Ratio:  0.82540 real, 0.86825 virtual   tc_whirlpool, TrueCrypt WHIRLPOOL AES256_XTS:Raw
Ratio:  0.80706 real, 0.84420 virtual   tcp-md5, TCP MD5 Signatures, BGP:Many salts
Ratio:  0.79632 real, 0.85431 virtual   tcp-md5, TCP MD5 Signatures, BGP:Only one salt
Ratio:  0.87893 real, 0.88763 virtual   vtp, "MD5 based authentication" VTP:Only one salt
Ratio:  0.73033 real, 0.74832 virtual   wbb3, WoltLab BB3:Raw
Ratio:  0.80433 real, 0.85773 virtual   whirlpool:Raw
Ratio:  0.79311 real, 0.84194 virtual   whirlpool0:Raw
Ratio:  0.80481 real, 0.83660 virtual   whirlpool1:Raw
Ratio:  0.86097 real, 0.87625 virtual   wpapsk, WPA/WPA2 PSK:Raw
Ratio:  0.78116 real, 0.82060 virtual   xsha, Mac OS X 10.4 - 10.6:Many salts
Ratio:  0.68642 real, 0.73651 virtual   xsha, Mac OS X 10.4 - 10.6:Only one salt
Ratio:  0.75988 real, 0.81722 virtual   xsha512, Mac OS X 10.7:Many salts
Ratio:  0.79821 real, 0.87032 virtual   xsha512, Mac OS X 10.7:Only one salt

../run/relbench -v omp-4-u64-vm-sse41.log omp-0-u64-vm-sse41.log
Ratio:  1.65040 real, 6.54213 virtual   CRC32:Only one salt
Ratio:  1.41061 real, 3.87267 virtual   Fortigate, FortiOS:Many salts
Ratio:  2.17818 real, 5.93296 virtual   Fortigate, FortiOS:Only one salt
Ratio:  1.27402 real, 3.46236 virtual   HAVAL-128-4:Raw
Ratio:  1.50181 real, 4.11467 virtual   HAVAL-256-3:Raw
Ratio:  1.49367 real, 4.07146 virtual   HMAC-MD5:Many salts
Ratio:  2.09120 real, 5.48767 virtual   HMAC-MD5:Only one salt
Ratio:  1.28490 real, 3.43064 virtual   HMAC-SHA1:Many salts
Ratio:  2.03621 real, 5.35467 virtual   HMAC-SHA1:Only one salt
Ratio:  1.99254 real, 7.91541 virtual   LM:Raw
Ratio:  2.23229 real, 9.07040 virtual   PST, custom CRC-32:Raw
Ratio:  1.41327 real, 5.91417 virtual   Raw-MD4:Raw
Ratio:  1.23823 real, 5.10423 virtual   Raw-MD5:Raw
Ratio:  1.08677 real, 2.96098 virtual   gost, GOST R 34.11-94:Raw
Ratio:  1.31823 real, 3.57481 virtual   hdaa, HTTP Digest access authentication:Many salts
Ratio:  1.35650 real, 3.66392 virtual   hdaa, HTTP Digest access authentication:Only one salt
Ratio:  1.05858 real, 4.19895 virtual   net-md5, "Keyed MD5" RIPv2, OSPF, BGP, SNMPv2:Only one salt
Ratio:  1.15457 real, 4.51186 virtual   net-sha1, "Keyed SHA1" BFD:Only one salt

jfoug commented 9 years ago

Original timings using PKCS5_PBKDF2_HMAC() in truecrypt:

$ OMP_NUM_THREADS=1 ../run/john -test=5 -form=tc_sha512
Warning: OpenMP is disabled; a non-OpenMP build may be faster
Benchmarking: tc_sha512, TrueCrypt SHA512 AES256_XTS [64/64]... DONE
Raw:    204 c/s real, 208 c/s virtual

$ OMP_NUM_THREADS=8 ../run/john -test=5 -form=tc_sha512
Will run 8 OpenMP threads
Benchmarking: tc_sha512, TrueCrypt SHA512 AES256_XTS [64/64]... (8xOMP) DONE
Raw:    31.6 c/s real, 28.8 c/s virtual

New timings using pbkdf2_sha512 (not even to mention we CAN do this with SIMD)

$ OMP_NUM_THREADS=1 ../run/john -test=5 -form=tc_sha512
Warning: OpenMP is disabled; a non-OpenMP build may be faster
Benchmarking: tc_sha512, TrueCrypt SHA512 AES256_XTS [64/64]... DONE
Raw:    352 c/s real, 358 c/s virtual

$ OMP_NUM_THREADS=8 ../run/john -test=5 -form=tc_sha512
Will run 8 OpenMP threads
Benchmarking: tc_sha512, TrueCrypt SHA512 AES256_XTS [64/64]... (8xOMP) DONE
Raw:    1342 c/s real, 187 c/s virtual

This is really a NO BRAINER!!!
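
The four tc_sha512 runs above can be reduced to two numbers per implementation. The sketch below is plain arithmetic on the c/s figures quoted above. The `busy_cores` helper rests on an assumption not stated in this thread: that "virtual" c/s is measured against CPU time summed over all threads, so real/virtual approximates how many cores were kept busy.

```python
# (real c/s, virtual c/s) copied from the tc_sha512 runs above,
# keyed by implementation and OMP_NUM_THREADS.
runs = {
    "openssl": {1: (204.0, 208.0), 8: (31.6, 28.8)},
    "pbkdf2_sha512": {1: (352.0, 358.0), 8: (1342.0, 187.0)},
}


def scaling(impl):
    """Real-time speedup of the 8-thread run over the 1-thread run."""
    return runs[impl][8][0] / runs[impl][1][0]


def busy_cores(impl, threads):
    """Approximate number of cores kept busy (assumes virtual = per CPU time)."""
    real, virtual = runs[impl][threads]
    return real / virtual


for impl in runs:
    print(f"{impl}: {scaling(impl):.2f}x at 8 threads, "
          f"about {busy_cores(impl, 8):.1f} cores busy")
```

Under that assumption, the OpenSSL path keeps roughly one core busy even with 8 threads, while the native pbkdf2_sha512 path keeps about 7 busy.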

I was also able to get ripemd160 working instantly in pass_gen.pl by passing the &ripemd160 parameter to pp_pbkdf2 (and 2000 iterations). It works like a champ. I have to build a pbkdf2_hmac_ripemd160 for this.

However, whirlpool did NOT give me the same results for the pbkdf2 in pass_gen.pl. I do not know why but it did not.

jfoug commented 9 years ago

Now:

$ OMP_NUM_THREADS=1 ../run/john -test=5 -form=tc_sha512
Warning: OpenMP is disabled; a non-OpenMP build may be faster
Benchmarking: tc_sha512, TrueCrypt SHA512 AES256_XTS [128/128 SSE4.1 2x]... DONE
Raw:    641 c/s real, 647 c/s virtual

$ OMP_NUM_THREADS=8 ../run/john -test=5 -form=tc_sha512
Will run 8 OpenMP threads
Benchmarking: tc_sha512, TrueCrypt SHA512 AES256_XTS [128/128 SSE4.1 2x]... (8xOMP) DONE
Raw:    2439 c/s real, 343 c/s virtual

I have it ready to go. ONLY sha512 is overridden (for now). I do not have a pbkdf2 done for ripemd160 (yet). It should not be hard however, and THAT hash is very slow on my machine, so there may be a LOT to gain.

5c36dfb

magnumripper commented 9 years ago

Good stuff!

jfoug commented 9 years ago

I now have pbkdf2-hmac-whirlpool.h and pbkdf2-hmac-ripemd160.h in the JtR code, and they are included and used in truecrypt_fmt_plug.c. Whirlpool only got about a 60% speedup, but ripemd got about a 3x or so improvement.

jfoug commented 9 years ago

oSSL also contains RIPEMD160. I added code to configure to autodetect it. There is now a HAVE_RIPEMD160 added into autoconfig.h. I also put code in to use this within pbkdf2_hmac_ripemd160.h (used by truecrypt_fmt). I found out that on my 64-bit cygwin system, it is about 10% slower than the sph code. It is almost certain that oSSL is using the same reference-level RipeMD that we are using, but ours is done 'properly' based on the build being 64 bits (such as bit count accumulation in a single var). oSSL may be doing it using 2 vars or something (which is slower). Well, whatever it is, I have commented it out for now (with an #if HAVE_RIPEMD160 && 0 ). I might have a look on other systems. I bet that oSSL on 32 bit will be faster than sph, but I could be wrong in that assumption also.

jfoug commented 9 years ago

Here is a breakdown of oSSL and sph_* for ripemd160 and whirlpool for 32 bit and 64 bit systems I have. I think this shows that running oSSL for whirlpool (if supported) is best, or almost even, and that running sph_ripemd160 is best or almost even, for all systems.

OS        algo       oSSL    sph_xx
Cygwin32  ripemd160  135     153**
Cygwin64  ripemd160  152     150
Ubuntu32  ripemd160  91.6    91.6
Ubuntu64  ripemd160  142     163**
Cygwin32  whirlpool  229***  63.5
Cygwin64  whirlpool  230     227
Ubuntu32  whirlpool  133**   34.8
Ubuntu64  whirlpool  190**   145
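
The "best, or almost even" claim can be checked mechanically against the table. This is a small sketch using only the numbers above (units as reported there). The `PREFERRED` mapping and the `worst_shortfall` helper are names made up for this illustration.

```python
# (OS, algo, oSSL, sph) copied from the table above.
rows = [
    ("Cygwin32", "ripemd160", 135.0, 153.0),
    ("Cygwin64", "ripemd160", 152.0, 150.0),
    ("Ubuntu32", "ripemd160", 91.6, 91.6),
    ("Ubuntu64", "ripemd160", 142.0, 163.0),
    ("Cygwin32", "whirlpool", 229.0, 63.5),
    ("Cygwin64", "whirlpool", 230.0, 227.0),
    ("Ubuntu32", "whirlpool", 133.0, 34.8),
    ("Ubuntu64", "whirlpool", 190.0, 145.0),
]

# The choice proposed in the comment above.
PREFERRED = {"ripemd160": "sph", "whirlpool": "oSSL"}


def worst_shortfall(algo):
    """Largest fraction by which the preferred library trails the other one."""
    worst = 0.0
    for _os, row_algo, ossl, sph in rows:
        if row_algo != algo:
            continue
        if PREFERRED[algo] == "sph":
            chosen, other = sph, ossl
        else:
            chosen, other = ossl, sph
        worst = max(worst, (other - chosen) / other)
    return worst


for algo in PREFERRED:
    print(f"{algo}: preferred library trails by at most "
          f"{worst_shortfall(algo) * 100:.1f}%")
```

On these figures the preferred library never loses for whirlpool, and loses by a bit over 1% in one ripemd160 row (Cygwin64).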

frank-dittrich commented 9 years ago

Are you running ubuntu inside a VM, on different hardware, or why are the numbers so poor (compared to cygwin)? Can you post the patches needed to reproduce this? I'd like to try it on 32bit and 64bit Fedora.

jfoug commented 9 years ago

In a VM. That 32 bit VM only has 2 cores. All other tests had 4 (real CPU is DUAL-HT)

jfoug commented 9 years ago

Btw, here are the new speeds after the latest commit of this format, cygwin64 speeds only, but on same hardware as the above table

$ ../run/john -test=5 -format=tc_*
Will run 4 OpenMP threads
Benchmarking: tc_aes_xts, TrueCrypt (RIPEMD160/SHA512/WHIRLPOOL) AES256_XTS [128/128 XOP 2x]... (4xOMP) DONE
Raw:    2361 c/s real, 1223 c/s virtual

Benchmarking: tc_ripemd160, TrueCrypt RIPEMD160 AES256_XTS [32/64]... (4xOMP) DONE
Raw:    469 c/s real, 124 c/s virtual

Benchmarking: tc_sha512, TrueCrypt SHA512 AES256_XTS [128/128 XOP 2x]... (4xOMP) DONE
Raw:    3718 c/s real, 994 c/s virtual

Benchmarking: tc_whirlpool, TrueCrypt WHIRLPOOL AES256_XTS [64/64]... (4xOMP) DONE
Raw:    709 c/s real, 189 c/s virtual

All 4 formats passed self-tests!

magnumripper commented 9 years ago

That's beautiful for the relbench figures :) TC did exist in J7, didn't it? Heck it was so long ago I do not really know.

jfoug commented 9 years ago

I am pretty sure TC has been there a bit. It is 10 to 20x faster now. (or on my cygwin omp build, about 4000x faster for sha512, LOL).

Now I gotta get krb5-18 done. That one should also get a huge improvement if we can get past the libkrb5 stuff.

That actually is not a bad idea. Identifying the hashes that use slow high level libs, but where the hash was present in J7, 'fixing' them to be faster native, or even SIMD, and fudging the relbench numbers :)

jfoug commented 9 years ago

Btw, I have the TC_* hashes in pass_gen.pl now, so I can also add them to jtrts.pl. I have added them to the 'add missing hashes' issue on jtrts.

jfoug commented 9 years ago

krb5-18 is now done. This was the last super bad one on cygwin (where OMP was SLOWER by far, than OMP=1)

magnumripper commented 9 years ago

Benchmark:

$ ../run/john-no-omp -test -form=cpu | tee no-omp.txt
$ OMP_NUM_THREADS=1 ../run/john-omp -test -form=cpu | tee omp1.txt
$ OMP_NUM_THREADS=4 ../run/john-omp -test -form=cpu | tee omp4.txt

First I thought the below said phpass should run larger batches in non-omp builds:

$ ../run/relbench -v no-omp.txt omp1.txt | grep ^Ratio | sort -k2,2nr | head
Ratio:  6.16163 real, 6.10060 virtual   phpass ($P$9):Raw
Ratio:  6.01463 real, 6.13602 virtual   dynamic_17:Raw
Ratio:  1.77308 real, 1.73814 virtual   cq, ClearQuest:Raw
Ratio:  1.26606 real, 1.24109 virtual   xsha, Mac OS X 10.4 - 10.6:Only one salt
Ratio:  1.24405 real, 1.23173 virtual   Raw-MD5:Raw
Ratio:  1.22452 real, 1.21230 virtual   nt2, NT:Raw
Ratio:  1.21681 real, 1.20477 virtual   Raw-SHA1:Raw
Ratio:  1.20770 real, 1.21976 virtual   Raw-MD4:Raw
Ratio:  1.17541 real, 1.17541 virtual   Citrix_NS10, Netscaler 10:Only one salt
Ratio:  1.16873 real, 1.14574 virtual   HMAC-SHA1:Many salts

Then I saw this

$ grep -A1 phpass no-omp.txt omp1.txt
no-omp.txt:Benchmarking: dynamic_17 [phpass ($P$ or $H$) 32/64 1x2  (MD5_body)]... DONE
no-omp.txt-Raw: 3690 c/s real, 3617 c/s virtual
--
no-omp.txt:Benchmarking: phpass ($P$9) [phpass ($P$ or $H$) 32/64 1x2  (MD5_body)]... DONE
no-omp.txt-Raw: 3638 c/s real, 3638 c/s virtual
--
omp1.txt:Benchmarking: dynamic_17 [phpass ($P$ or $H$) 128/128 AVX 4x4x3]... DONE
omp1.txt-Raw:   22194 c/s real, 22194 c/s virtual
--
omp1.txt:Benchmarking: phpass ($P$9) [phpass ($P$ or $H$) 128/128 AVX 4x4x3]... DONE
omp1.txt-Raw:   22416 c/s real, 22194 c/s virtual

Why is a non-omp build using MD5_body!? It's the same for dynamic_17.

magnumripper commented 9 years ago

Very poor figures below. Ideally it should be 4.0

$ grep -A2 4xOMP omp4.txt | sed 's/^--$//' > only_omp.txt
$ ../run/relbench -v no-omp.txt only_omp.txt | grep ^Ratio | sort -k2,2n | head 
Ratio:  1.00828 real, 0.99830 virtual   crypt, generic crypt(3) DES:Many salts
Ratio:  1.02357 real, 0.77956 virtual   PST, custom CRC-32:Raw
Ratio:  1.02592 real, 1.03618 virtual   crypt, generic crypt(3) DES:Only one salt
Ratio:  1.03944 real, 0.46869 virtual   EPiServer:Many salts
Ratio:  1.04432 real, 0.50199 virtual   EPiServer:Only one salt
Ratio:  1.04622 real, 0.46495 virtual   chap, iSCSI CHAP authentication:Raw
Ratio:  1.04902 real, 0.79894 virtual   dynamic_2006:Only one salt
Ratio:  1.08379 real, 0.78540 virtual   dynamic_2009:Only one salt
Ratio:  1.11291 real, 0.76219 virtual   dynamic_2014:Only one salt
Ratio:  1.12326 real, 0.81991 virtual   dynamic_1003:Raw

The list is a lot longer and doesn't reach 2.0 (50% efficiency) until line 90 out of 511.
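
For reference, "2.0 (50% efficiency)" is just the real ratio divided by the thread count. The sketch below parses relbench-style `Ratio:` lines and computes that figure. The regular expression is written against the output format shown in this thread, not against relbench's source, so treat it as an assumption.

```python
import re

# Matches lines such as:
# "Ratio:  1.00828 real, 0.99830 virtual   crypt, generic crypt(3) DES:Many salts"
RATIO_RE = re.compile(r"^Ratio:\s+([\d.]+) real, ([\d.]+) virtual\s+(.*)$")


def parse_ratios(text):
    """Return a list of (real_ratio, virtual_ratio, name) tuples."""
    rows = []
    for line in text.splitlines():
        m = RATIO_RE.match(line.strip())
        if m:
            rows.append((float(m.group(1)), float(m.group(2)), m.group(3)))
    return rows


def efficiency(real_ratio, threads):
    """Fraction of ideal scaling: 1.0 means the ratio equals the thread count."""
    return real_ratio / threads


sample = """\
Ratio:  1.00828 real, 0.99830 virtual   crypt, generic crypt(3) DES:Many salts
Ratio:  1.04622 real, 0.46495 virtual   chap, iSCSI CHAP authentication:Raw
"""

for real, _virtual, name in sorted(parse_ratios(sample)):
    print(f"{efficiency(real, 4) * 100:5.1f}%  {name}")
```

A ratio near 1.0 with 4 threads is about 25% efficiency, meaning the OMP build gains essentially nothing over the non-OMP build for that format.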

jfoug commented 9 years ago

Phpass should be fixed. Yes, this was a bug, with #define's mixed up. The #define block was a little complex (like 4 different things going on, between OMP, SIMD, BE and MD5_X2). Thank god for syntax highlighting and smart #ifdef highlighting.

magnumripper commented 9 years ago

Fixed the few worst. cq got some awesome speedup even without OMP: 50337K -> 90725K and OMPx4 60633K -> 258998K.

magnumripper commented 9 years ago

I bet we have 30 or so formats that could benefit a lot from OMP_SCALE tuning, out of at least double that number to test.

jfoug commented 9 years ago

Made good progress today. Still some to go, since my auto code only works for formats if they do a max *= (scale*omp_t), and many formats do not do that. I do have a list of other formats that 'look' like we could do some better work.

Some of these may not benefit. Some may not have OMP at all. These are just ones that mostly appear like they may have benefits we can give them, OR when trying to kick up OMP_SCALE, it kept a fixed size. I just did IPB2 to get an idea of how best to attack some of these formats that do not allow the external upward adjustment of OMP_SCALE env var.

[x] md5ns Looks like no OMP made into a thin format, dyna_2004 2-3x improvement + OMP
[x] hMailServer (converted to thin dyna_61. 8x faster on my 2CPU-HT using SIMD)
[X] mscash1
[X] mscash2
[ ] cryptsha512_fmt_plug.c
[ ] dragonfly3_fmt_plug.c
[ ] dragonfly4_fmt_plug.c
[ ] mssql (needs OMP)
[ ] mssql05 (needs OMP)
[ ] mysql-sha1 (needs OMP)
[ ] nsldap (needs OMP)
[ ] oracle11 (needs OMP)
[ ] po (needs OMP)
[ ] Raw-SHA (needs OMP)
[ ] Raw-SHA1-linkedin (needs OMP)
[x] sapb
[x] sapg
[ ] skey (needs OMP)
[x] xsha512
[x] Raw-Blake2 not enough gain to matter
[x] Raw-Keccak not enough gain to matter
[x] Raw-Keccak-256 not enough gain to matter
[x] Raw-MD4 not enough gain to matter
[x] Raw-MD5 not enough gain to matter
[x] Raw-MD5u not enough gain to matter
[x] Raw-SHA1 not enough gain to matter
[x] Raw-SHA224 not enough gain to matter
[x] Raw-SHA256 not enough gain to matter
[x] Raw-SHA256-ng not enough gain to matter
[x] Raw-SHA384 not enough gain to matter
[x] Raw-SHA512 not enough gain to matter
[x] Raw-SHA512-ng not enough gain to matter
[x] Salted-SHA1 not enough gain to matter

magnumripper commented 9 years ago

To make that a checklist you need to start each line with dash

- [ ] like this

[ ] like this

magnumripper commented 9 years ago

md4gen is an old thing made by Solar (but not in core iirc). I usually ignore it because Dynamic obsoletes it. Actually we could drop it IMHO.

magnumripper commented 9 years ago

mssql and mssql05 do not support OMP. They probably should.

$ OMP_SCALE=2 OMP_NUM_THREADS=4 ../run/john -test -form:mssql
Benchmarking: mssql, MS SQL [SHA1 128/128 AVX 8x]... DONE
Many salts: 20360K c/s real, 20564K c/s virtual
Only one salt:  10540K c/s real, 10540K c/s virtual

Your script could detect that we did not get the (4xOMP) so no need to bother
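
A minimal sketch of that check, in Python: look for the `(NxOMP)` tag on the `Benchmarking:` line. The helper name `omp_threads` is made up for this example, and the pattern is based only on the benchmark output shown in this thread.

```python
import re

# The tag printed on the "Benchmarking:" line when a format runs with OMP,
# e.g. "(8xOMP)". The format is inferred from the output in this thread.
OMP_TAG_RE = re.compile(r"\((\d+)xOMP\)")


def omp_threads(benchmark_line):
    """Return the thread count from an '(NxOMP)' tag, or None if absent."""
    m = OMP_TAG_RE.search(benchmark_line)
    return int(m.group(1)) if m else None


no_omp = "Benchmarking: mssql, MS SQL [SHA1 128/128 AVX 8x]... DONE"
with_omp = ("Benchmarking: tc_sha512, TrueCrypt SHA512 AES256_XTS "
            "[64/64]... (8xOMP) DONE")

for line in (no_omp, with_omp):
    print(omp_threads(line), "<-", line)
```

A tuning script could skip any format for which this returns None, since changing OMP_SCALE cannot help a format that never goes parallel.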

magnumripper commented 9 years ago

NT does not and can not support OMP.

jfoug commented 9 years ago

I have pulled md4-gen and NT. I probably should also pull sha1-gen. It is also somewhat redundant.

jfoug commented 9 years ago

Dyna formats may need to have scale looked at. BUT they should ALL get updated with a single change. Right now, they are 6144 (I think) in scale (OMP-4). It may be 6144 fixed no matter what the OMP level. But I think it should be increased (some), or possibly recomputed for each type.

magnumripper commented 9 years ago

578fe51 fixed a whole lot of formats with this one-liner[tm]

$ git grep -El "keys_per_crypt = omp_t \* M(IN|AX)_KEYS_PER_CRYPT;" | xargs sed -ri 's/keys_per_crypt = omp_t \* M(IN|AX)_KEYS_PER_CRYPT;/keys_per_crypt *= omp_t;/'
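
To illustrate what the substitution changes, here is a small sketch. The struct, the function names and the constant values are hypothetical stand-ins, and it assumes the usual init() pattern in which min is scaled by the thread count and max by the thread count times the OMP scale. The "before" form overwrites the fields from the compile-time constants; the "after" form multiplies whatever the fields currently hold:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative compile-time defaults; real formats define their own. */
#define MIN_KEYS_PER_CRYPT 1
#define MAX_KEYS_PER_CRYPT 64

/* Hypothetical stand-in for the keys-per-crypt fields of a format. */
struct kpc_params {
    int min_keys_per_crypt;
    int max_keys_per_crypt;
};

/* Before: overwrite the fields from the compile-time constants. */
void scale_overwrite(struct kpc_params *p, int threads, int omp_scale)
{
    int omp_t = threads;

    assert(p != NULL && threads > 0 && omp_scale > 0);
    p->min_keys_per_crypt = omp_t * MIN_KEYS_PER_CRYPT;
    omp_t *= omp_scale;
    p->max_keys_per_crypt = omp_t * MAX_KEYS_PER_CRYPT;
}

/* After: multiply whatever values the fields currently hold. */
void scale_in_place(struct kpc_params *p, int threads, int omp_scale)
{
    int omp_t = threads;

    assert(p != NULL && threads > 0 && omp_scale > 0);
    p->min_keys_per_crypt *= omp_t;
    omp_t *= omp_scale;
    p->max_keys_per_crypt *= omp_t;
}
```

The two give the same result when the fields still hold the compile-time defaults. They differ only if something adjusted the fields before scaling, in which case the in-place form keeps that adjustment.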

jfoug commented 9 years ago

So we can use the OMP_SCALE environment var now?

magnumripper commented 9 years ago

I would guess so but haven't bothered. Go ahead if you like.

Note to self: md5crypt is odd - I tried unifying it but reverted. And scrypt does not benefit at all from scaling - it's just too slow.

jfoug commented 9 years ago

I doubt scrypt, or bcrypt for that matter, will have any benefit. Super slow ones just don't matter, and are better with scale 1.

jfoug commented 9 years ago

I did some work on dyna. I was able to get things to have more or less values in OMP. Now comes the tough part. There are some formats that get slower (some much slower), when you increase count, and some that get faster. So, I may have to make some changes. For these changes I will need to:

- find the max required (for any dyna)
- allocate the buffers based upon this max (we only allocate ONE time)
- add data to formats that allows them to override (downward) this max OMP value
- change the functions that simply use MAX_KEYS in for loops, so they use the 'max' value for the specific dyna format
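
Roughly, the four steps above could look like the sketch below. Every name in it (struct dyna_fmt, dyna_alloc_once, dyna_clear_keys, KEY_SLOT) is hypothetical and is not taken from the dynamic format code. It only shows the shape of the idea: one allocation sized for the largest format, and loops bounded by each format's own count.

```c
#include <assert.h>
#include <stdlib.h>

#define KEY_SLOT 64  /* bytes reserved per key; illustrative value only */

/* Hypothetical per-format descriptor carrying its own key count. */
struct dyna_fmt {
    const char *name;
    size_t max_keys;  /* may be lower than the global max, never higher */
};

static unsigned char *key_buf;   /* shared buffer, allocated one time */
static size_t global_max_keys;   /* largest max_keys over all formats */

/* Steps 1 and 2: find the max over all formats, then allocate once. */
int dyna_alloc_once(const struct dyna_fmt *fmts, size_t n)
{
    size_t i;

    assert(fmts != NULL && n > 0);
    if (key_buf != NULL)
        return 0;  /* already allocated */
    for (i = 0; i < n; i++)
        if (fmts[i].max_keys > global_max_keys)
            global_max_keys = fmts[i].max_keys;
    key_buf = calloc(global_max_keys, KEY_SLOT);
    return key_buf != NULL ? 0 : -1;
}

/* Steps 3 and 4: loop over the per-format count, not a global MAX_KEYS. */
size_t dyna_clear_keys(const struct dyna_fmt *f)
{
    size_t i;

    assert(key_buf != NULL && f->max_keys <= global_max_keys);
    for (i = 0; i < f->max_keys; i++)
        key_buf[i * KEY_SLOT] = 0;
    return f->max_keys;  /* number of key slots touched */
}
```

A format that gets slower with a larger count would simply carry a smaller max_keys, while still sharing the single buffer.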

But all in all, I think I can get 150% for many formats (the faster ones), might get 200% for some that are over scaled today, and get 110-120% for most others. Some may have no change at all.

For example, here were some timings I saw

8x-OMP on my Core-i7 quad HT

raw-md5
   68000k
dyna_0
   44000k   (6144 scale)
   54000k   (6144*2 scale)
   56000k   (6144*3 scale)
   56000k   (6144*4 scale)  (this one fluctuated quite a bit)

So you can see for this format, we got about half of the loss back. There is no way we can get it all, there simply is WAY too much overhead in dyna that I simply can not eliminate.
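
The "about half" figure follows directly from the timings above: the gap to raw-md5 is 68000k - 44000k = 24000k, and the best dyna_0 run wins back 56000k - 44000k = 12000k of it. As a tiny helper (the function name is made up for this illustration):

```c
#include <assert.h>

/*
 * Fraction of the gap between a reference speed and a baseline speed
 * that a tuned run recovers. All three values use the same unit.
 */
double recovered_fraction(double reference, double baseline, double tuned)
{
    assert(reference > baseline);
    return (tuned - baseline) / (reference - baseline);
}
```

With reference 68000, baseline 44000 and tuned 56000 this gives 0.5.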

But I do think I can get dyna working quite a bit quicker for many of the formats. It may not be a trivial undertaking, but I really think it needs to be done. At the same time, I really would love to simplify this somewhat, but I may not be able to do that. Being able to handle SIMD (of multiple flavors), oSSL, md5_go, md5_body and md5_body-x2 is NOT trivial, especially allowing switching in and out of SIMD/flat. The code prior to md5_body-x2 was quite a bit simpler, BUT md5_body_x2 does provide a significant improvement in performance, so I think coding for it is justified: it adds hundreds of extra lines of #ifdef code, but it does make the format faster.

magnumripper commented 9 years ago

Is Dynamic all-or-nothing OMP? I usually build with --disable-openmp-for-fast-formats. I use --fork for the fast ones.

jfoug commented 9 years ago

I believe it is all or nothing. I may have to address the --disable-openmp-for-fast-formats switch.

jfoug commented 9 years ago

Ok, here is a 'checking' program (to test BE and LE GETPOS macros).

#include <stdio.h>
#include <string.h>

// GETPOS test.
//  This will work with generating a proper get-pos for SIMD_COEF_32 of 
//  2, 4, 8, 16 limbs, for LE or BE.

void dump4(int, unsigned char *);

// I did this without using #defines, to allow easier debugging in MSVC.
// These will be defines in 'real' code
int SIMD_COEF_32=2;
int SIMD_SHIFT;
int SHA_BUF_SIZ=2;
int MD5_BUF_SIZ=2;

#define GETPOS_BE32(i, idx) ((idx & (SIMD_COEF_32-1)) * 4 + ((i) & (0xffffffff - 3)) * SIMD_COEF_32 + (((i) & 3) ^ 3) + (idx >> SIMD_SHIFT) * SHA_BUF_SIZ * SIMD_COEF_32 * 4)
#define GETPOS_LE32(i, idx) ((idx & (SIMD_COEF_32-1)) * 4 + ((i) & (0xffffffff - 3)) * SIMD_COEF_32 +  ((i) & 3)      + (idx >> SIMD_SHIFT) * MD5_BUF_SIZ * SIMD_COEF_32 * 4)

int main() {
    unsigned char Buf[512];
    int i, idx;

    for (SIMD_COEF_32 = 2; SIMD_COEF_32 <= 16; SIMD_COEF_32 <<= 1) {
        if (SIMD_COEF_32==16)      SIMD_SHIFT=4;
        else if (SIMD_COEF_32==8)  SIMD_SHIFT=3;
        else if (SIMD_COEF_32==4)  SIMD_SHIFT=2;
        else if (SIMD_COEF_32==2)  SIMD_SHIFT=1;

        memset(Buf, 0, sizeof(Buf));
        for (idx = 0; idx < SIMD_COEF_32*2; ++idx) {
            for (i = 0; i < 4; ++i) {
                Buf[GETPOS_BE32(i,idx)]= (unsigned char)((i*16)+idx);
            }
        }
        dump4(1, Buf);
        memset(Buf, 0, sizeof(Buf));
        for (idx = 0; idx < SIMD_COEF_32*2; ++idx) {
            for (i = 0; i < 4; ++i) {
                Buf[GETPOS_LE32(i,idx)]= (unsigned char)((i*16)+idx);
            }
        }
        dump4(0, Buf);
        printf("\n");
    }
}

/* Dumps 4 limbs.  Does BE to LE conversion if in BE format. */
void dump4(int isBE, unsigned char *buf) {
    int i;
    printf ("%s_%d: ", isBE?"BE":"LE", SIMD_COEF_32);
    for (i = 0; i < SIMD_COEF_32*4*4; i += 4) {
        if (buf[i] == 0 && buf[i+1] == 0 && buf[i+2] == 0 && buf[i+3] == 0)
            printf ("0 ");
        else {
            if (isBE)
                printf ("%02x%02x%02x%02x ", buf[i+3], buf[i+2],buf[i+1],buf[i]);
            else
                printf ("%02x%02x%02x%02x ", buf[i], buf[i+1],buf[i+2],buf[i+3]);
        }
    }
    printf ("\n");
}

Here are some results. @magnum, can you please validate that the results for 8 and 16 coef are correct. I 'think' they are, but I am not 100% sure.

BE_2: 00102030 01112131 0 0 02122232 03132333 0 0
LE_2: 00102030 01112131 0 0 02122232 03132333 0 0

BE_4: 00102030 01112131 02122232 03132333 0 0 0 0 04142434 05152535 06162636 07172737 0 0 0 0
LE_4: 00102030 01112131 02122232 03132333 0 0 0 0 04142434 05152535 06162636 07172737 0 0 0 0

BE_8: 00102030 01112131 02122232 03132333 04142434 05152535 06162636 07172737 0 0 0 0 0 0 0 0 08182838 09192939 0a1a2a3a 0b1b2b3b 0c1c2c3c 0d1d2d3d 0e1e2e3e 0f1f2f3f 0 0 0 0 0 0 0 0
LE_8: 00102030 01112131 02122232 03132333 04142434 05152535 06162636 07172737 0 0 0 0 0 0 0 0 08182838 09192939 0a1a2a3a 0b1b2b3b 0c1c2c3c 0d1d2d3d 0e1e2e3e 0f1f2f3f 0 0 0 0 0 0 0 0

BE_16: 00102030 01112131 02122232 03132333 04142434 05152535 06162636 07172737 08182838 09192939 0a1a2a3a 0b1b2b3b 0c1c2c3c 0d1d2d3d 0e1e2e3e 0f1f2f3f 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 10203040 11213141 12223242 13233343 14243444 15253545 16263646 17273747 18283848 19293949 1a2a3a4a 1b2b3b4b 1c2c3c4c 1d2d3d4d 1e2e3e4e 1f2f3f4f 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
LE_16: 00102030 01112131 02122232 03132333 04142434 05152535 06162636 07172737 08182838 09192939 0a1a2a3a 0b1b2b3b 0c1c2c3c 0d1d2d3d 0e1e2e3e 0f1f2f3f 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 10203040 11213141 12223242 13233343 14243444 15253545 16263646 17273747 18283848 19293949 1a2a3a4a 1b2b3b4b 1c2c3c4c 1d2d3d4d 1e2e3e4e 1f2f3f4f 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
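
One way to cross-check output like this is to compare the macro arithmetic against the layout spelled out term by term. The sketch below assumes the intended layout is: keys grouped in blocks of coef, each block holding buf_siz interleaved 32-bit words per key, with the byte order inside a word flipped for BE. The function names are hypothetical. Since it restates the same layout, it shows the formula is self-consistent rather than validating it independently.

```c
#include <assert.h>

/* The GETPOS_LE32 / GETPOS_BE32 arithmetic from the post, as a function.
 * shift must be log2(coef); buf_siz is the per-key buffer size in words. */
unsigned getpos32(unsigned i, unsigned idx, unsigned coef,
                  unsigned shift, unsigned buf_siz, int be)
{
    unsigned byte = be ? ((i & 3) ^ 3) : (i & 3);

    /* (i & ~3u) is the same mask as (i & (0xffffffff - 3)) */
    return (idx & (coef - 1)) * 4 + (i & ~3u) * coef + byte
         + (idx >> shift) * buf_siz * coef * 4;
}

/* The same offset, built from block / word / lane / byte components. */
unsigned refpos32(unsigned i, unsigned idx, unsigned coef,
                  unsigned buf_siz, int be)
{
    unsigned block = idx / coef;  /* which group of coef keys */
    unsigned lane  = idx % coef;  /* position inside the group */
    unsigned word  = i / 4;       /* 32-bit word within the key */
    unsigned byte  = i % 4;       /* byte within that word */

    if (be)
        byte = 3 - byte;
    return block * (buf_siz * coef * 4) + word * (coef * 4) + lane * 4 + byte;
}

/* Counts (i, idx) pairs where the two disagree, over two blocks of keys. */
unsigned count_mismatches(unsigned coef, unsigned shift, unsigned buf_siz)
{
    unsigned i, idx, bad = 0;

    assert((1u << shift) == coef);
    for (idx = 0; idx < coef * 2; idx++) {
        for (i = 0; i < buf_siz * 4; i++) {
            bad += getpos32(i, idx, coef, shift, buf_siz, 0) !=
                   refpos32(i, idx, coef, buf_siz, 0);
            bad += getpos32(i, idx, coef, shift, buf_siz, 1) !=
                   refpos32(i, idx, coef, buf_siz, 1);
        }
    }
    return bad;
}
```

Calling count_mismatches with (2, 1), (4, 2), (8, 3) and (16, 4) as coef and shift should report no mismatches for any buffer size.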

jfoug commented 9 years ago

Here were the 'originals'

#define GETPOS(i, idx)     ((idx & (SIMD_COEF_32 - 1)) * 4 + \
                           ((i) & (0xffffffff - 3)) * SIMD_COEF_32 + \
                           (((i) & 3) ^ 3) + (idx >> (SIMD_COEF_32 >> 1)) * \
                           SHA_BUF_SIZ * SIMD_COEF_32 * 4)
#define GETPOS(i, idx)     ((idx & (SIMD_COEF_32 - 1)) * 4 + \
                           ((i) & (0xffffffff - 3)) * SIMD_COEF_32 + \
                           ((i) & 3)      + (idx >> (SIMD_COEF_32 >> 1)) * \
                           MD5_BUF_SIZ * SIMD_COEF_32 * 4)

Here are my modifications

#define GETPOS_BE32(i,idx) ((idx & (SIMD_COEF_32-1)) * 4 + \
                           ((i) & (0xffffffff - 3)) * SIMD_COEF_32 + \
                           (((i) & 3) ^ 3) + (idx >> SIMD_SHIFT) * \
                           SHA_BUF_SIZ * SIMD_COEF_32 * 4)
#define GETPOS_LE32(i,idx) ((idx & (SIMD_COEF_32-1)) * 4 + \
                           ((i) & (0xffffffff - 3)) * SIMD_COEF_32 + \
                           ((i) & 3)      + (idx >> SIMD_SHIFT) * \
                           MD5_BUF_SIZ * SIMD_COEF_32 * 4)

The only thing that was changed was (SIMD_COEF_32 >> 1) becoming SIMD_SHIFT, and SIMD_SHIFT will probably have to be SIMD32_SHIFT, since we may have the same thing for SIMD64.
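
To see why the change matters: the shift has to be log2 of the lane count, and (coef >> 1) only happens to equal that for 2 and 4. For 8 it gives 4 instead of 3, and for 16 it gives 8 instead of 4. A short sketch (function names are made up for this illustration):

```c
#include <assert.h>

/* Shift used by the original macros. */
unsigned old_shift(unsigned coef)
{
    return coef >> 1;
}

/* Shift the block index needs: log2 of a power-of-two lane count. */
unsigned log2_shift(unsigned coef)
{
    unsigned s = 0;

    assert(coef != 0 && (coef & (coef - 1)) == 0);
    while ((1u << s) < coef)
        s++;
    return s;
}

/* Block of coef keys that key number idx is assigned to for a given shift. */
unsigned block_of(unsigned idx, unsigned shift)
{
    return idx >> shift;
}
```

With 8 lanes, key 8 should start block 1. block_of(8, log2_shift(8)) gives 1, but block_of(8, old_shift(8)) gives 0, so key 8 would land on the same bytes as key 0.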

magnumripper commented 9 years ago

I think they are OK, but I have been confused before :-)

The only thing that was changed was (SIMD_COEF_32 >> 1) becoming SIMD_SHIFT

This should be correct.

jfoug commented 9 years ago

I am pretty sure dyna will be a real bitch when it comes to COEF > 4

I think for GSOC, we should probably just have them first create the SIMD code, and then simply create as small an update as possible for the raw format. So they should update:

rawMD4_fmt_plug.c rawMD5_fmt_plug.c rawSHA1_fmt_plug.c rawSHA256_fmt_plug.c rawSHA512_fmt_plug.c

at first. Get them solid, with as 'minimal' a change as possible, so that COEF==4|8|16 all work. Once we get to that point, we can start to move out. I think the pbkdf2_*.h would probably be the 2nd thing(s) to do, as they are somewhat simple to change, and impact a bunch of formats. Then start doing one-offs on the other 70 or so SIMD formats, and also start on dyna.

Yes, getting some of these things like we are in this thread done up front is good. I really think we could get the raw formats 'ready', just waiting for the code. But on a lot of it, we simply will have to put out fires.

openwall / john

OMP scaling #877