openwall / john

John the Ripper jumbo - advanced offline password cracker, which supports hundreds of hash and cipher types, and runs on many operating systems, CPUs, GPUs, and even some FPGAs
https://www.openwall.com/john/

Doesn't find OpenMP on macOS (though libomp is installed) #5511

Open forthrin opened 4 months ago

forthrin commented 4 months ago

OpenCL formats are much faster than their CPU counterparts, but they seem to carry a startup penalty of about one minute. (Accordingly, when running an actual crack, nothing happens for about the first minute.) Is this:

  1. Hardware limitation
  2. Necessity / By design
  3. Bug / Known issue
  4. User error
$ time ./john --format=dmg --test
Benchmarking: dmg, Apple DMG [PBKDF2-SHA1 128/128 ASIMD 4x 3DES/AES]... DONE
Speed for cost 1 (iteration count) of 1000, cost 2 (version) of 2 and 1
Raw:    4312 c/s real, 4334 c/s virtual
0m2.410s # FAST

$ time ./john --format=dmg-opencl --test
Device 1: Apple M1
Benchmarking: dmg-opencl, Apple DMG [PBKDF2-SHA1 3DES/AES OpenCL]... LWS=64 GWS=524288 (8192 blocks) DONE
Speed for cost 1 (iteration count) of 1000, cost 2 (version) of 2 and 1
Raw:    160824 c/s real, 34952K c/s virtual
1m4.962s # SLOW

# System Version: macOS 14.5 (23F79)
# Model Identifier: MacBookAir10,1

$ ./john --list=build-info
Version: 1.9.0-jumbo-1+bleeding-19d731b1ca 2024-07-07 17:21:12 +0200
Build: darwin23.5.0 64-bit arm ASIMD AC OPENCL
SIMD: ASIMD, interleaving: MD4:2 MD5:2 SHA1:1 SHA256:1 SHA512:1
$JOHN is ./
Format interface version: 14
Max. number of reported tunable costs: 4
Rec file version: REC4
Charset file version: CHR3
CHARSET_MIN: 1 (0x01)
CHARSET_MAX: 255 (0xff)
CHARSET_LENGTH: 24
SALT_HASH_SIZE: 1048576
SINGLE_IDX_MAX: 2147483648
SINGLE_BUF_MAX: 4294967295
Effective limit: Number of salts vs. SingleMaxBufferSize
Max. Markov mode level: 400
Max. Markov mode password length: 30
clang version: 15.0.0 (clang-1500.3.9.4) (gcc 4.2.1 compatibility)
OpenCL headers version: 1.2
Crypto library: OpenSSL
OpenSSL library version: 030300010
OpenSSL 3.3.1 4 Jun 2024
GMP library version: 6.3.0
File locking: fcntl()
fseek(): fseek
ftell(): ftell
fopen(): fopen
memmem(): System's
times(2) sysconf(_SC_CLK_TCK) is 100
Using times(2) for timers, resolution 10 ms
HR timer: mach_absolute_time(), latency 42 ns
Total physical host memory: 8 GiB
Available physical host memory: 3118 MiB
Terminal locale string: UTF-8
Parsed terminal locale: UTF-8
solardiz commented 4 months ago

There are several (non-)issues here.

  1. Yes, the slow start of OpenCL formats is by design. There are several reasons for it: building the kernel from source the first time you run it (after which a cached build is normally reused), auto-tuning the LWS and GWS settings (this is usually where most of the time is spent), and running the self-test at a rather high GWS (which is so high because of the higher concurrency needed to use an OpenCL device optimally). You can see what it's doing during this time by increasing verbosity, e.g. with -v=5. You can also force the use of previously tuned LWS/GWS values to skip the auto-tuning, e.g. with LWS=64 GWS=524288 john ... (setting those as environment variables via the Unix shell; note that this is an easy copy-paste of what was output before) or with the --lws and --gws command-line options. You can also skip the self-test with --skip-self-test, but we generally recommend against skipping it.
  2. Your build appears to lack OpenMP support - why is that? As a result, your CPU run is single-core. With OpenMP it would probably show much higher speed, but it would also take somewhat longer to start (because auto-tuning and the self-test would then run at higher concurrency). Did you explicitly disable OpenMP at build time, or was it somehow not auto-detected in this build with clang for macOS? Could be something for us to fix.
  3. Also, the default benchmark being for a mix of two different cost settings is something we may want to fix, standardizing on one cost.
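
The LWS/GWS reuse described in point 1 can be scripted. The sketch below pulls the tuned pair out of a previously captured benchmark line and prints the command that would reuse it. The helper name extract_tuning is hypothetical, and the sample line is copied from the benchmark output earlier in this thread. It prints the command instead of running it, so it assumes nothing about where john is installed.

```shell
#!/bin/sh
# Hypothetical helper: read john output on stdin and print the
# auto-tuned "LWS=... GWS=..." pair found in a benchmark line.
extract_tuning() {
    sed -n 's/.*\(LWS=[0-9][0-9]*\) \(GWS=[0-9][0-9]*\).*/\1 \2/p'
}

# Sample line taken from the dmg-opencl benchmark shown in this thread.
line='Benchmarking: dmg-opencl, Apple DMG [PBKDF2-SHA1 3DES/AES OpenCL]... LWS=64 GWS=524288 (8192 blocks) DONE'

tuning=$(printf '%s\n' "$line" | extract_tuning)

# Dry run: show the command with the values set as environment variables.
echo "env $tuning ./john --format=dmg-opencl --test"
```

The tuned values are specific to one format on one device, so a pair captured for dmg-opencl on one machine should not be assumed to suit another format or GPU.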
forthrin commented 4 months ago

Thanks for reaching out!

  1. Consecutive plain john --format=dmg-opencl --test runs are consistently slow (eg. nothing seems automatically cached/reused between runs, if that was supposed to happen). Explicitly setting LWS and GWS finished the test in 12s.

  2. Build was the plain recommended ./configure && make -s clean && make -sj4. Do tell how to provide the necessary information to pinpoint why OpenMP is not enabled by default as expected.

checking for gcc option to support OpenMP... unsupported
AES-NI support ..................................... no
Cross compiling .................................... no
OpenMP support ..................................... no
librexgen (regex mode, see doc/README.librexgen) ... no
ZTEX USB-FPGA module 1.15y support ................. no

$ which gcc
/usr/bin/gcc
$ gcc --version
Apple clang version 15.0.0 (clang-1500.3.9.4)
Target: arm64-apple-darwin23.5.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin

PS! Adding a stopwatch timestamp at the beginning of each debug line would make it easier to see/share performance!
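
Until something like that exists, a shell filter can approximate it. This is a minimal sketch, assuming a POSIX shell and a date command that supports +%s (true on macOS and Linux). The function name stamp is made up for this example.

```shell
#!/bin/sh
# Hypothetical filter: prefix every line read from stdin with the number of
# whole seconds elapsed since the filter started.
stamp() {
    start=$(date +%s)
    while IFS= read -r line; do
        printf '%4ds %s\n' "$(( $(date +%s) - start ))" "$line"
    done
}

# Intended use (not executed here, since it needs a john build):
#   ./john --format=dmg-opencl --test -v=5 2>&1 | stamp

# Small demonstration on fixed input.
printf 'first line\nsecond line\n' | stamp
```

The resolution is one second, and lines may arrive in bursts if the program buffers its output when it is not writing to a terminal, so the timestamps are only a rough guide.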

solardiz commented 4 months ago

Consecutive plain john --format=dmg-opencl --test runs are consistently slow (eg. nothing seems automatically cached/reused between runs, if that was supposed to happen).

That's the wrong conclusion - like I wrote, only 1 out of 3 things is supposed to be cached, so the runs would remain slow overall, yet some caching would occur. That's by design.

Since the auto-tuning is usually the slowest part, no wonder setting explicit LWS/GWS removes most of the delay for you.

Do tell how to provide the necessary information to pinpoint why OpenMP is not enabled by default as expected.

OK, apparently Apple still provides only a crippled clang in Xcode. This has some answers: https://stackoverflow.com/questions/71061894/how-to-install-openmp-on-mac-m1

We also have binary builds in https://github.com/openwall/john-packages/releases - I think these are with OpenMP.

I don't use macOS myself.

forthrin commented 4 months ago
$ ./omptest # works even with /usr/bin/gcc
Hello World... from thread = 0
<snip>
Hello World... from thread = 7
$ brew info libomp
==> libomp: stable 18.1.8 (bottled) [keg-only]
$ echo $LDFLAGS
-L/opt/homebrew/opt/libomp/lib
$ echo $CPPFLAGS
-I/opt/homebrew/opt/libomp/include
$ brew info gcc
==> gcc: stable 14.1.0 (bottled), HEAD
$ echo $PATH
/opt/homebrew/Cellar/gcc/14.1.0_1/bin:<snip> # no "gcc", only "gcc-14" ++
$ ./configure
OpenMP support ..................................... still no

@solardiz: If you're on another OS and out of ideas, hope someone on macOS can chip in, for optimal performance.

solardiz commented 4 months ago

$ echo $PATH
/opt/homebrew/Cellar/gcc/14.1.0_1/bin:<snip> # no "gcc", only "gcc-14" ++

As some people on that StackOverflow wrote, you may need to use e.g. gcc-14 explicitly. Try e.g. ./configure CC=gcc-14. Alternatively, apparently it should also be possible to use libomp coming from Brew along with clang.
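
Before re-running ./configure, it can help to check which compiler and flag combination actually builds an OpenMP program. The sketch below does that with a throwaway test file. The helper name check_openmp is hypothetical, gcc-14 is the Homebrew compiler name mentioned above, and the libomp paths are the ones from the LDFLAGS and CPPFLAGS shown earlier in this thread. The -Xclang -fopenmp spelling for Apple clang is an assumption here, not something configure is known to try by itself.

```shell
#!/bin/sh
# Hypothetical helper: print "yes" if the given compiler, called with the
# given extra flags, can compile and link a tiny OpenMP program, else "no".
check_openmp() {
    cc=$1
    shift
    tmpdir=$(mktemp -d) || return 1
    cat > "$tmpdir/omptest.c" <<'EOF'
#include <omp.h>
#include <stdio.h>
int main(void)
{
#pragma omp parallel
    printf("Hello World... from thread = %d\n", omp_get_thread_num());
    return 0;
}
EOF
    if "$cc" "$@" "$tmpdir/omptest.c" -o "$tmpdir/omptest" >/dev/null 2>&1; then
        result=yes
    else
        result=no
    fi
    rm -rf "$tmpdir"
    echo "$result"
}

# GCC from Homebrew, called by its versioned name.
printf 'gcc-14 -fopenmp: %s\n' "$(check_openmp gcc-14 -fopenmp)"

# Apple clang with Homebrew's libomp (Apple Silicon paths from this thread).
printf 'cc -Xclang -fopenmp: %s\n' "$(check_openmp cc -Xclang -fopenmp \
    -I/opt/homebrew/opt/libomp/include -L/opt/homebrew/opt/libomp/lib -lomp)"
```

A "yes" here only shows that the toolchain can build OpenMP code. Whether ./configure detects it still depends on the CC, CPPFLAGS and LDFLAGS values passed to it.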

solardiz commented 4 months ago

@claudioandre-br Maybe you want to comment on how we're doing our CI builds for macOS - with OpenMP, right?

forthrin commented 4 months ago

CC=gcc-14 got somewhere! How to get a true "yes", or is "yes, but" as good as it gets?

OpenMP support ..................................... yes (not for fast formats)

solardiz commented 4 months ago

You can force it to a plain yes with --enable-openmp-for-fast-formats, but we recommend that you don't do that.

claudioandre-br commented 4 months ago

@claudioandre-br Maybe you want to comment on how we're doing our CI builds for macOS - with OpenMP, right?

Yes, OpenMP on "default" clang. The rolling-2404 is:

Version: 1.9.0-jumbo-1+bleeding-f9fedd238b 2024-04-01 13:35:37 +0200
Build: darwin23.5.0 64-bit arm ASIMD AC OMP OPENCL
SIMD: ASIMD, interleaving: MD4:2 MD5:2 SHA1:1 SHA256:1 SHA512:1
OMP fallback binary: john-arm64
$JOHN is ../run/
[...]
Max. Markov mode password length: 30
clang version: 15.0.0 (clang-1500.3.9.4) (gcc 4.2.1 compatibility)
OpenCL headers version: 1.2
Crypto library: OpenSSL
OpenSSL library version: 030300010
OpenSSL 3.3.1 4 Jun 2024
GMP library version: 6.3.0
[...]

Build on:

ProductVersion:     14.5
BuildVersion:       23F79
Homebrew 4.3.9
Xcode_15.4

doc/INSTALL recommends gcc (passing the CC value to ./configure). That's what magnum used to use.

solardiz commented 4 months ago

@claudioandre-br In other words, you're saying we get OpenMP with the exact same clang version that @forthrin also has, yet @forthrin does not get OpenMP with it? Are we perhaps installing something from Homebrew to get this to work?

BTW, we have doc/INSTALL* files for several systems, but not for macOS. Maybe we should add one.

forthrin commented 4 months ago

@claudioandre-br: Ah! I see this is actually thoroughly described at the bottom of doc/INSTALL. I should have read the whole thing. Maybe configure should say:

OpenMP ... no (See doc/INSTALL "Optimal build on macOS") (Actually still says "OS X")

Happy to say john managed to crack a forgotten password which saved a ton of work!

Closing up:

  1. Why is --enable-openmp-for-fast-formats discouraged? (Which are the "fast formats"?)
  2. Can John make word lists such as:

(0-2 exclamation marks) (random word from list) (0 or 1 spaces) (number 0-9999) (0 or 1 spaces) (random word from list) (0-2 exclamation marks)

... or is it generally recommended to make such with a scripting language in advance? (Had a look at the mask and rules docs, but couldn't really figure out eg. combining two words.)
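
For the scripting route mentioned in the question, a minimal sketch in shell and awk might look like this. The function name gen_candidates and the file names in the usage comment are hypothetical, and numbers are emitted without zero padding, which is an assumption about what "number 0-9999" means.

```shell
#!/bin/sh
# Hypothetical generator for candidates of the form:
#   (0-2 "!") word (0-1 space) number (0-1 space) word (0-2 "!")
# $1: word list file, one word per line
# $2: largest number to insert (defaults to 9999)
gen_candidates() {
    awk -v max="${2:-9999}" '
        { words[NR] = $0 }                 # load the whole word list
        END {
            split(",!,!!", bangs, ",")     # "", "!", "!!"
            split(", ", gaps, ",")         # "", " "
            for (a = 1; a <= 3; a++)
             for (i = 1; i <= NR; i++)
              for (g = 1; g <= 2; g++)
               for (n = 0; n <= max; n++)
                for (h = 1; h <= 2; h++)
                 for (j = 1; j <= NR; j++)
                  for (b = 1; b <= 3; b++)
                   print bangs[a] words[i] gaps[g] n gaps[h] words[j] bangs[b]
        }' "$1"
}

# Intended use (not executed here; words.txt and hash.txt are placeholders):
#   gen_candidates words.txt | ./john --stdin --format=dmg-opencl hash.txt
```

The output grows as 360,000 times the square of the word count with the default number range, so piping it straight into john is usually more practical than writing it to disk.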

claudioandre-br commented 4 months ago

In other words, you're saying we get OpenMP with the exact same clang version that @forthrin also has, yet @forthrin does not get OpenMP with it?

Yes. See the -Xclang -fopenmp at https://github.com/openwall/john-packages/blob/main/scripts/ci_controller.sh#L97.

There is some related magic and Google-sama might have an explanation on the matter. By the way: the order of the arguments is important.

solardiz commented 4 months ago

See the -Xclang -fopenmp at https://github.com/openwall/john-packages/blob/main/scripts/ci_controller.sh#L97.

Should we possibly get this into the main project here, perhaps as a change to configure.ac? Would you test and contribute?

When linking to specific line numbers, let's also link to specific revisions - GitHub substitutes it when you press y. Anyway, I'll quote the relevant excerpt below this time.

    if [[ "$TARGET_ARCH" == *"macOS"* ]]; then
        SYSTEM_WIDE=''
        REGULAR="$SYSTEM_WIDE $ASAN $BUILD_OPTS"
        NO_OPENMP="--disable-openmp $SYSTEM_WIDE $ASAN $BUILD_OPTS"

        #Libraries and Includes
        if [[ "$TARGET_ARCH" == *"macOS X86"* ]]; then
            MAC_LOCAL_PATH="usr/local/opt"
        else
            MAC_LOCAL_PATH="opt/homebrew/opt"
        fi
        LDFLAGS_ssl="-L/$MAC_LOCAL_PATH/openssl/lib"
        LDFLAGS_gmp="-L/$MAC_LOCAL_PATH/gmp/lib"
        LDFLAGS_omp="-L/$MAC_LOCAL_PATH/libomp/lib -lomp"

        CFLAGS_ssl="-I/$MAC_LOCAL_PATH/openssl/include"
        CFLAGS_gmp="-I/$MAC_LOCAL_PATH/gmp/include"
        CFLAGS_omp="-I/$MAC_LOCAL_PATH/libomp/include"

        if [[ $TARGET_ARCH == *"macOS ARM"* ]]; then
            brew link openssl --force
        fi

        if [[ $TARGET_ARCH == *"macOS X86"* ]]; then
            do_configure "$NO_OPENMP" --enable-simd=avx && do_build ../run/john-avx
            do_configure "$REGULAR" --enable-simd=avx LDFLAGS="$LDFLAGS_omp" CPPFLAGS="-Xclang -fopenmp $CFLAGS_omp -DOMP_FALLBACK_BINARY=\"\\\"john-avx\\\"\" " && do_build ../run/john-avx-omp
            do_configure "$NO_OPENMP" --enable-simd=avx2 && do_build ../run/john-avx2
            do_configure "$REGULAR" --enable-simd=avx2 LDFLAGS="$LDFLAGS_omp" CPPFLAGS="-Xclang -fopenmp $CFLAGS_omp -DOMP_FALLBACK_BINARY=\"\\\"john-avx2\\\"\" -DCPU_FALLBACK_BINARY=\"\\\"john-avx-omp\\\"\" " && do_build ../run/john-avx2-omp
            BINARY="john-avx2-omp"
        else
            do_configure "$NO_OPENMP" LDFLAGS="$LDFLAGS_ssl $LDFLAGS_gmp" CPPFLAGS="$CFLAGS_ssl $CFLAGS_gmp" && do_build "../run/john-$arch"
            do_configure "$REGULAR" LDFLAGS="$LDFLAGS_ssl $LDFLAGS_gmp $LDFLAGS_omp" CPPFLAGS="-Xclang -fopenmp $CFLAGS_ssl $CFLAGS_gmp $CFLAGS_omp -DOMP_FALLBACK_BINARY=\"\\\"john-$arch\\\"\" " && do_build ../run/john-omp
            BINARY="john-omp"
        fi
        do_release "No" "Yes" $BINARY # --system-wide, --support-opencl, --binary-name
    fi
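
Reduced to a single manual build, the non-x86 branch of the excerpt above amounts to one ./configure call with the OpenSSL, GMP and libomp paths plus -Xclang -fopenmp. The sketch below only prints that command line. The function name print_configure_cmd is hypothetical, the default prefix is the Apple Silicon value from the excerpt, and the -DOMP_FALLBACK_BINARY define is left out because it only matters when a non-OpenMP binary is built alongside.

```shell
#!/bin/sh
# Hypothetical helper: print a ./configure command line that enables OpenMP
# with Apple clang and Homebrew's libomp.
# $1: Homebrew "opt" prefix (default /opt/homebrew/opt, as used for ARM in
#     the CI script; that script uses /usr/local/opt on x86).
print_configure_cmd() {
    prefix=${1:-/opt/homebrew/opt}
    ldflags="-L$prefix/openssl/lib -L$prefix/gmp/lib -L$prefix/libomp/lib -lomp"
    # -Xclang stays directly in front of -fopenmp, as in the CI script.
    cppflags="-Xclang -fopenmp -I$prefix/openssl/include -I$prefix/gmp/include -I$prefix/libomp/include"
    printf './configure LDFLAGS="%s" CPPFLAGS="%s"\n' "$ldflags" "$cppflags"
}

# Dry run with the default prefix; copy the printed line into a shell in src/.
print_configure_cmd
```

Printing the command keeps the sketch safe to run anywhere. After running the printed line for real, the "OpenMP support" row of the configure summary shows whether the flags took effect.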
solardiz commented 4 months ago

the default benchmark being for a mix of two different cost settings is something we could want to fix, standardizing on one cost.

Maybe what we have now isn't that bad - it's the same iteration count. I don't recall what effect the version has on speeds for the same iteration count. Looking at the test vectors we currently have, there aren't any two that look the same anyway - they all have data of different lengths.

Speed for cost 1 (iteration count) of 1000, cost 2 (version) of 2 and 1
solardiz commented 4 months ago
1. Why is --enable-openmp-for-fast-formats discouraged? (Which are the "fast formats"?)

Those are implementations of hashes that are too fast for efficient scaling with our current use of OpenMP. We recommend usage of --fork on them instead of OpenMP.
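
As a rough illustration of the --fork route, the sketch below works out a process count and prints a matching command. The helper name cpu_count, the file name hashes.txt and the choice of raw-md5 as an example of a fast format are assumptions made for this example. Using one process per logical CPU is also only a starting point.

```shell
#!/bin/sh
# Hypothetical helper: print the number of logical CPUs.
# macOS exposes it through sysctl; elsewhere fall back to getconf, then to 1.
cpu_count() {
    sysctl -n hw.ncpu 2>/dev/null ||
        getconf _NPROCESSORS_ONLN 2>/dev/null ||
        echo 1
}

# Dry run: show a --fork invocation instead of starting a crack.
echo "./john --fork=$(cpu_count) --format=raw-md5 hashes.txt"
```

On a machine that mixes performance and efficiency cores, a lower --fork value may work just as well, so it is worth comparing a couple of settings.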

2. Can John make word lists such as:

(0-2 exclamation marks) (random word from list) (0 or 1 spaces) (number 0-9999) (0 or 1 spaces) (random word from list) (0-2 exclamation marks)

... or is it generally recommended to make such with a scripting language in advance?

This varies. I'd normally use John itself, but for complicated cases such as yours it may take non-trivial configuration and/or multiple invocations of it. Here's a start, but then you need to write rule sets and likely use two invocations, piping the intermediate list between them:

./john -w=w --external=combinator --rules-stack=': /[ ] Ap" [0-9][0-9][0-9][0-9]"' --stdout

I'd have greater motivation to spend time on a more complete answer if we were discussing this on the john-users list, but here it's just not worth my time.

forthrin commented 4 months ago

@solardiz: Good you're focusing your valuable time. Happy that OP segued into worthwhile compilation issues. Ping me for testing. Thanks for your help and keep up the great work!

PS! If the mailing list was moved to "Discussions" at some point that would be convenient!