openwall / john

John the Ripper jumbo - advanced offline password cracker, which supports hundreds of hash and cipher types, and runs on many operating systems, CPUs, GPUs, and even some FPGAs
https://www.openwall.com/john/

[Feature Request] Update INSTALL.md for AMD ROCm for Linux builds #5522

Open mostwanted002 opened 1 month ago

mostwanted002 commented 1 month ago
OS: Ubuntu 24.04 LTS
Kernel: 6.8.0-40-generic
GPU: AMD Radeon RX 6900 XT, AMD Radeon Instinct MI 50
ROCk module version: 6.8.5
ROCm: 6.2.0

With the latest ROCm installation, AMD has restructured OpenCL headers and libraries.

Specifying two flags in the ./configure command fixes that.

It'll be great if we can add a small section in INSTALL.md for AMD GPU build:

./configure LDFLAGS=-L/opt/rocm/lib CPPFLAGS=-I/opt/rocm/include
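The flags above can be guarded so that they are only passed when the directories exist. This is a minimal sketch, assuming the default `/opt/rocm` prefix from this report; the `rocm_configure_args` helper is a hypothetical name, not part of John, and it does not handle paths containing spaces:

```shell
#!/bin/sh
# Hypothetical helper: print ./configure arguments for a ROCm tree,
# skipping any directory that is absent.
rocm_configure_args() {
    root=${1:-/opt/rocm}   # assumed default prefix, taken from the report
    args=""
    if [ -d "$root/lib" ]; then
        args="LDFLAGS=-L$root/lib"
    fi
    if [ -d "$root/include" ]; then
        # Add a separating space only when LDFLAGS was already set.
        args="${args:+$args }CPPFLAGS=-I$root/include"
    fi
    printf '%s\n' "$args"
}

# Example (not run here):
#   ./configure $(rocm_configure_args /opt/rocm)
```

If neither directory exists, the helper prints an empty line and `./configure` runs with its defaults.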

With default ./configure:


# john-the-ripper.opencl --list=opencl-devices
Error: No OpenCL-capable platforms were detected by the installed OpenCL driver.
Error: No OpenCL-capable devices were detected by the installed OpenCL driver.

With the updated configure:

# ./john --list=opencl-devices
Platform #0 name: AMD Accelerated Parallel Processing, version: OpenCL 2.1 AMD-APP (3625.0)
    Device #0 (1) name:     gfx1030
    Board name:             AMD Radeon RX 6900 XT
    Device vendor:          Advanced Micro Devices, Inc.
    Device type:            GPU (LE)
    Device version:         OpenCL 2.0
    OpenCL version support: OpenCL C 2.0
    Driver version:         3625.0 (HSA1.1,LC) - AMDGPU-Pro
    Native vector widths:   char 4, short 2, int 1, long 1
    Preferred vector width: char 4, short 2, int 1, long 1
    Global Memory:          16368 MiB
    Global Memory Cache:    16 KiB
    Local Memory:           64 KiB (Local)
    Constant Buffer size:   13912 MiB
    Max memory alloc. size: 13912 MiB
    Max clock (MHz):        2660
    Profiling timer res.:   1 ns
    Max Work Group Size:    256
    Parallel compute cores: 40
    Stream processors:      2560  (40 x 64)
    Speed index:            6809600
    SIMD width:             32
    Wavefront width:        32
    PCI device topology:    83:00.0

    Device #1 (2) name:     gfx906:sramecc+:xnack-
    Board name:             AMD Radeon (TM) Pro VII
    Device vendor:          Advanced Micro Devices, Inc.
    Device type:            GPU (LE)
    Device version:         OpenCL 2.0
    OpenCL version support: OpenCL C 2.0
    Driver version:         3625.0 (HSA1.1,LC) - AMDGPU-Pro
    Native vector widths:   char 4, short 2, int 1, long 1
    Preferred vector width: char 4, short 2, int 1, long 1
    Global Memory:          16368 MiB
    Global Memory Cache:    16 KiB
    Local Memory:           64 KiB (Local)
    Constant Buffer size:   13912 MiB
    Max memory alloc. size: 13912 MiB
    Max clock (MHz):        1700
    Profiling timer res.:   1 ns
    Max Work Group Size:    256
    Parallel compute cores: 60
    Stream processors:      3840  (60 x 64)
    Speed index:            6528000
    SIMD width:             16
    Wavefront width:        64
    PCI device topology:    c3:00.0

solardiz commented 1 month ago

Thank you for reporting this!

> It'll be great if we can add a small section in INSTALL.md for AMD GPU build:

We don't currently have an INSTALL.md. We have several doc/INSTALL* files and doc/README-OPENCL. Also, OpenCL and AMD-specific instructions are found in doc/INSTALL-UBUNTU.

I'm not up to date on this, but I guess not everyone with an AMD card will use their ROCm driver now - they still also have their "Pro" driver, correct? So an instructions update should cover both cases.

Can you suggest a specific edit?

mostwanted002 commented 1 month ago

Noted.

I'll work on it and provide a detailed list of the required edits. I think it will be easier to do this as a PR, so I'll do that.

solardiz commented 1 month ago

Looks like you never said what version(s) of John you observed the problem and tested the fix with?

> ./configure LDFLAGS=-L/opt/rocm/lib CPPFLAGS=-I/opt/rocm/include

CPPFLAGS=-I/opt/rocm/include shouldn't make a difference for code currently in this repo, because now we bundle the OpenCL headers. LDFLAGS=-L/opt/rocm/lib may still make a difference, but if we implement #5397 then we may want to start recommending --enable-opencl instead.

In other words, we're now (almost) able to build with OpenCL support without actually finding any OpenCL at build time. The only reasons we don't do this by default are to save some build time and avoid the risk of some build issues on systems where OpenCL support wouldn't be needed.

solardiz commented 1 month ago

Also, if we find any OpenCL library at build time, then the resulting build should work with whatever (maybe other) OpenCL library it sees at run time. So rather than suggest e.g. a ROCm-specific library path, we could suggest installing a generic OpenCL library from the distro's packages.

In fact, we already make this sort of suggestion in doc/INSTALL-UBUNTU:

==== If you have AMD GPU(s) (OpenCL support)

    sudo apt-get -y install ocl-icd-opencl-dev opencl-headers

    The above should be sufficient to build JtR with OpenCL support, but to
    actually use it you also need [...]

Did this not work for you? What operating system are you on?
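To see which OpenCL libraries are present before running `configure`, one option is to look for `libOpenCL.so*` in the candidate directories. This is a minimal sketch; the `find_opencl_lib` helper is a hypothetical name, and the directories in the example are assumptions (the Ubuntu multiarch library directory and the ROCm path mentioned earlier in this thread):

```shell
#!/bin/sh
# Hypothetical helper: print each given directory that contains a
# libOpenCL.so* file.
find_opencl_lib() {
    for dir in "$@"; do
        for lib in "$dir"/libOpenCL.so*; do
            # An unmatched glob stays literal, so check that the file exists.
            if [ -e "$lib" ]; then
                echo "$dir"
                break
            fi
        done
    done
}

# Example (not run here):
#   find_opencl_lib /usr/lib/x86_64-linux-gnu /opt/rocm/lib
```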

mostwanted002 commented 1 month ago

Apologies. I completely overlooked those UBUNTU-specific instructions.

I did try them now and it worked as expected.

However, for Vega GPUs and later, AMD recommends using ROCm installation for compute applications.

They ship the GPU driver through their newer installation method, the amdgpu-install meta script.

This includes both HIP and OpenCL. It seems a little redundant to install OpenCL headers separately when they are included with ROCm installation itself.

solardiz commented 1 month ago

> for Vega GPUs and later, AMD recommends using ROCm installation for compute applications.

The question is which driver actually performs better with our kernels. While we're able to get 100% working with NVIDIA drivers (with occasional exceptions), for AMD there are always some failing (usually miscompiles, resulting in failed self-tests). So if one AMD driver has e.g. 5 JtR formats failing and another 15, we'd "recommend" the one with 5. You may want to run ./john --test --format=opencl and see if it runs to completion and how many work vs. fail. Maybe ROCm is fine now, I just don't know.
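If the combined run stops early (for example, because one format crashes the process), testing each format in its own process is one way to still get a pass/fail list. This is a minimal sketch, assuming `john` exits non-zero when a self-test fails or crashes; `JOHN`, `LOGDIR`, and `test_formats` are hypothetical names, and the caller supplies the format names:

```shell
#!/bin/sh
# Hypothetical helper: self-test each named format in a separate john process.
JOHN=${JOHN:-./john}      # assumed path to the john binary
LOGDIR=${LOGDIR:-.}       # assumed directory for per-format logs

test_formats() {
    failed=0
    for fmt in "$@"; do
        # Assumption: a failed self-test or a crash gives a non-zero exit status.
        if "$JOHN" --test --format="$fmt" >"$LOGDIR/$fmt.log" 2>&1; then
            echo "$fmt: PASS"
        else
            echo "$fmt: FAIL"
            failed=$((failed + 1))
        fi
    done
    [ "$failed" -eq 0 ]   # succeed only if every format passed
}

# Example (not run here):
#   test_formats LM-opencl SL3-opencl
```

Each log file keeps the output for one format, which makes it easier to quote the lines printed just before a failure.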

> It seems a little redundant to install OpenCL headers separately when they are included with ROCm installation itself.

OpenCL headers installed on the system (in whatever way) aren't required/used by current code in this repo at all! This documentation is stale, and we may want to update it to drop that. OTOH, I suspect that some people building the older 1.9.0-jumbo-1 release also end up looking at the documentation in here, and for them such an update would prevent the builds from working.

mostwanted002 commented 1 month ago

Interesting findings on further testing:

  1. John compiled with either of the two fails on 6 OpenCL algorithms:
# Used a custom script to test each algo one by one instead of `--test --format=opencl` because some error was resulting in a SEGFAULT

krb5pa-md5-opencl: FAIL
krb5tgs-opencl: FAIL
LM-opencl: FAIL
pgpdisk-opencl: FAIL
pgpsda-opencl: FAIL
SL3-opencl: FAIL

john_opencl_perf_benchmark.txt john_rocm_perf_benchmark.txt

Looking into them might be another investigation altogether.

  2. Performance between the two is identical.

  3. I retested the ./configure command after removing ocl-icd-opencl-dev and opencl-headers packages, and it reported no OpenCL support.

# ./configure | grep -i opencl
checking additional paths for OpenCL... none
checking for OpenCL library... no
OpenCL support ..................................... no

solardiz commented 1 month ago

Thank you very much for running these tests.

> some error was resulting in a SEGFAULT

Can you show this (some copy-paste from the terminal from just before the segfault)?

> krb5pa-md5-opencl: FAIL
> krb5tgs-opencl: FAIL
> LM-opencl: FAIL
> pgpdisk-opencl: FAIL
> pgpsda-opencl: FAIL
> SL3-opencl: FAIL

Can you provide more detail on these? We normally log the function name/parameter where the failure was observed.

> Performance between the two is identical.

This and the same set of failing formats suggest these drivers are probably the same under the hood now.

> I retested the ./configure command after removing ocl-icd-opencl-dev and opencl-headers packages, and it reported no OpenCL support.

We shouldn't need opencl-headers, but we probably still do need ocl-icd-opencl-dev until #5397 is implemented.

claudioandre-br commented 1 month ago

In the case of AMD, the system [1] uses some environment variables [2] to detect where things are installed. It seems that your system uses new names or something [3].

[1] the OpenCL automatic "configure" detection system.
[2] $AMDAPPSDKROOT or $ATISTREAMSDKROOT.
[3] env | grep /opt/rocm might help to find a new name, if any.
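The check in [3] can be combined with a look at the two variables from [2]. This is a minimal sketch; the `show_opencl_env` helper is a hypothetical name, and `/opt/rocm` is only the assumed default prefix:

```shell
#!/bin/sh
# Hypothetical helper: report the two AMD SDK variables, then list any
# environment variable that mentions the given ROCm prefix.
show_opencl_env() {
    prefix=${1:-/opt/rocm}   # assumed default prefix
    echo "AMDAPPSDKROOT=${AMDAPPSDKROOT:-<unset>}"
    echo "ATISTREAMSDKROOT=${ATISTREAMSDKROOT:-<unset>}"
    # grep -F treats the prefix as a fixed string rather than a regex.
    env | grep -F -- "$prefix" || echo "no environment variable mentions $prefix"
}

# Example (not run here):
#   show_opencl_env /opt/rocm
```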

mostwanted002 commented 1 month ago

@solardiz here is the output from failed algorithms:

# krb5pa-md5-opencl
Device 1: gfx1030 [AMD Radeon RX 6900 XT]
Benchmarking: krb5pa-md5-opencl, Kerberos 5 AS-REQ Pre-Auth etype 23 [MD4 HMAC-MD5 RC4 OpenCL/mask accel]... FAILED (crypt_all(1) zero return)

# krb5tgs-opencl
Device 1: gfx1030 [AMD Radeon RX 6900 XT]
Benchmarking: krb5tgs-opencl, Kerberos 5 TGS-REP etype 23 [MD4 HMAC-MD5 RC4 OpenCL/mask accel]... Memory access fault by GPU node-1 (Agent handle: 0x6189c6131100) on address 0x74b8060bd000. Reason: Page not present or supervisor privilege.

# LM-opencl
Device 1: gfx1030 [AMD Radeon RX 6900 XT]
Benchmarking: LM-opencl [DES BS OpenCL/mask accel]... Options used: -I opencl -cl-mad-enable -cl-std=CL1.2 -D__GPU__ -DDEVICE_INFO=522 -D__SIZEOF_HOST_SIZE_T__=8 -DDEV_VER_MAJOR=3625 -DDEV_VER_MINOR=0 -D_OPENCL_COMPILER -D FULL_UNROLL=1 -D USE_LOCAL_MEM=1 -D WORK_GROUP_SIZE=8 -D OFFSET_TABLE_SIZE=13 -D HASH_TABLE_SIZE=9 -D MASK_ENABLE=0 -D ITER_COUNT=1 -D LOC_0=-1 -D LOC_1=-1 -D LOC_2=-1 -D LOC_3=-1 -D IS_STATIC_GPU_MASK=0 -D CONST_CACHE_SIZE=14588628168 -D SELECT_CMP_STEPS=2 -D BITMAP_MASK=0x3ffffU -D REQ_BITMAP_BITS=18 /home/mostwanted002/Downloads/john-rocm/run/opencl/lm_kernel_f.cl
Build log: warning: argument unused during compilation: '-I opencl' [-Wunused-command-line-argument]
1 warning generated.
ld.lld: error: undefined hidden symbol: lm_loop
>>> referenced by /tmp/comgr-3c4d52/input/linked.bc.o:(lm_bs_f)
>>> referenced by /tmp/comgr-3c4d52/input/linked.bc.o:(lm_bs_f)
Error: Creating the executable from LLVM IRs failed.
Error building kernel /home/mostwanted002/Downloads/john-rocm/run/opencl/lm_kernel_f.cl. DEVICE_INFO=522
0: OpenCL CL_BUILD_PROGRAM_FAILURE (-11) error in opencl_common.c:1293 - clBuildProgram

# pgpdisk-opencl
Device 1: gfx1030 [AMD Radeon RX 6900 XT]
Benchmarking: pgpdisk-opencl, PGP Disk / Virtual Disk [SHA1 AES/TwoFish/CAST OpenCL]... segmentation fault (core dumped)

# pgpsda-opencl
Device 1: gfx1030 [AMD Radeon RX 6900 XT]
Benchmarking: pgpsda-opencl, PGP Self Decrypting Archive [SHA1 CAST OpenCL]...  segmentation fault (core dumped)

# SL3-opencl
Device 1: gfx1030 [AMD Radeon RX 6900 XT]
Benchmarking: SL3-opencl, Nokia operator unlock [SHA1 OpenCL/mask accel]...  segmentation fault (core dumped)

I'll provide strace output soon as well.

@claudioandre-br No new environment variables were present.

claudioandre-br commented 1 month ago
From 60eb23226ef2796207eed15c86ff12d8bc8ee992 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Claudio=20Andr=C3=A9?= <dev@claudioandre.slmail.me>
Date: Wed, 14 Aug 2024 13:20:45 -0300
Subject: [PATCH] OpenCL: add ROCm installation detection
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The OpenCL detection mechanisms used by `configure` do not work properly
for the ROCm driver, this patch probably fixes the problem (untested).

Signed-off-by: Claudio André <dev@claudioandre.slmail.me>
---
 src/m4/jtr_utility_macros.m4 | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/src/m4/jtr_utility_macros.m4 b/src/m4/jtr_utility_macros.m4
index 6b5c58e2b..edea65215 100644
--- a/src/m4/jtr_utility_macros.m4
+++ b/src/m4/jtr_utility_macros.m4
@@ -213,6 +213,14 @@ AC_DEFUN([JTR_SET_OPENCL_INCLUDES],
    AC_MSG_CHECKING([additional paths for OpenCL])
    ADD_LDFLAGS=""
    ADD_CFLAGS=""
+   ROCM_INSTALLED="/opt/rocm"
+   if test -d "$ROCM_INSTALLED"; then
+      if test $CPU_BIT_STR = 64 -a -d "$ROCM_INSTALLED/lib/x86_64" ; then
+         ADD_LDFLAGS="$ADD_LDFLAGS -L$ROCM_INSTALLED/lib/x86_64"
+      elif test -d "$ROCM_INSTALLED/lib"; then
+         ADD_LDFLAGS="$ADD_LDFLAGS -L$ROCM_INSTALLED/lib"
+      fi
+   fi
    if test -n "$AMDAPPSDKROOT"; then
       if test -d "$AMDAPPSDKROOT/include"; then
          ADD_CFLAGS="$ADD_CFLAGS -I$AMDAPPSDKROOT/include"
-- 
2.43.0

solardiz commented 1 month ago

Thank you @claudioandre-br!

> +         ADD_LDFLAGS="$ADD_LDFLAGS -L$ROCM_INSTALLED/lib/x86_64"

I'm pretty sure @mostwanted002 is on 64-bit, yet didn't need to specify the /x86_64 part to get our configure to detect the library. So I guess this if/else is unneeded and possibly harmful.
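Dropping that branch would leave a single directory check. Below is a plain-shell sketch of the simplified logic (the patch itself is m4 that expands into `configure`). `ADD_LDFLAGS` and `ROCM_INSTALLED` follow the patch, while the `add_rocm_ldflags` wrapper is hypothetical and has not been tested against a real ROCm install:

```shell
#!/bin/sh
# Hypothetical wrapper around the patch's logic, with the lib/x86_64
# branch removed: append the ROCm library directory only if it exists.
add_rocm_ldflags() {
    ROCM_INSTALLED=${1:-/opt/rocm}   # default path as used in the patch
    if test -d "$ROCM_INSTALLED/lib"; then
        ADD_LDFLAGS="$ADD_LDFLAGS -L$ROCM_INSTALLED/lib"
    fi
}

# Example (not run here):
#   ADD_LDFLAGS=""
#   add_rocm_ldflags
#   echo "$ADD_LDFLAGS"
```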