openwall / john

John the Ripper jumbo - advanced offline password cracker, which supports hundreds of hash and cipher types, and runs on many operating systems, CPUs, GPUs, and even some FPGAs
https://www.openwall.com/john/

Trying to build with rexgen enabled #4640

Open frank-dittrich opened 3 years ago

frank-dittrich commented 3 years ago

The latest rexgen doesn't even compile anymore (on Fedora 32), and https://github.com/janstarke/rexgen appears not to have been maintained for almost 2 years.

To compile the latest rexgen, I applied this change:

diff --git a/src/librexgen/iterator/iterator.h b/src/librexgen/iterator/iterator.h
index 5b99c77..79925f2 100644
--- a/src/librexgen/iterator/iterator.h
+++ b/src/librexgen/iterator/iterator.h
@@ -29,6 +29,7 @@
 #include <memory>

 #ifdef __cplusplus
+#include <stdexcept>
 namespace rexgen {
   class IteratorState;

Apparently, applying https://github.com/janstarke/rexgen/pull/64 on top of https://github.com/janstarke/rexgen/commit/585e86da6a07a63ec7a871320b307457661d557c also helps.

$ rexgen -v
rexgen-2.1.3

The c_regex_cb_mb interface has changed (it now takes an additional error-callback argument), and I had to get rid of 2 warnings:

diff --git a/src/regex.c b/src/regex.c
index cd7642a5d..bd9486167 100644
--- a/src/regex.c
+++ b/src/regex.c
@@ -44,8 +44,8 @@ char *stpcpy(char *dst, const char *src) {
 #endif

 char *rexgen_alphabets[256];
-static c_iterator_ptr iter = NULL;
-static c_regex_ptr regex_ptr = NULL;
+static c_iterator_ptr iter = 0;
+static c_regex_ptr regex_ptr = 0;
 static char *save_str;
 static const char *cur_regex, *save_regex;
 static char *restore_str, *restore_regex;
@@ -184,6 +184,10 @@ void SetupAlpha(const char *regex_alpha)
    }
 }

+void parser_error(const char* msg) {
+   fprintf(stderr, "%s\n", msg);
+}
+
 int do_regex_hybrid_crack(struct db_main *db, const char *regex,
                           const char *base_word, int regex_case,
                           const char *regex_alpha)
@@ -218,7 +222,7 @@ int do_regex_hybrid_crack(struct db_main *db, const char *regex,
        rec_init_hybrid(save_state_hybrid);
        crk_set_hybrid_fix_state_func_ptr(rex_hybrid_fix_state);

-       regex_ptr = c_regex_cb_mb(regex, callback);
+       regex_ptr = c_regex_cb_mb(regex, callback, parser_error);
        if (!regex_ptr) {
            c_simplestring_delete(buffer);
            fprintf(stderr,
@@ -333,7 +337,7 @@ void do_regex_crack(struct db_main *db, const char *regex)
    crk_init(db, fix_state, NULL);
    rec_init_hybrid(save_state_hybrid);

-   regex_ptr = c_regex_cb_mb(regex, callback);
+   regex_ptr = c_regex_cb_mb(regex, callback, parser_error);
    if (!regex_ptr) {
        fprintf(stderr,
                "Error, invalid regex expression.  John exiting now\n");

The c_regex_cb_mb interface changed somewhere around the 2.0.8 or 2.0.9 release, so we might also need to adjust these 2 lines, changing 0x020006 to 0x020103 (or maybe 0x020009):

src/configure:13066:int main() { return ! (rexgen_version_int() >= 0x020006); }
src/configure.ac:662:      [AC_MSG_CHECKING([librexgen minimum version])] && [AC_TRY_RUN([int main() { return ! (rexgen_version_int() >= 0x020006); }],
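The hex constants appear to pack one byte each for major, minor and patch version, so 0x020006 would correspond to 2.0.6 and 0x020103 to 2.1.3. A minimal sketch of that assumed encoding (the helper names are hypothetical, not part of rexgen or john), together with the logic of the configure test program, which exits with 0 only when the installed version is at least the required one:

```c
#include <stdio.h>

/* Assumed encoding, inferred from the constants in this thread
 * (0x020103 <-> 2.1.3, 0x020009 <-> 2.0.9): one byte per component. */
static unsigned long encode_version(unsigned major, unsigned minor,
                                    unsigned patch)
{
	return ((unsigned long)(major & 0xff) << 16) |
	       ((unsigned long)(minor & 0xff) << 8) |
	       (unsigned long)(patch & 0xff);
}

/* Format an encoded version as "major.minor.patch". */
static void format_version(unsigned long v, char *buf, size_t len)
{
	snprintf(buf, len, "%lu.%lu.%lu",
	         (v >> 16) & 0xff, (v >> 8) & 0xff, v & 0xff);
}

/* Same logic as the configure check:
 *   return ! (rexgen_version_int() >= MINIMUM);
 * i.e. exit code 0 (success) if and only if found >= required. */
static int configure_exit_code(unsigned long found, unsigned long required)
{
	return !(found >= required);
}
```

With this encoding, a 2.0.8 library passes the current 0x020006 check but would fail a 0x020103 (2.1.3) check, which is the distinction the minimum-version test needs to make.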

and doc/README.librexgen also still mentions

As of this writing the last known good version is 2.0.8 so you may want
to check that out.

Unfortunately, due to these /usr/local/include/librexgen/c/ApiContext.h lines

const c_regex_ptr c_regex_none = 0;
const c_iterator_ptr c_iterator_none = 0;

the build fails like this:

/usr/bin/ld: inc.o:/usr/local/include/librexgen/c/ApiContext.h:27: multiple definition of `c_iterator_none'; external.o:/usr/local/include/librexgen/c/ApiContext.h:27: first defined here
/usr/bin/ld: inc.o:/usr/local/include/librexgen/c/ApiContext.h:26: multiple definition of `c_regex_none'; external.o:/usr/local/include/librexgen/c/ApiContext.h:26: first defined here
/usr/bin/ld: john.o:/usr/local/include/librexgen/c/ApiContext.h:27: multiple definition of `c_iterator_none'; external.o:/usr/local/include/librexgen/c/ApiContext.h:27: first defined here
/usr/bin/ld: john.o:/usr/local/include/librexgen/c/ApiContext.h:26: multiple definition of `c_regex_none'; external.o:/usr/local/include/librexgen/c/ApiContext.h:26: first defined here
/usr/bin/ld: options.o:/usr/local/include/librexgen/c/ApiContext.h:27: multiple definition of `c_iterator_none'; external.o:/usr/local/include/librexgen/c/ApiContext.h:27: first defined here
/usr/bin/ld: options.o:/usr/local/include/librexgen/c/ApiContext.h:26: multiple definition of `c_regex_none'; external.o:/usr/local/include/librexgen/c/ApiContext.h:26: first defined here
/usr/bin/ld: recovery.o:/usr/local/include/librexgen/c/ApiContext.h:27: multiple definition of `c_iterator_none'; external.o:/usr/local/include/librexgen/c/ApiContext.h:27: first defined here
/usr/bin/ld: recovery.o:/usr/local/include/librexgen/c/ApiContext.h:26: multiple definition of `c_regex_none'; external.o:/usr/local/include/librexgen/c/ApiContext.h:26: first defined here
/usr/bin/ld: wordlist.o:/usr/local/include/librexgen/c/ApiContext.h:27: multiple definition of `c_iterator_none'; external.o:/usr/local/include/librexgen/c/ApiContext.h:27: first defined here
/usr/bin/ld: wordlist.o:/usr/local/include/librexgen/c/ApiContext.h:26: multiple definition of `c_regex_none'; external.o:/usr/local/include/librexgen/c/ApiContext.h:26: first defined here
/usr/bin/ld: mkv.o:/usr/local/include/librexgen/c/ApiContext.h:27: multiple definition of `c_iterator_none'; external.o:/usr/local/include/librexgen/c/ApiContext.h:27: first defined here
/usr/bin/ld: mkv.o:/usr/local/include/librexgen/c/ApiContext.h:26: multiple definition of `c_regex_none'; external.o:/usr/local/include/librexgen/c/ApiContext.h:26: first defined here
/usr/bin/ld: listconf.o:/usr/local/include/librexgen/c/ApiContext.h:27: multiple definition of `c_iterator_none'; external.o:/usr/local/include/librexgen/c/ApiContext.h:27: first defined here
/usr/bin/ld: listconf.o:/usr/local/include/librexgen/c/ApiContext.h:26: multiple definition of `c_regex_none'; external.o:/usr/local/include/librexgen/c/ApiContext.h:26: first defined here
/usr/bin/ld: regex.o:/usr/local/include/librexgen/c/ApiContext.h:27: multiple definition of `c_iterator_none'; external.o:/usr/local/include/librexgen/c/ApiContext.h:27: first defined here
/usr/bin/ld: regex.o:/usr/local/include/librexgen/c/ApiContext.h:26: multiple definition of `c_regex_none'; external.o:/usr/local/include/librexgen/c/ApiContext.h:26: first defined here
/usr/bin/ld: pp.o:/usr/local/include/librexgen/c/ApiContext.h:27: multiple definition of `c_iterator_none'; external.o:/usr/local/include/librexgen/c/ApiContext.h:27: first defined here
/usr/bin/ld: pp.o:/usr/local/include/librexgen/c/ApiContext.h:26: multiple definition of `c_regex_none'; external.o:/usr/local/include/librexgen/c/ApiContext.h:26: first defined here
collect2: error: ld returned 1 exit status
make[1]: *** [Makefile:1450: ../run/john] Error 1
make[1]: Leaving directory '/home/fd/git/john/src'
make: *** [Makefile:190: default] Error 2

I have to give up for now; I have already spent more time on this than I intended, and I don't know when I'll have time to continue.

It's a pity that rexgen is no longer maintained. For certain tasks, I loved using john's --regex mode.
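The multiple-definition errors follow from C linkage rules: at file scope, a const object defined without static has external linkage in C, so every john object file that includes ApiContext.h defines its own global c_regex_none and c_iterator_none. In C++, such an object has internal linkage, which is presumably why the header causes no trouble in librexgen's own C++ sources. A minimal sketch of a C-safe version of such a header fragment (the typedefs are taken from ApiContext.h as quoted in this thread; the include guard and helper function are hypothetical):

```c
#ifndef APICONTEXT_SKETCH_H
#define APICONTEXT_SKETCH_H

typedef int c_regex_ptr;
typedef int c_iterator_ptr;

/* 'static' gives each translation unit its own private copy, so several
 * .c files can include this header without "multiple definition" errors
 * at link time. Without 'static', C gives these objects external linkage. */
static const c_regex_ptr c_regex_none = 0;
static const c_iterator_ptr c_iterator_none = 0;

/* Alternative with exactly one definition in the whole program:
 *   header:       extern const c_regex_ptr c_regex_none;
 *   one .c file:  const c_regex_ptr c_regex_none = 0;
 */

/* Handles are plain ints, so they are compared against 0 rather than
 * NULL (NULL may expand to ((void *)0), and assigning it to an int
 * triggers an integer-from-pointer warning). */
static inline int regex_handle_valid(c_regex_ptr p)
{
	return p != c_regex_none;
}

#endif /* APICONTEXT_SKETCH_H */
```

The static variant is the smallest change to the existing header; the extern variant avoids per-file copies but requires a matching definition in exactly one source file.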

frank-dittrich commented 3 years ago

This helped:

diff --git a/src/librexgen/c/ApiContext.h b/src/librexgen/c/ApiContext.h
index 9885c7e..4c943e7 100644
--- a/src/librexgen/c/ApiContext.h
+++ b/src/librexgen/c/ApiContext.h
@@ -23,8 +23,8 @@
 typedef int c_regex_ptr;
 typedef int c_iterator_ptr;

-const c_regex_ptr c_regex_none = 0;
-const c_iterator_ptr c_iterator_none = 0;
+static const c_regex_ptr c_regex_none = 0;
+static const c_iterator_ptr c_iterator_none = 0;

$ ./john --stdout --regex='test[12][ab]x?'
test1a
test2a
test1b
test2b
test1ax
test2ax
test1bx
test2bx

solardiz commented 3 years ago

@frank-dittrich What do you suggest we do about the regex mode for our next release? I think the options include: drop it, add documentation on known issues and how to overcome them, include a patch against rexgen and documentation on applying it, bundle a version of rexgen in jumbo (in which case we'd end up maintaining it ourselves, but could as well enable it by default), or leave everything as-is.

frank-dittrich commented 3 years ago

I suggest we keep supporting it for now, but warn in doc/README.librexgen and in configure that rexgen still contains some serious bugs and that support might have to be dropped in future releases, because rexgen no longer appears to be maintained.

I'll try to come up with appropriate wording, and detailed build instructions.

There are improvements in some areas:

$ ./john --stdout --regex='abc]'
syntax error, unexpected T_END_CLASS, expecting $end
Error, invalid regex expression.  John exiting now
$ ./john --stdout --regex='a[bc'
syntax error, unexpected $end, expecting T_END_CLASS
Error, invalid regex expression.  John exiting now

However, there are still some serious bugs:

$ ./john --stdout --regex='a[]'
Segmentation fault (core dumped)

frank-dittrich commented 3 years ago

For

$ rexgen 'a[]'

I found a solution which at least avoids the segfault, but rejecting [] as invalid would probably be better than my current 'fix':

$ rexgen 'a[]'
a

My clumsy "fix" breaks the output of rexgen '(ab[cde])\1'; see https://github.com/janstarke/rexgen/issues/65#issuecomment-803439267 (Rather than trying to maintain rexgen, I might try to reject some illegal regular expressions in john's regex.c.)
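Rejecting such expressions on john's side could take the form of a small scan before the expression reaches rexgen. A minimal sketch; the function name and the exact set of checks are hypothetical, it assumes every '(' opens a numbered capturing group, treats "[]" as an empty (invalid) class, and leaves "\0" alone because john's hybrid modes use it for the base word:

```c
#include <stddef.h>

#define MAX_GROUP_DEPTH 64

/* Hypothetical pre-check (not part of john or librexgen).
 * Returns 0 if nothing suspicious was found, otherwise -1 with a short
 * reason stored in *why. */
static int regex_precheck(const char *re, const char **why)
{
	int stack[MAX_GROUP_DEPTH];      /* numbers of currently open groups */
	int sp = 0, opened = 0, in_class = 0;
	unsigned char closed[10] = {0};  /* closed[n]: group n is complete */

	*why = NULL;
	for (const char *p = re; *p; p++) {
		if (*p == '\\') {
			if (!p[1]) {
				*why = "trailing backslash";
				return -1;
			}
			p++;
			/* \1..\9 must refer to a group that is already closed */
			if (!in_class && *p >= '1' && *p <= '9' &&
			    !closed[*p - '0']) {
				*why = "invalid back reference";
				return -1;
			}
			continue;
		}
		if (in_class) {
			if (*p == ']')
				in_class = 0;
			continue;
		}
		switch (*p) {
		case '[':
			if (p[1] == ']') {
				*why = "empty character class";
				return -1;
			}
			in_class = 1;
			break;
		case ']':
			*why = "unmatched ']'";
			return -1;
		case '(':
			if (sp == MAX_GROUP_DEPTH) {
				*why = "groups nested too deeply";
				return -1;
			}
			stack[sp++] = ++opened;
			break;
		case ')':
			if (sp == 0) {
				*why = "unmatched ')'";
				return -1;
			}
			sp--;
			if (stack[sp] <= 9)
				closed[stack[sp]] = 1;
			break;
		}
	}
	if (in_class) {
		*why = "unterminated character class";
		return -1;
	}
	if (sp) {
		*why = "unterminated group";
		return -1;
	}
	return 0;
}
```

With these rules, 'a[]', 'a\1' and 'a\2' are rejected, while '(ab[cde])\1' is accepted because group 1 is closed before the back reference.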

Another known segfault:

$ rexgen 'a\2'

frank-dittrich commented 3 years ago

Now that I managed to build john with rexgen support, I noticed that combining --regex with other modes doesn't work.

$ ./john --wordlist --rules=':lQ' --stdout --regex=alpha:leet
Using default input encoding: UTF-8
syntax error, unexpected $end
Error, invalid regex expression.  John exiting now  base_word=anthony  Regex= 
$ ./john --wordlist --rules=':lQ' --stdout --regex='abc\0xyz' 
Using default input encoding: UTF-8
Segmentation fault (core dumped)

I need to dig into it to find out whether this has been caused by changes in librexgen or john (or both).

frank-dittrich commented 3 years ago

Due to the problems I discovered while trying to make john --regex work with the latest librexgen, I closed https://github.com/openwall/john/pull/4642

Based on https://github.com/janstarke/rexgen/commit/5b2f4b159ec948c1f9429eca4389ca2adc9c0b07 (which is the latest commit which allows building librexgen without problems) I prepared a new patch.

Now, all the segfaults I got with rexgen 2.1.3 are gone, and several things that didn't work with 2.1.3 now work without problems.

No more problems with non-ASCII characters (these produced a segfault with 2.1.3):

$ ./john --stdout --regex='[äöü]'
ä
ö
ü

No segfaults:

$ ./john --stdout --regex='a\1'
This regular expression has an invalid back reference
Error, invalid regex expression.  John exiting now
$ ./john --stdout --regex='[]'
Error, invalid regex expression.  John exiting now
$ ./john --stdout --regex='[~- ]'|wc -l
Press 'q' or Ctrl-C to abort, almost any other key for status
95p 0:00:00:00 0.00% 1900p/s  
95
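The count of 95 matches the printable ASCII range from space (0x20) to '~' (0x7e): 0x7e - 0x20 + 1 = 95. The range is written in reverse order, and the output suggests that this rexgen version accepts it and covers the same characters as '[ -~]'. A small sketch of that arithmetic, with a hypothetical helper that expands a range in either endpoint order:

```c
#include <stddef.h>

/* Hypothetical helper: write every byte between the two range endpoints
 * (inclusive, in ascending order) to 'out' and NUL-terminate it.
 * 'out' must hold at least |b - a| + 2 bytes. Returns the count. */
static size_t expand_range(unsigned char a, unsigned char b, char *out)
{
	unsigned lo = a < b ? a : b;
	unsigned hi = a < b ? b : a;
	size_t n = 0;

	for (unsigned c = lo; c <= hi; c++)
		out[n++] = (char)c;
	out[n] = '\0';
	return n;
}
```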

In some cases, there isn't any error message providing details about illegal regular expressions, but IMHO that's preferable to a librexgen version causing segfaults:

$ ./john --stdout --regex='a]'
Error, invalid regex expression.  John exiting now
$ ./john --stdout --regex='[a'
Error, invalid regex expression.  John exiting now

Using '\0' in combination with --wordlist, --prince or --incremental also works:

$ ./john --stdout --wordlist=password.lst --regex='X\0[äöü]' | head -n 6
Using default input encoding: UTF-8
Press 'q' or Ctrl-C to abort, almost any other key for status
X123456ä
X123456ö
X123456ü
X12345ä
X12345ö
X12345ü
$ ./john --stdout --prince --regex='X\0[äöü]' | head -n 4
Press 'q' or Ctrl-C to abort, almost any other key for status
Xmememeä
Xmememeö
Xmememeü
X22memeä
$ ./john --stdout --incremental --regex='X\0[äöü]' | head -n 4
Press 'q' or Ctrl-C to abort, almost any other key for status
X123456ä
X123456ö
X123456ü
X12345ä
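In these hybrid runs, '\0' stands for the current base word. One way to picture this is plain textual substitution before the expression is handed to rexgen; that is an assumption for illustration, not necessarily how john's regex.c implements it. The sketch below (hypothetical helper name, assumed set of metacharacters) also escapes special characters in the word, so that a word such as a\0b cannot inject its own '\0' or other regex syntax:

```c
#include <stddef.h>
#include <string.h>

/* Characters escaped in the base word (assumed set for this sketch). */
static const char META[] = "\\[](){}?*+|.^$";

/* Hypothetical helper: copy 'pattern' to 'out', replacing every "\0"
 * with 'word' and backslash-escaping metacharacters inside 'word'.
 * Other escapes in the pattern (e.g. "\1") are copied unchanged.
 * Returns 0 on success, -1 if 'out' is too small. */
static int hybrid_expand(const char *pattern, const char *word,
                         char *out, size_t outlen)
{
	size_t n = 0;

	if (outlen == 0)
		return -1;
	for (const char *p = pattern; *p; p++) {
		if (p[0] == '\\' && p[1] == '0') {
			for (const char *w = word; *w; w++) {
				if (strchr(META, *w)) {
					if (n + 1 >= outlen)
						return -1;
					out[n++] = '\\';
				}
				if (n + 1 >= outlen)
					return -1;
				out[n++] = *w;
			}
			p++;                 /* skip the '0' */
			continue;
		}
		if (p[0] == '\\' && p[1]) {
			/* copy any other escape pair verbatim */
			if (n + 2 >= outlen)
				return -1;
			out[n++] = *p++;
			out[n++] = *p;
			continue;
		}
		if (n + 1 >= outlen)
			return -1;
		out[n++] = *p;
	}
	out[n] = '\0';
	return 0;
}
```

For example, the pattern X\0[äöü] with the base word 123456 becomes X123456[äöü], which would then expand to the three candidates shown above.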

However, I couldn't make --regex=alpha:<subsection> work.

$ ./john --stdout --regex=alpha:leet:pass
alpha:leet:pass
1p 0:00:00:00 0.00% 14.28p/s alpha:leet:pass

All other settings I tried failed similarly. But it looks like --regex=alpha has been broken for a very long time. I even tried the 1.4.1 rexgen release and old john versions from that time, but didn't succeed in making this work. I know it must have worked in the past, but I am out of ideas about what else to try. (I wanted to find an old version that worked and then use git bisect on both the john repository and the rexgen repository to find out what broke that functionality.)

I think making --regex=alpha work again would be useful, but I'm afraid I can't do it.

If we don't get --regex=alpha working prior to the release, I suggest converting this john.conf line into a comment

.include <regex_alphabets.conf>

and adjusting doc/README.libconfig.

But I would prefer if someone else could look into this and figure out how to fix it.

solardiz commented 3 years ago

If we don't get --regex=alpha working prior to the release, I suggest converting this john.conf line into a comment

.include <regex_alphabets.conf>

and adjusting doc/README.libconfig.

Please assume that this will stay broken and send a PR with the above (or amend your existing one, but have this as a separate commit that we can easily revert in case someone does figure out how to fix this functionality later). Thank you!

frank-dittrich commented 3 years ago

Using git log -L:SetupAlpha:src/regex.c, I came across

commit 45bbb1689471352bb77ef25f003b4bc6086d7e6e
Author: Frank Dittrich <frank.dittrich@mailbox.org>
Date:   Thu Jun 18 15:04:05 2015 +0200

    regex.c: escape backslash inside words.

    Without that change, we got:
    $ cat test.txt
    a\0b
     $ ./john --stdout --wordlist=test.txt --regex=alpha:case
    buf=[aA]\0[bB]
    aa\0bb
    Aa\0bb
    aa\0bB
    Aa\0bB

So, in June 2015, this must have worked. At that time, the most recent rexgen commit was

commit eafb74d0b6ed35549f98f7bf7b044402ea4e40ff
Author: Jan Starke <jan.starke@outfoebd.org>
Date:   Thu Mar 5 21:00:33 2015 +0100

    add previous() function for ClassRegexIterator

The rexgen version at that time was 1.2.3.

With my Fedora 32 toolchain I wasn't even able to build that rexgen version.

On super, with the oldest toolchain, I managed to check out commit 45bbb1689471352bb77ef25f003b4bc6086d7e6e and build john after moving all plugin formats out of the way and disabling CUDA.

However, building the old librexgen version from March 2015 on super (with the oldest toolchain) failed:

$ ./build.sh 
entering /home/frank/git/rexgen/src/build
running >>> cmake  -DCMAKE_BUILD_TYPE=RELEASE /home/frank/git/rexgen/src <<<
creating rexgen 1.2.3
-- COMPILING OPTIMIZED VERSION: RELEASE
CMake Error at librexgen/CMakeLists.txt:11 (find_package):
  By not providing "FindICU.cmake" in CMAKE_MODULE_PATH this project has
  asked CMake to find a package configuration file provided by "ICU", but
  CMake did not find one.

  Could not find a package configuration file provided by "ICU" with any of
  the following names:

    ICUConfig.cmake
    icu-config.cmake

  Add the installation prefix of "ICU" to CMAKE_PREFIX_PATH or set "ICU_DIR"
  to a directory containing one of the above files.  If "ICU" provides a
  separate development package or SDK, be sure it has been installed.

-- Configuring incomplete, errors occurred!
See also "/home/frank/git/rexgen/src/build/CMakeFiles/CMakeOutput.log".
$ grep -n find_package librexgen/CMakeLists.txt 
9:find_package(BISON 2.3)
10:find_package(FLEX 2.5)
11:find_package(ICU REQUIRED)

I really don't know what else to try.