Open frank-dittrich opened 3 years ago
This helped:
diff --git a/src/librexgen/c/ApiContext.h b/src/librexgen/c/ApiContext.h
index 9885c7e..4c943e7 100644
--- a/src/librexgen/c/ApiContext.h
+++ b/src/librexgen/c/ApiContext.h
@@ -23,8 +23,8 @@
typedef int c_regex_ptr;
typedef int c_iterator_ptr;
-const c_regex_ptr c_regex_none = 0;
-const c_iterator_ptr c_iterator_none = 0;
+static const c_regex_ptr c_regex_none = 0;
+static const c_iterator_ptr c_iterator_none = 0;
$ ./john --stdout --regex='test[12][ab]x?'
test1a
test2a
test1b
test2b
test1ax
test2ax
test1bx
test2bx
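A possible explanation for why the static qualifier in the diff above helps (the thread does not show the actual build error, so this is an assumption): in C, unlike C++, a file-scope const object without static has external linkage. A header that defines such an object therefore produces one definition in every .c file that includes it, and the linker typically rejects the duplicates. A minimal sketch of the patched pattern, where the header name and the helper c_regex_is_none() are hypothetical and not part of librexgen:

```c
/* api_context_sketch.h: hypothetical header modeled on the diff above. */
#ifndef API_CONTEXT_SKETCH_H
#define API_CONTEXT_SKETCH_H

typedef int c_regex_ptr;
typedef int c_iterator_ptr;

/*
 * With static, each translation unit that includes this header gets its
 * own private copy (internal linkage), so no symbol is exported twice.
 */
static const c_regex_ptr c_regex_none = 0;
static const c_iterator_ptr c_iterator_none = 0;

/* Hypothetical helper (not in librexgen): test a handle against the sentinel. */
static inline int c_regex_is_none(c_regex_ptr p)
{
    return p == c_regex_none;
}

#endif /* API_CONTEXT_SKETCH_H */
```

A C++ compiler already treats namespace-scope const objects as internal, which may be why the unpatched header only causes trouble when included from C code such as john.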
@frank-dittrich What do you suggest we do about the regex mode for our next release? I think the options include: drop it, add documentation on known issues and how to overcome them, include a patch against rexgen and documentation on applying it, bundle a version of rexgen in jumbo (in which case we'd end up maintaining it ourselves, but could as well enable it by default), or leave everything as-is.
I suggest we still support it for now, but warn in doc/README.librexgen and in configure that rexgen still contains some serious bugs and that support might have to be dropped in future releases, because rexgen appears to be no longer maintained.
I'll try to come up with appropriate wording, and detailed build instructions.
In some areas there are improvements:
$ ./john --stdout --regex='abc]'
syntax error, unexpected T_END_CLASS, expecting $end
Error, invalid regex expression. John exiting now
$ ./john --stdout --regex='a[bc'
syntax error, unexpected $end, expecting T_END_CLASS
Error, invalid regex expression. John exiting now
There are still some serious bugs.
$ ./john --stdout --regex='a[]'
Segmentation fault (core dumped)
For
$ rexgen 'a[]'
I found a solution which at least avoids the segfault, but rejecting []
as invalid would probably be better than my current 'fix':
$ rexgen 'a[]'
a
My clumsy "fix" breaks the output for rexgen '(ab[cde])\1'; see
https://github.com/janstarke/rexgen/issues/65#issuecomment-803439267
(Rather than trying to maintain rexgen, I might try to reject some illegal regular expressions in john's regex.c.)
Another known segfault:
$ rexgen 'a\2'
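The idea of rejecting problematic expressions before they reach librexgen could look roughly like the sketch below. It is not code from john's regex.c: the function name regex_precheck() is hypothetical, and it only covers the two crash cases reported in this thread (a '[' immediately followed by ']', and a back reference to a group that has not been opened). "\0" is let through because john uses it for the base word.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical pre-check, not part of john or rexgen. Returns false for
 * patterns that contain "[]" or a back reference "\N" (N >= 1) where
 * fewer than N groups have been opened so far; returns true otherwise.
 */
static bool regex_precheck(const char *re)
{
    int groups = 0;          /* number of '(' seen so far */
    bool in_class = false;   /* true while inside [...] */

    assert(re != NULL);

    for (size_t i = 0; re[i] != '\0'; i++) {
        char c = re[i];

        if (c == '\\') {
            char next = re[i + 1];

            if (next == '\0')
                return false;            /* dangling backslash */
            if (!in_class && isdigit((unsigned char)next) &&
                next != '0' && (next - '0') > groups)
                return false;            /* e.g. a\1 or a\2 */
            i++;                         /* skip the escaped character */
        } else if (in_class) {
            if (c == ']')
                in_class = false;
        } else if (c == '[') {
            if (re[i + 1] == ']')
                return false;            /* empty class, e.g. a[] */
            in_class = true;
        } else if (c == '(') {
            groups++;
        }
    }
    return true;
}
```

Unbalanced brackets are deliberately left to rexgen's parser, which already reports them as syntax errors in the examples above.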
Now that I managed to build john with rexgen support, I noticed that combining --regex with other modes doesn't work.
$ ./john --wordlist --rules=':lQ' --stdout --regex=alpha:leet
Using default input encoding: UTF-8
syntax error, unexpected $end
Error, invalid regex expression. John exiting now base_word=anthony Regex=
$ ./john --wordlist --rules=':lQ' --stdout --regex='abc\0xyz'
Using default input encoding: UTF-8
Segmentation fault (core dumped)
I need to dig into it to find out whether this has been caused by changes in librexgen or john (or both).
Due to the problems I discovered with trying to make john --regex work with latest librexgen, I closed https://github.com/openwall/john/pull/4642
Based on https://github.com/janstarke/rexgen/commit/5b2f4b159ec948c1f9429eca4389ca2adc9c0b07 (which is the latest commit which allows building librexgen without problems) I prepared a new patch.
Now all the segfaults I got with rexgen 2.1.3 are gone, and several things that didn't work with 2.1.3 work without problems.
No more problems with non-ascii characters (produced a segfault with 2.1.3):
$ ./john --stdout --regex='[äöü]'
ä
ö
ü
No segfaults:
$ ./john --stdout --regex='a\1'
This regular expression has an invalid back reference
Error, invalid regex expression. John exiting now
$ ./john --stdout --regex='[]'
Error, invalid regex expression. John exiting now
$ ./john --stdout --regex='[~- ]'|wc -l
Press 'q' or Ctrl-C to abort, almost any other key for status
95p 0:00:00:00 0.00% 1900p/s
95
In some cases, there isn't any error message providing details about illegal regular expressions, but IMHO that's preferable to a librexgen version causing segfaults:
$ ./john --stdout --regex='a]'
Error, invalid regex expression. John exiting now
$ ./john --stdout --regex='[a'
Error, invalid regex expression. John exiting now
Usage of '\0' in combination with --wordlist, --prince, --incremental also works:
$ ./john --stdout --wordlist=password.lst --regex='X\0[äöü]' | head -n 6
Using default input encoding: UTF-8
Press 'q' or Ctrl-C to abort, almost any other key for status
X123456ä
X123456ö
X123456ü
X12345ä
X12345ö
X12345ü
$ ./john --stdout --prince --regex='X\0[äöü]' | head -n 4
Press 'q' or Ctrl-C to abort, almost any other key for status
Xmememeä
Xmememeö
Xmememeü
X22memeä
$ ./john --stdout --incremental --regex='X\0[äöü]' | head -n 4
Press 'q' or Ctrl-C to abort, almost any other key for status
X123456ä
X123456ö
X123456ü
X12345ä
However, I couldn't make --regex=alpha:<subsection> work.
$ ./john --stdout --regex=alpha:leet:pass
alpha:leet:pass
1p 0:00:00:00 0.00% 14.28p/s alpha:leet:pass
All other settings I tried failed similarly.
But it looks like --regex=alpha has been broken for a very long time.
I even tried the 1.4.1 rexgen release and old john versions from that time, but didn't succeed in making this work. I know it must have worked in the past, but I am out of ideas of what else to try.
(I wanted to find an old version which worked and then use git bisect on both the john repository and the rexgen repository to find out what broke that functionality.)
I think making --regex=alpha work again would be useful, but I'm afraid I can't do it.
If we don't get --regex=alpha working prior to the release, I suggest converting this john.conf line into a comment
.include <regex_alphabets.conf>
and adjusting doc/README.libconfig.
But I would prefer if someone else could look into this and figure out how to fix it.
> If we don't get --regex=alpha working prior to the release, I suggest converting this john.conf line into a comment
> .include <regex_alphabets.conf>
> and adjusting doc/README.libconfig.
Please assume that this will stay broken and send a PR with the above (or amend your existing one, but have this as a separate commit that we can easily revert in case someone does figure out how to fix this functionality later). Thank you!
Using git log -L:SetupAlpha:src/regex.c I came across
commit 45bbb1689471352bb77ef25f003b4bc6086d7e6e
Author: Frank Dittrich <frank.dittrich@mailbox.org>
Date: Thu Jun 18 15:04:05 2015 +0200
regex.c: escape backslash inside words.
Without that change, we got:
$ cat test.txt
a\0b
$ ./john --stdout --wordlist=test.txt --regex=alpha:case
buf=[aA]\0[bB]
aa\0bb
Aa\0bb
aa\0bB
Aa\0bB
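The commit title above mentions escaping backslashes inside words. As an illustration of that idea only (this is not the actual regex.c change, and escape_backslashes() is a hypothetical name), a word such as a\0b could be made safe before it is turned into a regular expression like this:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not the actual regex.c code: copies word into out,
 * writing two backslashes for every backslash in the input so that a
 * regex engine treats it as a literal character. Returns the number of
 * bytes written (excluding the terminating NUL), or (size_t)-1 if out
 * cannot hold the worst case.
 */
static size_t escape_backslashes(const char *word, char *out, size_t out_size)
{
    size_t n = 0;

    assert(word != NULL && out != NULL);

    /* Worst case: every input byte doubles, plus the terminating NUL. */
    if (out_size < 2 * strlen(word) + 1)
        return (size_t)-1;

    for (; *word != '\0'; word++) {
        if (*word == '\\')
            out[n++] = '\\';     /* prefix the backslash with another one */
        out[n++] = *word;
    }
    out[n] = '\0';
    return n;
}
```

With this, the four-character word a\0b becomes the five-character string a\\0b, so the \0 is no longer read as a reference to the base word.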
So, in April 2015, this must have worked. At that time, the most recent rexgen commit was
commit eafb74d0b6ed35549f98f7bf7b044402ea4e40ff
Author: Jan Starke <jan.starke@outfoebd.org>
Date: Thu Mar 5 21:00:33 2015 +0100
add previous() function for ClassRegexIterator
The rexgen version at that time was 1.2.3.
With my Fedora 32 toolchain I wasn't even able to build that rexgen version.
On super, with the oldest toolchain, I managed to check out commit 45bbb1689471352bb77ef25f003b4bc6086d7e6e and build john after moving all plugin formats out of the way and disabling CUDA.
But trying to build the old librexgen version from March 2015 on super (with the oldest toolchain) failed:
$ ./build.sh
entering /home/frank/git/rexgen/src/build
running >>> cmake -DCMAKE_BUILD_TYPE=RELEASE /home/frank/git/rexgen/src <<<
creating rexgen 1.2.3
-- COMPILING OPTIMIZED VERSION: RELEASE
CMake Error at librexgen/CMakeLists.txt:11 (find_package):
By not providing "FindICU.cmake" in CMAKE_MODULE_PATH this project has
asked CMake to find a package configuration file provided by "ICU", but
CMake did not find one.
Could not find a package configuration file provided by "ICU" with any of
the following names:
ICUConfig.cmake
icu-config.cmake
Add the installation prefix of "ICU" to CMAKE_PREFIX_PATH or set "ICU_DIR"
to a directory containing one of the above files. If "ICU" provides a
separate development package or SDK, be sure it has been installed.
-- Configuring incomplete, errors occurred!
See also "/home/frank/git/rexgen/src/build/CMakeFiles/CMakeOutput.log".
$ grep -n find_package librexgen/CMakeLists.txt
9:find_package(BISON 2.3)
10:find_package(FLEX 2.5)
11:find_package(ICU REQUIRED)
I really don't know what else to try.
Latest rexgen doesn't even compile anymore (on Fedora 32), and it looks like https://github.com/janstarke/rexgen hasn't been maintained for almost 2 years.
To be able to compile latest rexgen, I applied this change:
Apparently, applying https://github.com/janstarke/rexgen/pull/64 on top of https://github.com/janstarke/rexgen/commit/585e86da6a07a63ec7a871320b307457661d557c also helps.
The c_regex_cb_mb interface has been changed, and I had to make 2 warnings disappear.
The c_regex_cb_mb interface has been changed somewhere in the middle of either the 2.0.8 or 2.0.9 release. So we also might need to adjust these 2 lines, changing 0x020006 to 0x020103 (or maybe 0x020009):
and doc/README.librexgen also still mentions
Unfortunately, due to these /usr/local/include/librexgen/c/ApiContext.h lines
the build fails like this:
I have to give up for now. I have already spent more time on this than I intended, and I don't know when I'll have time to continue.
Too bad that rexgen isn't maintained any longer. For certain tasks I loved to use john's --regex mode.