openwall / john

John the Ripper jumbo - advanced offline password cracker, which supports hundreds of hash and cipher types, and runs on many operating systems, CPUs, GPUs, and even some FPGAs
https://www.openwall.com/john/

Lift the 2 GB file size limit from wordlist mode with 32-bit arch. #600

Closed magnumripper closed 10 years ago

magnumripper commented 10 years ago

For starters, that "ftell() returning negative" test should only disable memory buffering (and likely mmap() too, unless it's rewritten to use a window within the file), not make John bail out with an error!

magnumripper commented 10 years ago

See also #599

jfoug commented 10 years ago

There are many problems with files > 2gb when it comes to not having 64 bit file support. Resumes, saves, seeks, cores, infinite loops. Lots of fun. This WAS something I wanted to work around, but have not gotten to. I agree the bail is not good, but until we get past the 31 bit seek limitations, working with files larger than that, is NOT good.

As a side note, the logic totally sux. If you have a file that is 4gb+1 byte, jtr will work with it just fine. Ugly, very ugly. But that is the problem with the 'normal' f*() functions (fseek, fread, fwrite, etc). They are very useful, BUT since they were written to use longs, they are not instantly portable.
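The "4gb+1 byte" remark can be checked with a little arithmetic. The sketch below models a 32-bit signed long by reducing a true 64-bit offset modulo 2^32 and reinterpreting it as signed. The function name is hypothetical, and a real ftell() may simply fail with -1 rather than return the wrapped value.

```c
#include <stdint.h>

/* What a 32-bit signed long would hold for a true 64-bit file offset,
 * assuming plain truncation: reduce modulo 2^32, then reinterpret as signed. */
int32_t offset_seen_by_32bit_long(uint64_t true_offset)
{
    uint32_t low = (uint32_t)(true_offset & 0xFFFFFFFFu);

    /* Build the signed value by hand to avoid an implementation-defined cast. */
    if (low <= (uint32_t)INT32_MAX)
        return (int32_t)low;
    return (int32_t)(low - (uint32_t)INT32_MAX - 1u) + INT32_MIN;
}
```

Under this model an offset between 2 GB and 4 GB comes out negative, while 4 GB + 1 comes out as 1, which would explain why such a file is not rejected even though its real size is misreported.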

magnumripper commented 10 years ago

We could check for HAVE_FGETPOS and HAVE_FSETPOS and use them if available, but they use an opaque struct. What good is that?! Silly crap if you ask me.

magnumripper commented 10 years ago

Hmm, or we could check for HAVE_FSEEKO and HAVE_FTELLO and use them if available - this might actually be a solution. On all fully 64-bit systems including Win64, off_t has got to be signed 64-bit. The black sheep X32 would still not work reliably beyond 2 GB but we can easily ignore that.

jfoug commented 10 years ago

I think the solution is to have our own jtr_fseek() and jtr_ftell() (and possibly wrappers for fread/fwrite). Then we write them to be 64 bit. Now, on a 'native' 64 bit box that is simply done using #defines to the fseek. On win32, we 'might' be able to do this with defines to the 64 bit variety (such as _ftelli64), or we might have to make small wrappers.

Doing a small 'wrapper' function method like that is much less prone to errors than putting in all sorts of complex conditional compilation logic and scattering it wherever we read/write/seek.

Just my $0.02 worth of ideas.
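A minimal sketch of the wrapper idea, not the code that ended up in the tree. The sketch_ names are hypothetical, and _FILE_OFFSET_BITS is an assumption about the C library (glibc honours it) rather than something taken from this thread.

```c
#define _POSIX_C_SOURCE 200809L  /* assumption: exposes fseeko()/ftello() */
#define _FILE_OFFSET_BITS 64     /* assumption: asks glibc for a 64-bit off_t */
#include <stdio.h>
#include <stdint.h>
#include <sys/types.h>

#if defined(_MSC_VER)
/* Microsoft C runtime: 64-bit variants of fseek()/ftell(). */
#define sketch_fseek64(fp, off, whence) _fseeki64((fp), (off), (whence))
#define sketch_ftell64(fp)              ((int64_t)_ftelli64(fp))
#else
/* POSIX: fseeko()/ftello() take and return off_t instead of long. */
#define sketch_fseek64(fp, off, whence) fseeko((fp), (off_t)(off), (whence))
#define sketch_ftell64(fp)              ((int64_t)ftello(fp))
#endif
```

Callers then use only sketch_fseek64() and sketch_ftell64() with int64_t values, so the conditional compilation lives in one header instead of at every call site.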

jfoug commented 10 years ago

http://msdn.microsoft.com/query/dev10.query?appId=Dev10IDEF1&l=EN-US&k=k(%22MMAP-WINDOWS%2f_FTELLI64%22);k(_FTELLI64);k(DevLang-%22C%2B%2B%22);k(TargetOS-WINDOWS)&rd=true

_ftelli64(): I will see if this is available under cygwin (I bet it is), and I am 95% sure it would be under mingw. Functions like this should allow standard 64 bit ops for Win32. Everything falls down to CreateFile() and ReadFile() anyway (Win32 API calls), and that is fully 64 bit capable, and has been forever. It simply is the fallback to 1970s C which cripples fseek() and ftell() to use the native (signed) long of the machine.

magnumripper commented 10 years ago

This is a good plan. We just need to ensure compatibility with core - and it only has to be compatible from core -> Jumbo, not the other way round. Core will write rec_pos as %ld and if we read it as %lld (depending on OS) it will work just fine with no kludges.
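The one-way compatibility argument can be illustrated with a small sketch: a position written with %ld is just decimal text, so a 64-bit reader parses it unchanged. The function name is hypothetical, and SCNd64 from <inttypes.h> is used here in place of the "%lld (depending on OS)" mentioned above.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Parse a decimal position field into a 64-bit integer.
 * Returns 1 on success, 0 if no number could be read. */
int parse_position(const char *field, int64_t *pos)
{
    return sscanf(field, "%" SCNd64, pos) == 1;
}
```

The reverse direction is the one that breaks: a reader using %ld into a 32-bit long cannot represent a position beyond 2 GB.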

I'm not a fan of thousands of tiny source files. Would placing this in misc.c be sensible?

magnumripper commented 10 years ago

> Would placing this in misc.c be sensible?

No it wouldn't. That would mean we'd create a long dependency chain (for external helper tools) to logger.o, options.o, memory.o and so on. So maybe we introduce jumbo.h. I always wanted to move some stuff from misc.h to a new file that creates NO further dependencies. Let's call it jumbo.h (and jumbo.o). We can move some other stuff to it, for example the jtr_memmem stuff (which should be reverted to NOT inline in the header), the jtr_basename() and other Jumbo things. I'll start this work today and (likely) leave the Windows stuff to you.

jfoug commented 10 years ago

I have VC sort of working. The ETA appears to be busted, possibly lost due to mod(2^32). Cygwin I thought was working, but it only processed mod(2^32). That is better than < 2^31, but not 64 bit compatible yet. I will test cyg64 later.

VC and 64 bit compilers (where sizeof long == 8) are handled simply with defines in a header file (and switching the seek/tell longs with ARCH_WORD_64's). For Cygwin I was trying ftello and fseeko. They are there, but appear to possibly not work. I also had to make a change to mmap function return. Cygwin was returning -1. I will dig into this a bit more, but I think this might be able to be worked around. It looks like I will have to get a real mingw env going again, to try to get it working also.

The default for 32 bit, if I do not have it, is to simply fall back to fseek and ftell, so they likely will still end up with 2^31 failures.

magnumripper commented 10 years ago

My current experimental version is using defines only, eg:

+#if SIZEOF_LONG < 8 && HAVE_LSEEK64 /* Linux 32-bit or X32 */
+
+// off64_t lseek64(int fd, off64_t offset, int whence);
+#define jtr_fseek64(stream, offset, whence) \
+       lseek64(stream, offset, whence)
+
+#elif SIZEOF_LONG < 8 && HAVE_FSEEK64 /* Various */
+
+// int fseek64 (FILE *stream, long long offset, int whence);
+#define jtr_fseek64(stream, offset, whence) \
+       fseek64(stream, offset, whence)
+
+#elif SIZEOF_LONG < 8 && HAVE__FSEEKI64 /* Windows */
...

Is this a bad idea? The idea is that callers always pass the offset as an int64_t.

magnumripper commented 10 years ago

I had feature creep so I can't commit anything yet, lol. I moved jtr_memmem() and jtr_basename from misc.c to this new file jumbo.h (and jumbo.c).

jfoug commented 10 years ago

I wish you would not have done this moving yet. It increases the complexity of working in the same area.
Some of what I have been working on is similar to what you list. It is being done differently, to keep the Makefile.orig 'working'. But there are also quite a few other changes, in things like loader, cracker and wordlist (mostly wordlist).

There was also an fstat that had to be scrapped. Instead of fstat, I save current pos, seek to end, ftell, seek back. Same end result BUT it works properly for 64 bit.
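The save/seek/tell/restore approach described above might look roughly like this. It is a sketch using POSIX fseeko()/ftello(); the function name is hypothetical and _FILE_OFFSET_BITS is an assumption about the C library, not something quoted from the actual change.

```c
#define _POSIX_C_SOURCE 200809L  /* assumption: exposes fseeko()/ftello() */
#define _FILE_OFFSET_BITS 64     /* assumption: asks glibc for a 64-bit off_t */
#include <stdio.h>
#include <stdint.h>
#include <sys/types.h>

/* Return the size of an open stream in bytes, or -1 on error.
 * The current position is restored before returning. */
int64_t stream_size(FILE *fp)
{
    off_t saved, end;

    saved = ftello(fp);                 /* remember where we are */
    if (saved < 0)
        return -1;
    if (fseeko(fp, 0, SEEK_END) != 0)   /* jump to the end */
        return -1;
    end = ftello(fp);                   /* the end offset is the size */
    if (fseeko(fp, saved, SEEK_SET) != 0)
        return -1;
    return (int64_t)end;
}
```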

For VC, the usage of _ftelli64 and _fseeki64 (with proper 64 bit vars) works fine. I am not yet getting success on cygwin using ftello and fseeko. They are 'supposed' to be 64 bit, but are not. For mingw, I know there have been questions asked about supporting all of the stdio stuff from VCCRT (like _ftelli64). I am not sure if that is there or not.

Jim.

magnumripper commented 10 years ago

I committed to a topic branch (jumbo-h) for review.

magnumripper commented 10 years ago

> I wish you would not have done this moving yet. It increases the complexity of working in the same area.

But I told you I was going to do it :laughing: Anyway, I do not have to commit my work at all, it wasn't a lot of work. Either I merge the jumbo-h branch and you change it to your needs (if at all needed), or you commit yours and I change/add things in that. So please review.

magnumripper commented 10 years ago

> I also had to make a change to mmap function return. Cygwin was returning -1. I will dig into this a bit more, but I think this might be able to be worked around. It looks like I will have to get a real mingw env going again, to try to get it working also.

Ouch, turns out I had a bug in wordlist.c!

diff --git a/src/wordlist.c b/src/wordlist.c
index 1c4a52b..c5b1bf1 100644
--- a/src/wordlist.c
+++ b/src/wordlist.c
@@ -577,10 +577,11 @@ void do_wordlist_crack(struct db_main *db, char *name, int rule
                mem_map = mmap(NULL, file_len,
                               PROT_READ, MAP_SHARED,
                               fileno(word_file), 0);
-               if (!mem_map)
+               if (mem_map == MAP_FAILED) {
+                       mem_map = NULL;
                        log_event("- memory mapping failed (%s) - but we'll do"
                                  "fine without it.", strerror(errno));
-               else {
+               } else {
                        map_pos = mem_map;
                        map_end = mem_map + file_len;
                        map_scan_end = map_end - 16;

I will commit this immediately.
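The point of the fix is that mmap() reports failure with MAP_FAILED, which is (void *)-1, so a NULL test never fires. A self-contained sketch of the same pattern, with a hypothetical helper name:

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <sys/mman.h>

/* Map len bytes of fd read-only. Returns NULL on failure, so that callers
 * can keep using a plain NULL test, as the patched code does. */
char *map_readonly_or_null(int fd, size_t len)
{
    void *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);

    if (p == MAP_FAILED)    /* MAP_FAILED is (void *)-1, not NULL */
        return NULL;
    return (char *)p;
}
```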

magnumripper commented 10 years ago

> I am not yet getting success on cygwin using ftello and fseeko. They are 'supposed' to be 64 bit, but are not.

On Linux and OSX, certain macros need to be defined for 64-bit support on 32-bit systems. Maybe that's the problem on Cygwin too? Or didn't it work even on Cygwin64?

# TODO: These should be conditioned per OS:
# Linux
AS_IF([test "x$ac_cv_func_lseek64" = xyes], [JTR_LIST_ADD(CFLAGS_EXTRA, [-D_LARGEFILE64_SOURCE])])
# OS X
AS_IF([test "x$ac_cv_func_fseeko" = xyes], [JTR_LIST_ADD(CFLAGS_EXTRA, [-D_DARWIN_C_SOURCE])])

The experimental branch now works on OSX 32-bit and 64-bit (well it builds - I need to test it too ;-)
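Whether such a feature macro took effect can be checked from C by looking at the width of off_t. The sketch below uses _FILE_OFFSET_BITS, which is an assumption on my part (glibc honours it) and is not one of the macros used in the configure fragment above; the function name is hypothetical.

```c
#define _FILE_OFFSET_BITS 64   /* assumption: must precede the first #include */
#include <limits.h>
#include <sys/types.h>

/* Width of off_t in bits as seen by this translation unit. */
int off_t_bits(void)
{
    return (int)(sizeof(off_t) * CHAR_BIT);
}
```

If this reports 32 on a 32-bit build, fseeko() and ftello() are present but no better than fseek() and ftell(), which matches the symptom described for Cygwin.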

magnumripper commented 10 years ago

I rebased the jumbo-h branch, so whatever commit hashes I posted earlier are no longer valid. Currently a2bafa9 is the one commit upon bleeding-jumbo. (Edit: another one added with slight fixes :)

jfoug commented 10 years ago

I will see if I can test against your changes. I do have this working, but still require more testing. There were quite a few changes to wordlist.c (and a couple others). There still are some 32 bit issues (like mmap), but I have handled that also. One thing of note (NOTE, I have not looked at your code yet), lseek64/fseek are not compatible without fileno() ;)

NOTE, there will also be a .c file. I HAD to write the 64 bit crap myself for cygwin. Fuk'er only build that for 64 bit systems EVEN THOUGH win32 provides them all the tools they need to do it. DAMN them. There are a couple design items that just piss me off. This is one of them. The other is not defining CYGWIN32 or CYGWIN64 within cyg64. But I have it working. I had to drop back to Win32 API to get the file size, there was NO help from cygwin32.

magnumripper commented 10 years ago

> lseek64/fseek are not compatible without fileno()

That's an easy fix. I have them as:

#if SIZEOF_LONG < 8 && HAVE_LSEEK64 /* Linux 32-bit or X32 */

// off64_t lseek64(int fd, off64_t offset, int whence);
#define jtr_fseek64(stream, offset, whence) \
    lseek64(fileno(stream), offset, whence)

> I HAD to write the 64 bit crap myself for cygwin

I can't see why you couldn't just use the Windows functions from Cygwin? I have them as:

#elif SIZEOF_LONG < 8 && HAVE__FSEEKI64 /* Windows */

// int _fseeki64(FILE *stream, __int64 offset, int origin);
#define jtr_fseek64(stream, offset, whence) \
    _fseeki64(stream, offset, whence)

That's completely untested though...

magnumripper commented 10 years ago

> I will see if I can test against your changes. I do have this working, but still require more testing

We'll probably end up wanting to cherry-pick bits from your version and other bits from mine. You could push it to your own topic branch git push origin HEAD:jim-fseek64 or something, for now. Or maybe you just push your version right into bleeding after testing, I can add some of my bits if at all needed.

jfoug commented 10 years ago

> I can't see why you couldn't just use the Windows functions from Cygwin? I have them as:

Because they are NOT there. These are not exported by cygwin. They are in the crtl dll, and cygwin does not use that. Mingw does, BUT it still has to export things from there to use. I am crossing fingers there.

For cygwin (both 32 and 64), the 'goodie' functions are not exported. however, cyg64 fseek/tell work.

I am in the process of TRYING to get things merged done, with a 3-way against what I have, your jumbo-h and jumbo. But with all the activity, it is not easy. The change is not super trivial.

magnumripper commented 10 years ago

> Because they are NOT there. These are not exported by cygwin. They are in the crtl dll, and cygwin does not use that.

I'm probably just stupid, but... why does Cygwin have to be involved in that, given you know the function and the DLL? Can you not link against libraries outside Cygwin?

> I am in the process of TRYING to get things merged done, with a 3-way against what I have, your jumbo-h and jumbo. But with all the activity, it is not easy. The change is not super trivial.

It might be much easier for me to merge them, using the power of git. BTW you need to ignore all changes regarding ALIGN in that branch - they are in the main tree already. They were included by mistake as I thought it had something to do with my changes.

jfoug commented 10 years ago

But then you have to fully address whether the dll is there, if it is new enough, etc. I wrote them myself. They are not 'too' hard. The hardest part was getting the length of the damn file, for the SEEK_END computation.

I have checked this all in. Within jumbo.h, in the function 'finders', I do NOT allow any else branch to be entered UNLESS it is going to satisfy the 64 bit requirements. I had to put fpos_t into the configure.ac checks. I put all of the lseek() functions last. Those are the ones that are most likely to have problems. I THINK we are safe, but I am not 100% sure. Those functions may not always play right with a FILE * handle if the file was opened in writeable mode: the l*() functions are non-buffering, while the f*() functions are buffering. I think we are safe. NOTE, we do have some seeking on writable files (the .pot reload HAD to also work 64 bit), and worse yet, there are multiple threads using fwrite(). That one we may want to be VERY careful with.
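The buffering concern can be sketched as follows: before repositioning the descriptor behind a writable stream, flush the stream so that stdio's pending output reaches the file first. The helper name is hypothetical and this is not the code from jumbo.h.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Reposition the descriptor behind a writable stdio stream.
 * Buffered output is flushed first so that the kernel's file offset
 * reflects everything written so far. Returns the new offset or -1. */
off_t flush_then_lseek(FILE *fp, off_t offset, int whence)
{
    if (fflush(fp) != 0)
        return (off_t)-1;
    return lseek(fileno(fp), offset, whence);
}
```

Without the fflush(), an lseek(fd, 0, SEEK_CUR) right after a small fputs() would typically still report the old offset, because the data is sitting in the stdio buffer. This does nothing for the multi-threaded fwrite() case, which needs its own locking.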

I also put your linux and darwin special flags into the targeted m4 macro file, AND use $host_os so that I do not get splattered with $_DARWIN_C whatever macros.

I also put all the 'safe' already 64 bit replacements first. If sizeof(long) is big enough, then there is no need to worry about the other horse-crap functions. fseek() and ftell() are happy to give you > 32 bits.

Please check things over.

I get a lot of "%lld wants long long, but we have int64_t" warnings. I am not sure how to quiet that. They ARE the same size, it is just the compiler giving red-herring warnings.
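One common way to quiet that warning is the PRId64 macro from <inttypes.h>, which expands to the right conversion for int64_t on each platform. On 64-bit Linux with glibc, int64_t is a typedef for long, which is why %lld draws a warning there even though the sizes match. A sketch with a hypothetical function name:

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Format a 64-bit position without a format/argument mismatch warning. */
int format_position(char *buf, size_t size, int64_t pos)
{
    return snprintf(buf, size, "%" PRId64, pos);
}
```

The alternative is to keep %lld and cast the argument to long long at each call site.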

Also, the _LARGE_FILE stuff on linux, probably needs to be ONLY put on the 32 bit stuff. Right now it is only linux for the host_os. Same is likely the case for darwin. Can you look at these? They are in jtr_systems_specific_logic.m4

magnumripper commented 10 years ago

Reviewed it, looks solid. But I get a regression that I have yet to find the reason for:

In file included from john.c:104:0:
cuda_common.h:15:26: fatal error: cuda_runtime.h: No such file or directory
compilation terminated.
make[1]: *** [john.o] Error 1
make[1]: *** Waiting for unfinished jobs....
make: *** [default] Error 2

At first I thought it was because JTR_SYSTEMS_SPECIFIC_LOGIC moved down but that can't have anything to do with it. I just can't see ANYTHING that could have caused this! Ah well, continuing my search...

magnumripper commented 10 years ago

Got it. Trivial in hindsight: CFLAGS was not backed up at the beginning of JTR_SYSTEMS_SPECIFIC_LOGIC, but it was restored afterwards :)

magnumripper commented 10 years ago

Good stuff, closing!

jfoug commented 10 years ago

The JTR_SYSTEMS_SPECIFIC_LOGIC was moved down, so that all lib, function, header checks were done. That way, we KNOW the configuration is setup, and we are just doing some of the 1-off 'specialty' shit (like cygwin64, extra flags that are unique to a system etc).

We 'might' want to have a JTR_SYSTEMS_SPECIFIC_PREP logic. That would be for certain things like CFLAGS that are REQUIRED to build at all. Probably would be called right after we get the target triplet. I do not have any reason for doing that now, but I bet there are systems which we NEED to have something early on. I know ./configure does a lot of this. Possibly it does enough that we do not have to worry about it at all.

NOTE, I did change file modes, and also changed the linux/darwin logic to only add large file support to 32 bit builds. We would have to make sure this is OK for darwin. I am 99% sure it is fine for linux. The ftell/fseek will be used by both of these, if built for 64 bit, so I really think adding that flag is not a good thing.

magnumripper commented 10 years ago

For OSX the same macro might eventually be wanted for other stuff too, but we'll extend it if/when we see that. For Linux it should be a no-op on 64-bit but it's nice not having unneeded defines in the dynamic Makefile.

jfoug commented 10 years ago

Side Note:

You had started using SIZEOF_OFF_T. You can not do that without making sure it is already one of the types we check the size of within configure.ac, OR that it gets set from some other code somewhere. I did add this type (and also SIZEOF_SIZE_T, since I needed that for mmap checking if it is limited to 32 bit). I know you complained about having too many of these up front, but there IS a reason why ;) It would be great if cpp could do things like #if sizeof(int)==4 but it can not.

We might want to proactively add some additional XXXX_T sizes. However, I wonder how configure works if XXXX_T is not defined? Also, we may have to do header includes on some of them, to get the type set. I think there is a param, listing what extra headers to include, I will have to look.
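For the built-in integer types there is a preprocessor-visible stand-in for sizeof: the limits in <limits.h>. The sketch below uses a hypothetical macro name and cross-checks it at compile time with C11 _Static_assert. There is no comparable standard limit macro for off_t, which is why a configure-time size check is still needed for that type.

```c
#include <limits.h>

/* Preprocessor-visible stand-in for "sizeof(long) > 4". */
#if LONG_MAX > 0x7FFFFFFFL
#define SKETCH_LONG_WIDER_THAN_32 1
#else
#define SKETCH_LONG_WIDER_THAN_32 0
#endif

/* Compile-time cross-check that the macro agrees with the real sizeof. */
_Static_assert(SKETCH_LONG_WIDER_THAN_32 == (sizeof(long) * CHAR_BIT > 32),
               "LONG_MAX test disagrees with sizeof(long)");

int long_is_wider_than_32(void)
{
    return SKETCH_LONG_WIDER_THAN_32;
}
```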

Jim.

jfoug commented 10 years ago

> For Linux it should be a no-op on 64-bit but it's nice not having unneeded defines in the dynamic Makefile.

I did get some re-defines, since I think Jtr has code that defines it (yuck)

magnumripper commented 10 years ago

I was going to check if our local stdint.h adds off_t but I forgot about it.

magnumripper commented 10 years ago

> I was going to check if our local stdint.h adds off_t but I forgot about it.

It doesn't. keepass2john and racf2john use off_t. I need a long break now so that's a todo.

jfoug commented 10 years ago

I am going to do some tests with the AC_CHECK_SIZEOF stuff, seeing what happens under 'bad' conditions, and seeing if we need to work around any of them.

'Using' the results of the AC_CHECK_SIZEOF macros is simply a matter of taking the time to put it into code. Often we do not care at all about the size. BUT recently we added 64 bit file sizes. Well, mmap uses size_t for its size. That's 32 bit on a 32 bit build, so I was getting truncation problems for cygwin. Took a while to find it. The damn compiler should have bitched up a storm about possible truncation, but not a peep. It is 'fixed' now, but there could be other areas where we truncate. We just have to be careful, and slowly start to make this more and more portable.

64 bit portability on a 32 bit system is NOT an easy thing to do. As far as we have come with the 64 bit file (on 32 bit hw), is pretty wonderful. I wish memory expansion was this 'easy', lol. There 'are' ways to go past the 32 bit limit on 32 bit builds, BUT it is not globally available (I can do it on Win32). Also, it has limitations, AND is ugly, i.e. back to DOS segmented methods. Do you remember LARGE model and HUGE model from the Borland DOS days, lol. It 'can' be done in 32 bit windows (segments and memory mapped files, AND the memory is not directly accessible), but it is NOT pretty, and I am NOT going to do this, it simply is NOT worth the effort.
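The silent truncation described above can be guarded against with an explicit range check before a 64-bit length is handed to anything that takes a size_t, such as mmap(). A sketch with a hypothetical function name:

```c
#include <stddef.h>
#include <stdint.h>

/* Return 1 if a 64-bit file length can be passed as a size_t
 * (for example as an mmap() length) without truncation, 0 otherwise. */
int length_fits_size_t(int64_t file_len)
{
    if (file_len < 0)
        return 0;
    return (uint64_t)file_len <= (uint64_t)SIZE_MAX;
}
```

When the check fails on a 32-bit build, the caller can skip memory mapping and fall back to ordinary buffered reads.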

jfoug commented 10 years ago

Sorry, I did not see this as a closed issue.

magnumripper commented 10 years ago

> Sorry, I did not see this as a closed issue.

Oh, we can continue commenting here, closing it just means we're sort of done with the main task. A lot of testing does remain, but any problems found should be entered as new issues (or just fixed immediately).

Yeah, we could memory map a window, but it definitely wouldn't be worth the effort. It's history. Anyone doing real cracking should run a 64-bit build anyway. Just like Big Endian, all the 32-bit work I do in JtR is just so I don't forget how to do it :-)

jfoug commented 10 years ago

Lol, after writing my own fseeko64/ftello64 for cygwin, after a build (and your build-list changes), I get this:

$ ../run/john --list=build-info
Version: 1.8.0.2-bleeding-jumbo
Build: 32-bit XOP-autoconf
Arch: 32-bit LE
$JOHN is ../run/
Format interface version: 12
Max. number of reported tunable costs: 2
Rec file version: REC4
Charset file version: CHR3
CHARSET_MIN: 1 (0x01)
CHARSET_MAX: 255 (0xff)
CHARSET_LENGTH: 24
Max. Markov mode level: 400
Max. Markov mode password length: 30
Compiler version: 4.8.2
gcc version: 4.8.2
OpenSSL library version: 01000107f
OpenSSL 1.0.1g 7 Apr 2014
GMP library version: 6.0.0
NSS library version: 3.15.3.1
NSPR library version: 4.10.3
fseek(): fseeko
ftell(): ftello
memmem(): System's

And the fseeko/ftello work just fine. Oh well. I will either pull the code, or leave it as a framework for what will be needed if we have a system where we do not have 64 bit file operations, and have to write our own from scratch.

jfoug commented 10 years ago

This 'may' still be open. On sparc32, I may need to also modify the fopen, and use fopen64 instead. I am still working through this, with a small test case.

It looks like fopen64, and ftello64/fseeko64 appear to work with large files, fgets does the reading. BUT when I run jtr, I am getting no wordlist loaded. I still need to investigate more and figure this one out.

But I may have to add a jtr_fopen to the jumbo.h, and possibly use fopen64. I just do not have all the information for this one, yet.

jfoug commented 10 years ago

I got sparc working properly now. 3 things.

  1. needed to override fopen with fopen64
  2. needed to REMOVE the _POSIX_SOURCE in wordlist.c (I put a #ifndef sparc around it)
  3. needed to add O_LARGEFILE to the open in cracker.c (pot_fd)

For #1:
  a. I added fopen64 and _fopen64 to the functions checked in configure.ac.
  b. In jumbo.h: if sizeof(long) is 8, then fopen=fopen; else if fopen64 is defined, then fopen=fopen64; else _fopen64 ... else fopen=fopen.
  c. I added fopen to the build-info output.

For #3:
  a. In jumbo.h, if O_LARGEFILE is not defined, I define it to 0.
  b. In cracker.c, I split the open statement across lines.
  c. Then I put a #if bits==32 | O_LARGEFILE #endif to add that flag to 32 bit builds.
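The O_LARGEFILE fallback from #3 can be sketched like this. The function name is hypothetical, and unlike the change described above the flag is applied unconditionally here rather than only on 32-bit builds; where the flag is defined as 0 it has no effect.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Where the platform has no O_LARGEFILE, make it a harmless no-op flag. */
#ifndef O_LARGEFILE
#define O_LARGEFILE 0
#endif

/* Open (or create) a file for appending with large file support requested.
 * Returns a file descriptor, or -1 on error. */
int open_for_append(const char *path)
{
    return open(path, O_WRONLY | O_CREAT | O_APPEND | O_LARGEFILE,
                S_IRUSR | S_IWUSR);
}
```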

I think this is it for sparc.

I need to check things a little more, but I think this gets sparc working properly.

jfoug commented 10 years ago

Pushed as 98c1444 Please review.

jfoug commented 10 years ago

Keeping this open until magnum has a chance to test this on some of his other 32 bit systems.

magnumripper commented 10 years ago

Several bugfixes, of which 27634f3f was a real blocker. Without that bugfix, it simply could not work at all for several of the alternatives. Some feature macros added. Tested on Sparc32, Linux 32/64/X32.