openwall / john

John the Ripper jumbo - advanced offline password cracker, which supports hundreds of hash and cipher types, and runs on many operating systems, CPUs, GPUs, and even some FPGAs
https://www.openwall.com/john/

-fork may be borking .pot file (and other FILE io) on jumbo. #1043

Closed jfoug closed 9 years ago

jfoug commented 9 years ago

Issue found with the TS (test suite). It started as a stdout-only problem, but it now appears the .pot file is being truncated when -fork is used.

An fflush() was needed for the stdout problem. The same fix may be needed on all FILE I/O when in fork mode. Time will tell.

This is an example showing the symptom:

$ ./jtrts.pl -b=/JtR/bleed/run -passthru=-fork=2
-------------------------------------------------------------------------------
- JtR-TestSuite (jtrts). Version 1.13, Dec 21, 2014.  By, Jim Fougeron & others
- Testing:  John the Ripper password cracker, version 1.8.0.2-jumbo-1-bleeding_omp_memdbg_asan [linux-gnu 64-bit SSE4.1-autoconf]
--------------------------------------------------------------------------------

John Jumbo build detected.

form=dynamic_0                    guesses: 1502 0:00:00:00 DONE  [PASSED]
.pot CHK:dynamic_0                guesses: 1500 0:00:00:00 DONE  [PASSED] (1500 val-pwd)

.....
form=dynamic_15                   guesses: 1500 -show= 752 0:00:00:03 DONE : Expected count(s) (1500)  [!!!FAILED!!!]
.pot CHK:dynamic_15               guesses:  752 -show= 752 0:00:00:00 DONE : Expected count(s) (1500)  [!!!FAILED!!!]  (752 val-pwd)

In this case, the -show only gave us 752, which was what was in the .pot file. Upon the .pot check, it also shows us that only 752 lines were in the .pot, and on re-run john worked with them, so this does not appear to be any 'bug' in the test suite. This only appears to happen in -fork mode.

jfoug commented 9 years ago

Right now, I am actually having a very hard time (with a 'simple' script) reproducing the problem. I made a loop in my flock-tst script and am simply looping until it fails; I am up to 100 runs (the dyna_1 test again).

But running the TS, I was finally able to have it chop the .pot file in half (2 forks). That was the only failure I have seen. This (along with the fsync) is certainly much harder to corrupt. But that could also be because things have been slowed down. Hard to say for sure, unless I can comment out the new stuff, rebuild, and have it go to hell again.

jfoug commented 9 years ago

It may be the fsync() that is making it hard to show this problem. I will build without that.

Nope, at loop 31 it failed. It failed again at loop 9.

So without the open() / close() / open() it is easy to fail. I have rebuilt, and will continue testing, to see if I can get this 'simple' script to fail.

jfoug commented 9 years ago

Ok, I have done 4 runs of 75 loops, with open()/close()/open() and I am not able to get failures. I do think this is the underlying FS simply not providing a file inode (or whatever it is) to other processes quickly enough.

Now why core works, I do not know.

magnumripper commented 9 years ago

OK, try this then: with 09b03324 you can completely disable pot resync by just editing john.conf. Set all of these to N and see if it still behaves differently than core. It really should not:

ReloadAtCrack = N
ReloadAtDone = N
ReloadAtSave = N

magnumripper commented 9 years ago

Disregarding the pot sync and reviewing a git diff origin/master logger.c, I see one slight difference between jumbo and core: In jumbo we do an actual lseek(fd, 0, SEEK_END). Core doesn't, since the file is opened in O_APPEND anyway. Maybe this triggers some weird bug somewhere.

magnumripper commented 9 years ago

In jumbo we do an actual lseek

...and if you turn pot sync off in john.conf, you could try commenting out both lseek's just to rule them out.

magnumripper commented 9 years ago

Hmm, wait a minute. Yes, we are opening with O_APPEND. So regardless of whether we seek or not, and wherever we seek to, and even without support from any file locking(!), the write is supposedly guaranteed to be an atomic append. Now THAT makes your problems indicate a severe bug, wherever it actually lives (I still guess it's in your vboxsf, but I have no idea how these low-level things are implemented).

I always wondered why Solar doesn't even emit a warning if OS_FLOCK is not there. This is why.

magnumripper commented 9 years ago

I jumped the gun there. From some Linux man page: "O_APPEND may lead to corrupted files on NFS file systems if more than one process appends data to a file at once. This is because NFS does not support appending to a file, so the client kernel has to simulate it, which can't be done without a race condition."

"NFS file systems" can of course be replaced with "some file systems". It's a good idea using O_APPEND backed up with flock() but apparently it's not enough in this case.

jfoug commented 9 years ago

I am seeing the open()/close()/open() work wonders.

Now, the TS is somewhat of an 'ugly' beast. It does some things which are not always what you would 'see' in the wild. One of these is to always start off with an empty .pot file. Many times (probably most of the time), JtR runs happen where there is an existing john.pot (or whatever .pot file name) already on disk. I am going to make some changes to the TS to make sure that the .pot file is already there, written and flushed to disk. Now, the file will be a zero-byte file, which on *nix (and NTFS, I think) is just an inode. I will back the open()/close()/open() code out to just open(), make sure the fsync is not there, and start testing.

But I have been testing with the existing open()/close()/open() and it is pretty damn stable. Yes, I did see one half-sized .pot file, but only one, and I have not been able to make it fail since then. But if I remove that open()/close()/open(), it fails quickly, at a somewhat scary percentage. Again, the bug as I see it is that both processes get that file handle, but there is no dependence at the OS level between them. With the file there, that problem appears to go away. That one failure could still be deemed to be some race condition, and I have not been able to replicate it. Having the file already there, prior to either JtR process running, might solve this.

_If that is the case, then we might be able to handle this with warning documentation within the -fork section, OR by having JtR create the .pot file early_: simply an open(create|append); close() during john_init() or something like that, to make sure the file is there prior to ever getting to any forking code.
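
A minimal sketch of that early-creation idea, assuming a hypothetical helper called from init code before any fork() happens (not committed code):

#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: make sure the .pot/.log file exists on disk before
 * fork(), so every child process opens an already-existing inode. */
static void create_empty_file_if_missing(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0600);

    if (fd >= 0)
        close(fd);   /* a zero-byte file is enough; content comes later */
}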

I am also going to open this back up on cygwin and see if it works.

jfoug commented 9 years ago

OK, I made this change to the TS. After every unlink($pot), except at the bottom of the loop when we REALLY want to clean it up, I call this sub:

sub create_file_if_not_exist {
    my $filename = $_[0];
    if (-e $filename) { return; }           # nothing to do if it already exists
    open(FILE, ">", $filename) or return;   # create an empty (zero-byte) file
    #print FILE "\n";
    close(FILE);
}

Now the ubuntu-64 build, on the VM, using the vboxsf location of the test suite, is running fine. I am going to build Cygwin with fork enabled and see if it now works in the TS. The reason we stopped using it before was that it was having all sorts of problems. Possibly THIS is it. If not, it gives me more things to look into.

This is NOT the fix. It is just a workaround for a newly discovered (we hope) problem on some non-ext4 filesystems. Also, this likely is not the ONLY issue; there may be other systems we do not have our hooks into that have other issues with fork. Again, _the 'fix' may be documentation, or it may be JtR always creating and closing the .pot file early on every run, or a combination of both._

OK, this did pop up in the ubuntu run with the changed TS. I have seen a couple of these, and I do not know exactly what the TS is complaining about:

form=dynamic_66                   guesses: 1500 0:00:00:07 DONE  [PASSED]
.pot CHK:dynamic_66               guesses: 1500 -show=1500 0:00:00:03 DONE : Expected count(s) (1500)  [!!!FAILED!!!]  (1500 val-pwd)

form=dynamic_77                   guesses: 1500 0:00:00:14 DONE  [PASSED]
.pot CHK:dynamic_77               guesses: 1500 -show=1500 0:00:00:07 DONE : Expected count(s) (1500)  [!!!FAILED!!!]  (1500 val-pwd)

Here everything printed on screen checks out, but the TS still complains. I see these every once in a while and I am not sure why. I need to change my !!!FAILED!!! to !!!FAILED1!!!, !!!FAILED2!!!, etc., so I at least know which statement failed them. Here I have a pretty good idea, since the (1500 val-pwd) is listed, and I think there is only one failure point with that on it. Something in the if statement is not 'happy' BUT is not printed on screen.

jfoug commented 9 years ago

Also note, the .pot file may not be the ONLY culprit here. Any file written to during the fork may behave like this: the .pot, the .log, etc. I am not sure about session files. Do we have a single one, or does each process get its own? If a single one, then yes, the session files also.

jfoug commented 9 years ago

OK, for Cygwin: if I run it, it will start fork=2 processes, but they just hang. If I kill one of them, the other process will complete, and the .pot file will contain only the hashes cracked by that half (assuming fork=2). Also, the tty is hosed, and I have to run stty echo to get any screen I/O back (this happens sometimes on JtR screw-ups where the screen is not released from raw mode).

So the Cygwin bug was not this one (it goes beyond it). But I may look into it a bit and see if it is not 'too' hard. Likely both processes are deadlocked on something (possibly the flock). I tried re-running with LOCK_DEBUG and got no extra output, so Cygwin may be deadlocking before it even gets to the .pot file stuff.

jfoug commented 9 years ago

@magnumripper how does the rec_save type stuff work for forked processes (the recovery record crap)?

OK, tested on ubuntu (working), and I see this:

$ ls -l xxxx.*
-rwxrwx--- 1 root vboxsf  189 Feb  6 08:47 xxxx.2.rec
-rwxrwx--- 1 root vboxsf 1711 Feb  6 08:47 xxxx.log
-rwxrwx--- 1 root vboxsf 1106 Feb  6 08:47 xxxx.pot
-rwxrwx--- 1 root vboxsf  188 Feb  6 08:47 xxxx.rec

So I see xxxx.rec and xxxx.2.rec. That answers my question. But I have been slowly drilling down in Cygwin, and it may be in the record save that Cygwin is having problems. There is special ftruncate code in there. I am not sure if that is the problem or not.

jfoug commented 9 years ago

Why do we do lseek(end) on the .pot file, but NOT on the log file??? Is that simply for that 'pot reload' logic??

jfoug commented 9 years ago

On my ubuntu system, not having _john.log_ there ALSO _triggers the same disjoint file handle_ problem for locking the log file. NOTE: on this run, neither file existed. Here are the results afterwards. The .pot should be 1500 lines, the .log 15xx (about 1520 or so, I think).

$ wc xxxx.*
  1355   6827  51902 xxxx.log
  1500   1639  84430 xxxx.pot
  2855   8466 136332 total

I am going to make a 'simple' script and start looking for this one also.

As for Cygwin, I have made progress, but I am about ready to throw in the towel. I now have it working (for a while), but it seems to deadlock often in the flock(LOCK_UN) call, or possibly in some race condition between 2 processes doing flock(LOCK_UN)/flock(LOCK_EX). I do have some simplification to the log_flush function, however, and will be getting that checked in once I have it tested better. I removed some of the if(fd==&pot) stuff and simply always print (to debug the true calls to that locking stuff). Also, the crk_pot_pos logic is simplified (now that I understand just wtf it was used for).

We still do a SEEK_END on the .pot (but not .log). I am not 100% sure WHY we do that. I really think that should be a SEEK_CUR, and allow the append mode logic to work, BUT you mentioned that something about NFS having issues. BUT if that is the case, then I would think we would ALWAYS SEEK_END, both for the .pot and .log file ??? Rereading that, it appears that possibly using APPEND open with flock 'may' allow the system to simulate proper appending (which was what you found out about corruption).

All this is hard to tell, BUT overall this is an exercise we have needed to do, trying to figure out how to better handle this. I still may not be able to get Cygwin working (I have somewhat given up). BUT this has also been testing a lot with the files not already existing on disk. Once I get my script updated (to look at both log and pot file line counts), I will also see if Cygwin will work if the files are 'always' there and flushed to disk prior to JtR running. _THAT_ has pretty much eliminated the problem on vboxsf for ubuntu running on VirtualBox.

I still get an error now and again in the TS, BUT those errors are 'invalid password', and I think what is happening is that something in the screen I/O is getting corrupted in the stderr stuff, triggering a 'looks like a password' situation. BUT every time I get one of those errors, there are also 1500 GOOD matching passwords (or the correct amount), so this 'bad password' is spurious noise from somewhere else.

jfoug commented 9 years ago

OK, here is my 'test' script (to run without worrying about the JtRTS crap):

#!/bin/bash
CNT=0
CNT1=1
while [ true ]
do
Err=""
rm -f bug.*

# these 2 lines are CRITICAL on vboxfs file system (virtual box writing to native NTFS disk)
# this makes sure there ARE both of these files, before JtR starts.
echo -n > bug.pot
echo -n > bug.log

/JtR/bleed/run/john -fork=2 -w=pw.dic dynamic_1_tst.in -pot=./bug.pot -ses=./bug -form=dynamic >./bug.stdout 2>./bug.stderr

RET=$(wc bug.pot | awk '{print $1;}')

if [ x$RET != x1500 ] ;
then
    Err="failed pot $RET"
else
    RET=$(wc bug.log | awk '{print $1;}')
    if [ x$RET != x1520 ] ;
    then
        Err="failed log $RET"
    fi
fi

if [ "x$Err" != "x" ] ; 
then
    read -p "press ^C to break now, error was:  $Err  "
fi
CNT=$(( $CNT1 + $CNT ))
echo $CNT
done

Not really clean, but works for me.

magnumripper commented 9 years ago

Why do we do lseek(end) on the .pot file, but NOT on the log file??? Is that simply for that 'pot reload' logic??

Yep, it's just for getting the position before/after writing. We don't have to seek for the write, the O_APPEND guarantees a seek to end when writing anyway. Except for vboxsf...

magnumripper commented 9 years ago

I am going to make a 'simple' script and start looking for this one also.

Why? It's just the exact same symptoms from the exact same vboxsf bug. IMHO time would be better spent trying to create a good isolated test case and report this to Oracle or whoever is the author of vbox.

magnumripper commented 9 years ago

We still do a SEEK_END on the .pot (but not .log). I am not 100% sure WHY we do that. I really think that should be a SEEK_CUR

Current what!? Whose current position? What we are doing here is 1. Remember the eof-position after WE wrote to the pot file, and 2. next time we write, first check if a SEEK_END now shows a different figure. If it does, someone else wrote to the pot file so we need a pot sync.

and allow the append mode logic to work, BUT you mentioned that something about NFS having

The O_APPEND totally ignores where we seek. You can seek to pos 0 before writing and it will still write to the end of file. The seeks are solely for keeping track of what we did vs. what someone else possibly did.

issues. BUT if that is the case, then I would think we would ALWAYS SEEK_END, both for the .pot and .log file ??? Rereading that, it appears that possibly using APPEND open with flock 'may' allow the system to simulate proper appending (which was what you found out about corruption).

We are opening with O_APPEND. That's core code.
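
To make that concrete, the tracking described above can be sketched roughly like this (illustrative only, not the real crk_pot_pos code; pot_sync_from() is a made-up placeholder, and the exclusive lock is assumed to be held):

#include <sys/types.h>
#include <unistd.h>

/* Rough sketch of the position tracking described above, not the real
 * crk_pot_pos code. O_APPEND decides where the write actually lands; the
 * lseek(SEEK_END) only detects whether another process appended in the
 * meantime. Assumes the exclusive lock is already held. */
static off_t our_pot_pos;   /* EOF position right after OUR last write */

static void pot_write(int fd, const void *line, size_t len)
{
    off_t end = lseek(fd, 0, SEEK_END);     /* where is EOF now? */

    if (end != our_pot_pos) {
        /* Someone else appended since our last write, so reload the new
         * tail starting at the last known position (a "pot sync"). */
        /* pot_sync_from(fd, our_pot_pos);     -- made-up placeholder */
    }
    if (write(fd, line, len) < 0) {
        /* error handling omitted in this sketch */
    }
    our_pot_pos = lseek(fd, 0, SEEK_CUR);   /* remember the new EOF position */
}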

jfoug commented 9 years ago

OK, I had to go out and left the above script running. It was at 3600 when I returned, with no problems.

Without the creation of the files (empty, but there) prior to calling JtR, that script dies very quickly: about 90% of the time somewhere between the 1st and 10th run, usually on the 1st or 2nd.

So I would say this is _the problem_ seen with -fork=x on vboxsf, and possibly on other file systems.

I am going to spend time on core. It may be that the TS does not smoke the .pot or .log file the same way for core; the code to handle that is very different from the code for jumbo. I will be testing core using a 'simple' script like the one on this page. I am still pretty concerned that we see this problem on jumbo but NOT on core. The code for logging should be somewhat similar: there is no 2nd opening of the log file with fopen/fdopen, no lseek(END), etc., but it is pretty much the same crap. And I was seeing the .log file failing also, not just the .pot file.

magnumripper commented 9 years ago

If it does, someone else wrote to the pot file so we need a pot sync.

Oh, and another detail: If we need a pot sync, we are not always just doing that from the start of the file! We are doing it from the last known "synced" position. The code might look weird unless you know why.

magnumripper commented 9 years ago

Rereading that, it appears that possibly using APPEND open with flock 'may' allow the system to simulate proper appending (which was what you found out about corruption).

We are opening with O_APPEND. That's core code.

OK, now I re-read what you wrote :-P Yes, as I understand it, O_APPEND alone is enough to be virtually 100% safe on a non-network, non-virtual filesystem, or perhaps more generally speaking, one where we own the actual block device. So the flock() is a second layer of protection for network filesystems and other situations where O_APPEND is not 100% safe.

jfoug commented 9 years ago

We still do a SEEK_END on the .pot (but not .log). I am not 100% sure WHY we do that. I really think that should be a SEEK_CUR

Current what!? Whose current position? What we are doing here is 1. Remember the eof-position after WE wrote to the pot file, and 2. next time we write, first check if a SEEK_END now shows a different figure. If it does, someone else wrote to the pot file so we need a pot sync.

I think you are wrong on this one. The file handle used for the read is not the same as for the write. The write is at the end, I am using SEEK_CUR for that, and I believe it is 100% happy (on vboxsf!)

magnumripper commented 9 years ago

I can't believe I'm wrong. If someone else writes to the file, my "current" will not change. See?

jfoug commented 9 years ago

Sorry, I posted the above while catching up, prior to reading your last comment.

I changed to a SEEK_CUR to get the current offset (this keeps the same logic for when/how much to reload), and the code is working 100% on vboxsf (with the files there).

That is the KEY for vboxsf. I think I can try to test this on an NFS mount at work; it is possibly the same problem. That is, once the file is 'there' and we flock(), the APPEND mode gets us non-corrupted files.

jfoug commented 9 years ago

The current WILL change (in SEEK_CUR). Keep in mind, the APPEND mode. I am running that way right now, works fine. Try it out, you will (I think) be surprised ;) Put the debugging on, so that you can follow the file offsets and lengths.

We are not using 'buffered' file stuff, where the CRTL will (possibly) not hit the OS level; here we are using 'raw' files. So we lock and get exclusive access, do a SEEK_CUR, and the OS will seek to the end of the file and give us the length. It sounds 'funny', but that is what is happening.

I have not tested this on cygwin yet. I will do a head slap if cygwin all of a sudden magically works, lol.

magnumripper commented 9 years ago

The current WILL change. Keep in mind, the APPEND mode. I am running that way right now, works fine.

Think again. We seek to end before writing.

What do you mean "works fine"? Does pot sync work fine? That is ALL this is about.

jfoug commented 9 years ago

No, I am saying that the return from lseek for SEEK_CUR and SEEK_END are the same. But under the hood, I am not sure they are the same. With the return being the same, the other logic (determining that some other process has dropped a hunk of .pot that we need to re-read) should be the same as it was before.

magnumripper commented 9 years ago

No, I am saying that the return from lseek for SEEK_CUR and SEEK_END are the same

Of course they are, until someone else writes to the file.

jfoug commented 9 years ago

Nope. Again, lseek(fp, SEEK_CUR) IF fp was opened in APPEND mode, does not give a shit about what the value was last time it checked. It will hit the OS and return the current end of file location. Now, if the file is exclusively locked, then the lseek(cur) will be the same location that we now write to. We then unlock. If 5 other processes add 500k of data to the file, and we later get exclusive access to the file and lseek(cur), the value returned will still be the end of the file (500k bytes past the end of our last write).

jfoug commented 9 years ago

It probably does not matter either way. This made no change at all on either Cygwin or vboxsf systems (NFS is still to be seen). We can certainly keep the SEEK_END, but it does not matter at all.

magnumripper commented 9 years ago

lseek(fp, SEEK_CUR) IF fp was opened in APPEND mode, does not give a shit about what the value was last time it checked. It will hit the OS and return the current end of file location.

Some implementations may work the way you say (but that would waste resources), but that's not the spec. I hope you are not using vboxsf as any kind of reference, as we have established it's very b0rken.

magnumripper commented 9 years ago

lseek(fp, SEEK_CUR) IF fp was opened in APPEND mode, does not give a shit about what the value was last time it checked. It will hit the OS and return the current end of file location.

What you say is guaranteed AFTER writing and BEFORE releasing the lock. But not BEFORE writing, and that's where we do have the SEEK_END you questioned.

jfoug commented 9 years ago

I have only made a local change, and since it improved nothing, there is absolutely no reason to commit it. I have been trying quite a bit to get Cygwin working. vboxsf is not nearly as b0rked as Cygwin: I 'can' get some runs to work out of it, but very few. Probably 75% of the time, a TS run will get part way through and deadlock.

magnumripper commented 9 years ago

I have a feeling you should open a separate issue for Cygwin and keep this one solely for vboxsf. We are confused enough without mixing them in one :-)

jfoug commented 9 years ago

OK. I doubt cygwin will get fixed, but an issue certainly is not a bad thing. At least it gives me a place to dump WHAT I have seen and tried; even if I do not get it, someone else who wants it enough may pick it up and continue.

magnumripper commented 9 years ago

With all we learnt here I think we could track down the actual problem in Cygwin. If nothing else, we should be able to describe the exact problem. It's just a matter of establishing exactly when things are starting to go bad.

magnumripper commented 9 years ago

Who knows, maybe the new fcntl() locking fixes your problem with vboxsf too.

magnumripper commented 9 years ago

I doubt cygwin will get fixed, but an issue certainly is not a bad thing.

:smile:

jfoug commented 9 years ago

I doubt cygwin will get fixed, but an issue certainly is not a bad thing.

:smile:

:astonished:

magnumripper commented 9 years ago

I doubt cygwin will get fixed, but an issue certainly is not a bad thing.

:smile:

:astonished:

Quoting RFC 1925: With sufficient thrust, pigs fly just fine.

jfoug commented 9 years ago

On vboxsf with the latest code:

$ ./flock-tst fail
this run is expected to TO FAIL on systems that have flock problems on non-existant files.
1
2
...
52
53
press ^C to break now, error was:  failed pot 1331  

Still failing. But I just checked build-info, and yes, locking is fcntl(). BUT I am also built with memdbg=on and ASan on. I will remove those and recheck.

Nope, it fails just as badly, with or without memdbg and ASan.

jfoug commented 9 years ago

Quoting RFC 1925: With sufficient thrust, pigs fly just fine.

But when quoting that truth, telling only a third of it is unwise:

(3) With sufficient thrust, pigs fly just fine. However, this is not necessarily a good idea. It is hard to be sure where they are going to land, and it could be dangerous sitting under them as they fly overhead.

jfoug commented 9 years ago

I am just glad Cygwin is working. I am still getting a machine lock-up, but I think it happens in the same format. I will lock the machine up again and make sure that the TS window is on top, so I can watch it ;)

I think it is one of the formats that has a flock in it (dmg_format?). Nope, not that one. Well, I will run it again and make sure I write down where it crashed (after saving my desktop for a quick restart, lol).

magnumripper commented 9 years ago

dmg format has locking in some debug code iirc, not otherwise.

jfoug commented 9 years ago

Are we happy with this issue?

The original problem I made this issue for still stands, but now we KNOW the workaround. The corrupted files on vboxsf are Oracle's problem. If we run with the .pot file (and .log file) already on disk, then vboxsf works just fine. So, that being the case, we really cannot do much in John to fix this, other than creating and closing a zero-byte .pot/.log file when john starts, IF they do not exist prior to the forking code launching children.

magnumripper commented 9 years ago

Yes. It should be reported to Oracle and I think we can close this. You could add some text in doc/BUGS if you like, describing the known issue and workaround.

magnumripper commented 9 years ago

On a side note, I did heavy testing of these things all day today and I found some interesting (frightening) facts. For example, on a bog-standard NFS share with a Linux server and Linux clients: a client could lock, and other clients would see the lock, but the server's native filesystem (at the "same" directory) would NOT see the clients' locks nor share its locks with the clients. This means you can run over NFS (required for MPI) just fine, but you must NOT run any JtR process on the NFS server itself.

Just in case some random googler finds this: the above was 100% fixed when we switched to fcntl() locks instead of flock().
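
For the record, the replacement looks roughly like this; a sketch of a whole-file POSIX fcntl() record lock standing in for flock(LOCK_EX), not the exact jumbo code:

#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Sketch of the change described above: a POSIX fcntl() record lock over
 * the whole file instead of BSD flock(). fcntl() locks are the kind that
 * propagate over NFS. Not the exact jumbo code. */
static int lock_whole_file(int fd, short type)  /* F_WRLCK, F_RDLCK or F_UNLCK */
{
    struct flock fl;

    memset(&fl, 0, sizeof(fl));
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;                       /* 0 means "to end of file" */
    return fcntl(fd, F_SETLKW, &fl);    /* F_SETLKW blocks, like flock(LOCK_EX) */
}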

jfoug commented 9 years ago

It is really good to hear that fcntl() works better. I was hoping that would be the case, once I found the older documentation stating that it handles NFS correctly. I think the older BSD flock() interface simply has not been given the same usage as fcntl(), and thus bugs are not fixed as frequently.

Looking back, this was a pretty long thread for one bug; one of our longer ones. Both of us had to unlearn a few things we thought were true, and also learn a few things we did not know about.

magnumripper commented 9 years ago

Learning is easy, unlearning is the tricky shit! I was almost sure there would be no difference because I thought flock() was using fcntl() for NFS (at least). But this is now unlearned. I'm glad I tested it.

BTW I wrote a clone of flock(1) that was using fcntl instead just for testing. Maybe I should publish it somewhere.