janstarke / rexgen

API Documentation
https://github.com/janstarke/rexgen/blob/master/doc/api.md
GNU General Public License v2.0
52 stars 21 forks source link

topiterator->next() problems containing streams #28

Closed jfoug closed 7 years ago

jfoug commented 8 years ago

In a stream expression, \0 if the callback returns a zero length string, the iterator is left in a 'bad' state, if it is attempted to be re-used.

I have been working on getting the save/restore code working within JtR. We generate words, and then if there is a rexgen using it (with a /0 ), we were firing up a session with a new iterator for each word. That worked fine, but when we tried to restore, rexgen would crash. Well after digging a bit, I see that rexgen is using the iterator id as a key on restore, so this re-creation of new iterators was stopped, and a long running iterator was then created.

However, for this, we have a candidate value, and use a callback to give it over to rexgen. So rexgen iterates the entire set of data, calls the callback, and gets a null string. We then compute the next base-word, and do it over again.

However, this was working very poorly. If the iteration was 100 items, what would happen, is the first 100 would be right. Then on the 2nd word, there would be 100 items (99?) with a blank word, then 100 items with the 2nd word. Then the 3rd word would be 100 items with a blank word then 100 items with the 3rd word.

When I tracked things down is became apparent that the callback was not being called again. When the callback first returned a null string, the iterator had reset itself to 'start', and would not call it again until it ran through the full set of next() calls.

So I dug in, and found a very easy correction for this. It is all within topIterator->next(). Here is teh patch. What this does is to track a boolean flag (only for a stream iterator). This flag will make sure that child->next() is not called until after a successful return from state->forceNext(); The return will be correct even if itr->next() is called many times. It simply continues to call forceNext() until that returns true. Then and only then on the next call to next() will child->next() be called.

This fully corrects the issue for john where we needed to use the same iterator over and over again feeding in different word(s). It should not have any invalid side effects for any other app, since it has the same behavior if an app simply opens up a file, and allows rexgen to read it for data (without wanting to recycle the iterator).

NOTE, I am still digging into what appears to be another 'bug' in the save/restore state, that I have not found a solution for yet.

$ git diff iterator/topiterator.h
diff --git a/src/librexgen/iterator/topiterator.h b/src/librexgen/iterator/topiterator.h
index 6a32134..931adbe 100644
--- a/src/librexgen/iterator/topiterator.h
+++ b/src/librexgen/iterator/topiterator.h
@@ -29,6 +29,7 @@
 class TopIterator : public Iterator {
  public:
   TopIterator(Regex* re) : Iterator(re->getId()) {
+       needWord = false;
        state = new IteratorState();
                child = re->iterator(state);
        }
@@ -45,11 +46,21 @@ class TopIterator : public Iterator {
   }

   bool next() {
+    if (needWord && state->getStreamIterator() != NULL) {
+      bool res = state->getStreamIterator()->forceNext();
+      if (res)
+        needWord = false;
+      return res;
+    }
+    needWord = false;
     bool res = child->next();
     if (res) { return res; }

     if (state->getStreamIterator() == NULL) { return false; }
-    return state->getStreamIterator()->forceNext();
+    res = state->getStreamIterator()->forceNext();
+    if (res) { return res; }
+    needWord = true;
+    return false;
   }

   void value(SimpleString& dst) const { child->value(dst); }
@@ -77,6 +88,7 @@ class TopIterator : public Iterator {
  private:
   Iterator* child;
   IteratorState* state;
+  bool needWord;
 };

 #endif // TOPITERATOR_H

NOTE, this could be changed from the above patch (removing superfluous code)

-    if (needWord && state->getStreamIterator() != NULL) {
+    if (needWord) {
....
     }
-    needWord = false;
     bool res = child->next();
     if (res) { return res; }
jfoug commented 8 years ago

NOTE, this (as the code exists today), is a show stopper for getting rexgen save/resume support added into JtR.

jfoug commented 8 years ago

I have created a pull request with the above change (and version change to 1.3.2) NOTE, the version would be important to JtR, even though there is absolutely no interface or save/resume interface changes. JtR can not run with the new changes on a version before 1.3.2

jfoug commented 8 years ago

Corrected here: https://github.com/jfoug/rexgen/commit/f5596214ac0780da1c8c46c87888ddf8b8f6d79e

Placed in pull-request #29