janstarke / rexgen

API Documentation
https://github.com/janstarke/rexgen/blob/master/doc/api.md
GNU General Public License v2.0

unexpected rexgen output, segfaults and other errors #65

Closed frank-dittrich closed 2 years ago

frank-dittrich commented 4 years ago

This is from man rexgen:

Examples
       rexgen index.php?id=[1-5]
              Would create the results
              index.php?id=1
              index.php?id=2
              index.php?id=3
              index.php?id=4
              index.php?id=5

Instead, I get these 10 lines of output:

$ rexgen index.php?id=[1-5] 
indexphid=1
indexphpid=1
indexphid=2
indexphpid=2
indexphid=3
indexphpid=3
indexphid=4
indexphpid=4
indexphid=5
indexphpid=5

This simple example works as expected:

$ rexgen indexphpid=[1-5]
indexphpid=1
indexphpid=2
indexphpid=3
indexphpid=4
indexphpid=5

The . is somehow simply dropped from the output:

$ rexgen index.phpid=[1-5]
indexphpid=1
indexphpid=2
indexphpid=3
indexphpid=4
indexphpid=5

Next, an example with ?:

$ rexgen indexphp?id=[1-5]
indexphid=1
indexphpid=1
indexphid=2
indexphpid=2
indexphid=3
indexphpid=3
indexphid=4
indexphpid=4
indexphid=5
indexphpid=5
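
The doubling can be reproduced with a small sketch (Python, not rexgen's own code). In regex syntax `p?` means "p or nothing", so each of the five digits appears once without and once with the `p`. The function name and the loop order are assumptions chosen to mirror the output above.

```python
import re
from itertools import product

def expand_optional_p():
    """Enumerate 'indexphp?id=[1-5]' by hand: 5 digits x 2 choices for the optional p."""
    digits = "12345"
    # product() varies the rightmost argument fastest, which matches the order shown above.
    return ["indexph" + p + "id=" + d for d, p in product(digits, ["", "p"])]

words = expand_optional_p()
for word in words:
    # Every generated word is matched by the same pattern read as a regex.
    assert re.fullmatch(r"indexphp?id=[1-5]", word)
    print(word)
```

This prints the same 10 lines as above, which suggests the output is consistent with ordinary regex semantics.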

Maybe this is the expected output, and the man page example needs to be corrected.

This produces the output mentioned on the man page:

 $ rexgen 'index\.php\?id=[1-5]'
index.php?id=1
index.php?id=2
index.php?id=3
index.php?id=4
index.php?id=5
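
For building such a pattern from a literal string, a minimal sketch could look like this (Python). `escape_literal` and the `METACHARS` set are hypothetical, and the set is an assumption about which characters need a backslash; rexgen's grammar may differ. `shlex.quote` is used only to show the single-quoted form that keeps the shell from interpreting `?`, `[` and `\` itself.

```python
import shlex

# Assumed set of regex metacharacters; not taken from rexgen's grammar.
METACHARS = set(".?*+()[]{}|\\^$")

def escape_literal(text):
    """Prefix every assumed metacharacter with a backslash."""
    return "".join("\\" + ch if ch in METACHARS else ch for ch in text)

# Only the literal part is escaped; the character class is appended as regex syntax.
pattern = escape_literal("index.php?id=") + "[1-5]"
print(pattern)               # index\.php\?id=[1-5]
print(shlex.quote(pattern))  # 'index\.php\?id=[1-5]'
```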

While experimenting, I managed to produce a segfault:

$ rexgen index[.]phpid=[1-5]
Segmentation fault (core dumped)

and another unexpected error:

$ rexgen index[.?]phpid=[1-5]
syntax error, unexpected T_OPTIONAL_QUANTIFIER
Syntax Error:
(null)
$ echo $?
1

frank-dittrich commented 4 years ago

Another segfault:

$ rexgen 'test\1[12]'
Segmentation fault (core dumped)

The \1 in this case is an invalid back reference, but this shouldn't result in a segfault.
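
A check of this kind could run before any iterator is built. The following is a simplified sketch in Python, not rexgen code: `invalid_backrefs` is a hypothetical helper that only looks at single-digit references and counts every unescaped `(` as a capture group, ignoring character classes and non-capturing groups.

```python
def invalid_backrefs(pattern):
    """Return the back reference numbers that point at a group not opened yet."""
    groups = 0
    bad = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            # \1 .. \9 are treated as back references; other escapes are skipped.
            if nxt in "123456789" and int(nxt) > groups:
                bad.append(int(nxt))
            i += 2
            continue
        if ch == "(":
            groups += 1
        i += 1
    return bad

print(invalid_backrefs(r"test\1[12]"))    # [1]
print(invalid_backrefs(r"(ab[cde])\1"))   # []
```

A non-empty result could then be reported as a normal syntax error instead of being dereferenced.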

Another syntax error:

$ rexgen 'test\[12]'
syntax error, unexpected T_END_CLASS, expecting $end
Syntax Error:
(null)

In this case, I can see that the error is caused by the opening [ being escaped while the ] isn't. But it would be nice to provide a more user-friendly description of the possible error codes and their meaning.
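
As an illustration of what a friendlier message might look like, here is a rough Python sketch that rewrites the parser messages quoted in this thread. The meanings in `TOKEN_HINTS` are my guesses from the token names, not taken from the rexgen sources, and `explain` is a hypothetical helper.

```python
import re

# Guessed meanings of the token names seen in this thread (assumption).
TOKEN_HINTS = {
    "T_OPTIONAL_QUANTIFIER": "'?' (optional quantifier)",
    "T_END_CLASS": "']' (end of character class)",
    "T_ANY_CHAR": "a literal character",
    "$end": "end of pattern",
}

_MSG = re.compile(
    r"syntax error, unexpected (?P<got>\S+?)(?:, expecting (?P<want>.+))?$"
)

def explain(message):
    """Translate 'syntax error, unexpected X[, expecting Y]' into plainer words."""
    m = _MSG.match(message.strip())
    if m is None:
        return message  # unknown format: pass through unchanged
    text = "unexpected " + TOKEN_HINTS.get(m["got"], m["got"])
    if m["want"]:
        wanted = [TOKEN_HINTS.get(t, t) for t in re.split(r" or |, ", m["want"])]
        text += "; expected " + " or ".join(wanted)
    return text

print(explain("syntax error, unexpected T_END_CLASS, expecting $end"))
# unexpected ']' (end of character class); expected end of pattern
```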

To avoid the unexpected T_END_CLASS syntax error, I would have to escape the ] as well:

$ rexgen 'test\[12\]'
test[12]

To avoid the unexpected T_OPTIONAL_QUANTIFIER, I need to escape '.' and '?' even inside [...]:

$ rexgen 'index[\.\?]phpid=[1-5]'
index.phpid=1
index?phpid=1
index.phpid=2
index?phpid=2
index.phpid=3
index?phpid=3
index.phpid=4
index?phpid=4
index.phpid=5
index?phpid=5

frank-dittrich commented 4 years ago

One more segfault:

$ rexgen 'test[~- ]'
Segmentation fault (core dumped)

And one more syntax error:

$ rexgen 'test[~-]'
syntax error, unexpected T_END_CLASS, expecting T_ANY_CHAR
Syntax Error:
(null)

frank-dittrich commented 3 years ago

Another segfault for

$ rexgen 'a[]'

can apparently be avoided like this:

diff --git a/src/librexgen/iterator/classregexiterator.cpp b/src/librexgen/iterator/classregexiterator.cpp
index d12573a..c29ab10 100644
--- a/src/librexgen/iterator/classregexiterator.cpp
+++ b/src/librexgen/iterator/classregexiterator.cpp
@@ -46,7 +46,7 @@ void ClassRegexIterator::value(SimpleString* dst) const {
    * FIXME(jasa):
    * this condition may be expensive and should be unnecessary
    */
-  if (current >= 0) {
+  if (current > 0) {
     const std::string::size_type &length = lengths[current];
     const std::string::size_type &index = indices[current];

diff --git a/src/librexgen/iterator/classregexiterator.h b/src/librexgen/iterator/classregexiterator.h
index 5fa8d13..7aa483a 100644
--- a/src/librexgen/iterator/classregexiterator.h
+++ b/src/librexgen/iterator/classregexiterator.h
@@ -66,7 +66,7 @@ namespace rexgen {
         * FIXME(jasa):
         * this condition may be expensive and should be unnecessary
         */
-      if (current >= 0) {
+      if (current > 0) {
         const std::string::size_type &length = lengths[current];
         const std::string::size_type &index = indices[current];

$ rexgen 'a[]'
a

But a real fix would be to reject [], because my clumsy change breaks output for

$ rexgen '(ab[cde])\1'

which now (with my change) no longer produces

abcabc
abdabd
abeabe

but instead

abab
abdabd
abeabe
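
The off-by-one can be modelled in a few lines of Python. This is a simplified stand-in for the iterator, not the real class: the guard functions and `generate` are hypothetical, and `bounded` is only one possible alternative guard.

```python
def original(cur, chars):
    return cur >= 0                  # guard before the change

def patched(cur, chars):
    return cur > 0                   # my change: also skips the valid index 0

def bounded(cur, chars):
    return 0 <= cur < len(chars)     # assumed alternative: real bounds check

def generate(chars, guard):
    """Model '(ab[...])\\1': the group value followed by a copy of itself."""
    out = []
    for cur in range(max(len(chars), 1)):
        part = "ab" + (chars[cur] if guard(cur, chars) else "")
        out.append(part + part)
    return out

print(generate(list("cde"), original))  # ['abcabc', 'abdabd', 'abeabe']
print(generate(list("cde"), patched))   # ['abab', 'abdabd', 'abeabe']
print(generate([], bounded))            # ['abab']
```

With an empty class the `original` guard indexes past the end (an IndexError in this model), while `bounded` handles both cases.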

frank-dittrich commented 3 years ago

One more segfault:

$ rexgen '[äöü]'

frank-dittrich commented 3 years ago

One more segfault:

$ rexgen '[äöü]'
terminate called after throwing an instance of 'std::out_of_range'
  what():  basic_string::at: __n (which is 6) >= this->size() (which is 6)
Aborted (core dumped)
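
A possible explanation, which is only a guess from the message and not confirmed here: ä, ö and ü each take two bytes in UTF-8, so the three-character class is six bytes long, which matches the size of 6 in the exception text. The Python sketch below shows the difference between counting bytes and counting characters; `class_members` is a hypothetical helper.

```python
text = "\u00e4\u00f6\u00fc"   # the class body: ä ö ü
raw = text.encode("utf-8")    # what a byte-oriented parser would see

def class_members(raw_bytes):
    """Decode first, then split, so no index lands inside a multi-byte sequence."""
    return list(raw_bytes.decode("utf-8"))

print(len(text), len(raw))    # 3 6
print(class_members(raw))     # ['ä', 'ö', 'ü']
```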

janstarke commented 3 years ago

Examples
       rexgen index.php?id=[1-5]
              Would create the results
              index.php?id=1
              index.php?id=2
              index.php?id=3
              index.php?id=4
              index.php?id=5

This is a bug in the man page. I try to conform to PCRE as closely as possible, so escaping . and ? is necessary.

However, the segfaults are bugs in the software which I need to fix. Thanks for documenting this.

janstarke commented 3 years ago

I don't forbid empty classes (such as []), because there may be empty classes which cannot be detected easily, such as [9-8] or [~- ] ;-) So, I added a guard which handles empty classes in a special way.
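
Why such classes are hard to spot syntactically can be seen in a rough sketch of class expansion (Python; `expand_class` and `words` are hypothetical helpers, not rexgen functions). A range whose end sorts before its start, like 9-8 or ~ to space, simply yields no characters, so the emptiness only shows up after expansion. Emitting the bare prefix for an empty class is just one possible policy, assumed here for illustration.

```python
def expand_class(body):
    """Expand the inside of [...] into a list of characters."""
    chars = []
    i = 0
    while i < len(body):
        if i + 2 < len(body) and body[i + 1] == "-":
            lo, hi = ord(body[i]), ord(body[i + 2])
            # A descending range (hi < lo) gives an empty range() and adds nothing.
            chars.extend(chr(c) for c in range(lo, hi + 1))
            i += 3
        else:
            # A leading or trailing '-' is kept as a literal in this sketch.
            chars.append(body[i])
            i += 1
    return chars

def words(prefix, body):
    """Guard: an empty class contributes nothing instead of being indexed."""
    members = expand_class(body)
    if not members:
        return [prefix]
    return [prefix + ch for ch in members]

print(expand_class("9-8"), expand_class("~- "))  # [] []
print(words("test", "-~"))                       # ['test-', 'test~']
print(words("a", ""))                            # ['a']
```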

$ rexgen 'test[~-]'
syntax error, unexpected T_END_CLASS, expecting T_ANY_CHAR
Syntax Error:
(null)

This is expected behaviour, because a hyphen in a character class denotes a range, which must have a first and a last character. A hyphen as the first character loses its special meaning:

$ rexgen 'test[-~]'
test-
test~

But I agree that the error messages could be better.

janstarke commented 3 years ago

Looks fixed to me. Could you please take a look?