Closed frank-dittrich closed 2 years ago
Another segfault:
$ rexgen 'test\1[12]'
Segmentation fault (core dumped)
The \1 in this case is an invalid back reference, but this shouldn't result in a segfault.
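One way to avoid crashing on such input is to validate back references before any iterators are built. The following is only a minimal sketch: `firstInvalidBackref` is a hypothetical helper, not part of rexgen, and it assumes single-digit references and does not parse character classes or named groups.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper, not part of rexgen: returns the number of the first
// back reference that points at a group which has not been opened yet, or 0
// if every back reference is valid.
// Simplifications: single-digit references only; character classes and
// named groups are not parsed.
int firstInvalidBackref(const std::string& pattern) {
  int groupsOpened = 0;
  for (std::size_t i = 0; i < pattern.size(); ++i) {
    const char c = pattern[i];
    if (c == '\\' && i + 1 < pattern.size()) {
      const char next = pattern[i + 1];
      if (std::isdigit(static_cast<unsigned char>(next)) && next != '0') {
        const int ref = next - '0';
        if (ref > groupsOpened) return ref;  // refers to a missing group
      }
      ++i;  // skip the escaped character
    } else if (c == '(') {
      ++groupsOpened;
    }
  }
  return 0;
}
```

With this check, `test\1[12]` would be reported as referring to the missing group 1, while `(ab[cde])\1` would pass.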
Another syntax error:
$ rexgen 'test\[12]'
syntax error, unexpected T_END_CLASS, expecting $end
Syntax Error:
(null)
In this case, I can see that the error is caused by the opening [ being escaped, while the ] isn't.
But it would be nice to provide a more user-friendly description of the possible error codes and their meaning.
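A friendlier message could start from a simple lookup of the token names that appear in the parser output. This is a sketch only: `describeToken` is a hypothetical function, and the descriptions are inferred from the examples in this thread, not taken from rexgen's grammar.

```cpp
#include <map>
#include <string>

// Hypothetical lookup, not part of rexgen. The descriptions are inferred
// from the examples in this thread and may not match the real grammar.
std::string describeToken(const std::string& token) {
  static const std::map<std::string, std::string> descriptions = {
      {"T_END_CLASS", "']' (end of a character class)"},
      {"T_OPTIONAL_QUANTIFIER", "'?' (optional quantifier)"},
      {"T_ANY_CHAR", "an ordinary character"},
      {"$end", "end of the expression"},
  };
  const auto it = descriptions.find(token);
  // Unknown tokens fall back to their raw name.
  return it != descriptions.end() ? it->second : token;
}
```

The message "unexpected T_END_CLASS, expecting $end" could then be rendered as "unexpected ']' (end of a character class), expecting end of the expression".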
To avoid the unexpected T_END_CLASS syntax error, I would have to escape the ] as well:
$ rexgen 'test\[12\]'
test[12]
To avoid the unexpected T_OPTIONAL_QUANTIFIER, I need to escape '.' and '?' even inside [...]:
$ rexgen 'index[\.\?]phpid=[1-5]'
index.phpid=1
index?phpid=1
index.phpid=2
index?phpid=2
index.phpid=3
index?phpid=3
index.phpid=4
index?phpid=4
index.phpid=5
index?phpid=5
One more segfault:
$ rexgen 'test[~- ]'
Segmentation fault (core dumped)
And one more syntax error:
$ rexgen 'test[~-]'
syntax error, unexpected T_END_CLASS, expecting T_ANY_CHAR
Syntax Error:
(null)
Another segfault for
$ rexgen 'a[]'
can apparently be avoided like this:
diff --git a/src/librexgen/iterator/classregexiterator.cpp b/src/librexgen/iterator/classregexiterator.cpp
index d12573a..c29ab10 100644
--- a/src/librexgen/iterator/classregexiterator.cpp
+++ b/src/librexgen/iterator/classregexiterator.cpp
@@ -46,7 +46,7 @@ void ClassRegexIterator::value(SimpleString* dst) const {
* FIXME(jasa):
* this condition may be expensive and should be unnecessary
*/
- if (current >= 0) {
+ if (current > 0) {
const std::string::size_type &length = lengths[current];
const std::string::size_type &index = indices[current];
diff --git a/src/librexgen/iterator/classregexiterator.h b/src/librexgen/iterator/classregexiterator.h
index 5fa8d13..7aa483a 100644
--- a/src/librexgen/iterator/classregexiterator.h
+++ b/src/librexgen/iterator/classregexiterator.h
@@ -66,7 +66,7 @@ namespace rexgen {
* FIXME(jasa):
* this condition may be expensive and should be unnecessary
*/
- if (current >= 0) {
+ if (current > 0) {
const std::string::size_type &length = lengths[current];
const std::string::size_type &index = indices[current];
$ rexgen 'a[]'
a
But a real fix would be to reject [], because my clumsy change breaks output for
$ rexgen '(ab[cde])\1'
which now (with my change) no longer produces
abcabc
abdabd
abeabe
but instead
abab
abdabd
abeabe
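The regression appears to follow from index 0 being a valid selection: with `current > 0`, the first alternative (here `c`) is skipped. A guard that tests whether there is anything to select avoids both problems. The sketch below uses a simplified stand-in type; `ClassChoice` and its members are hypothetical and do not mirror rexgen's ClassRegexIterator.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Simplified stand-in for a class iterator; the names are hypothetical and
// do not mirror rexgen's ClassRegexIterator.
struct ClassChoice {
  std::vector<std::string> alternatives;  // empty for a class like []
  std::size_t current;                    // index of the selected alternative

  // Appends the selected alternative to dst. Index 0 is a valid selection,
  // so the guard checks that the index is in range instead of "current > 0".
  void value(std::string& dst) const {
    if (current < alternatives.size()) {
      dst += alternatives[current];
    }
  }
};
```

With this guard an empty class contributes nothing, and the first alternative of a non-empty class is still emitted.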
One more segfault:
$ rexgen '[äöü]'
terminate called after throwing an instance of 'std::out_of_range'
what(): basic_string::at: __n (which is 6) >= this->size() (which is 6)
Aborted (core dumped)
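The reported size of 6 is consistent with the three characters äöü each taking two bytes in UTF-8, which suggests the class is being walked byte by byte. A minimal sketch of splitting input by code point with a bounds check follows; `splitUtf8` is a hypothetical helper, not rexgen code, and it assumes the input is UTF-8.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not rexgen code: splits UTF-8 text into one string
// per code point. Continuation bytes are not validated; a truncated
// sequence is clamped to the end of the buffer instead of reading past it.
std::vector<std::string> splitUtf8(const std::string& text) {
  std::vector<std::string> out;
  std::size_t i = 0;
  while (i < text.size()) {
    const unsigned char lead = static_cast<unsigned char>(text[i]);
    std::size_t len = 1;                       // ASCII or stray byte
    if ((lead & 0xE0) == 0xC0) len = 2;        // 110xxxxx
    else if ((lead & 0xF0) == 0xE0) len = 3;   // 1110xxxx
    else if ((lead & 0xF8) == 0xF0) len = 4;   // 11110xxx
    if (len > text.size() - i) len = text.size() - i;  // truncated sequence
    out.push_back(text.substr(i, len));
    i += len;
  }
  return out;
}
```

For the six bytes of äöü this yields three entries, one per character, so a class built from them would have three alternatives.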
Examples
rexgen index.php?id=[1-5]
Would create the results
index.php?id=1
index.php?id=2
index.php?id=3
index.php?id=4
index.php?id=5
is a bug in the man page. I try to be as PCRE-conformant as possible, so escaping . and ? is necessary.
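When the literal part of an expression comes from elsewhere, it can be escaped programmatically before it is combined with a class such as [1-5]. This is a sketch only: `escapeLiteral` is a hypothetical helper, and the set of special characters is an assumption based on common PCRE-style syntax, not on rexgen's grammar.

```cpp
#include <string>

// Hypothetical helper, not part of rexgen: backslash-escapes characters that
// are special in PCRE-style syntax so that a literal string can be embedded
// in a pattern. The character set is an assumption, not rexgen's grammar.
std::string escapeLiteral(const std::string& literal) {
  static const std::string special = "\\.?*+()[]{}|^$";
  std::string out;
  for (const char c : literal) {
    if (special.find(c) != std::string::npos) out += '\\';
    out += c;
  }
  return out;
}
```

For example, `escapeLiteral("index.php?id=") + "[1-5]"` builds the pattern `index\.php\?id=[1-5]`.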
However, the segfaults are bugs in the software which I need to fix. Thanks for documenting this.
I don't forbid empty classes (such as []), because there may be empty classes which cannot be detected easily, such as [9-8] or [~- ] ;-) So, I added a guard which handles empty classes in a special way.
$ rexgen 'test[~-]'
syntax error, unexpected T_END_CLASS, expecting T_ANY_CHAR
Syntax Error:
(null)
This is expected behaviour, because a hyphen in a character class denotes a range, which must have a first and a last character. A hyphen as the first character loses its special meaning:
$ rexgen 'test[-~]'
test-
test~
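The rules described here (a leading hyphen is literal, a range needs both endpoints, and a reversed range such as 9-8 contributes nothing) can be sketched as follows. `expandClass` is a hypothetical function, not rexgen's parser; it assumes single-byte characters and supports neither escapes nor negation.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical sketch, not rexgen's parser: expands the body of a character
// class (the text between '[' and ']') into its member characters.
// Simplifications: single-byte characters only, no escapes, no negation.
std::string expandClass(const std::string& body) {
  std::string members;
  std::size_t i = 0;
  while (i < body.size()) {
    const int lo = static_cast<unsigned char>(body[i]);
    if (i + 1 < body.size() && body[i + 1] == '-') {
      // "lo-hi" range: the last character must be present.
      if (i + 2 >= body.size()) {
        throw std::invalid_argument("range has no last character");
      }
      const int hi = static_cast<unsigned char>(body[i + 2]);
      // A reversed range (lo > hi) adds nothing, so the class may end up empty.
      for (int ch = lo; ch <= hi; ++ch) members += static_cast<char>(ch);
      i += 3;
    } else {
      // Ordinary character, including a hyphen in first position.
      members += static_cast<char>(lo);
      i += 1;
    }
  }
  return members;
}
```

An empty result for bodies such as `9-8` or `~- ` is the case a guard for empty classes has to cover, since it only becomes visible after the ranges are expanded.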
But I confirm that the error messages could be better...
Looks fixed to me. Could you please take a look?
This is from man rexgen:
Instead, I get these 10 lines of output:
This simple example works as expected:
The . is somehow simply dropped from the output:
Next, example with ?
Maybe this is the expected output, and the man page example needs to be corrected.
This produces the output mentioned on the man page:
While experimenting, I managed to produce a segfault
and another unexpected error: