Closed beemibrahim closed 2 years ago
It seems to continously call ctre::evalute_recursive ???
That's how system stack based regex engines are supposed to work. They are using your system stack to store all possible paths thru your subject.
There are a few problems with your regex:
1) if you have anchors on both sides you shouldn't be using ctre::search
but ctre::match
as that's what is matching whole input (equivalent of search with ^...$
)
2) what's the point for \s*
at the end?
2) your pattern should be changed to stop on doublequotes \"([^\"]*)\"
otherwise it will skip every doublequotes until the end, and then it fails to match it because of missing doublequotes, and then it tries to add one character to \s*
Oh I thought they were the ctre::search
and ctre::match
were the same : )
I didn't write that regular expression
Wait it fails with ctre::match too
In perl regex and in std::regex it works , but doesnt work with ctre , it should at least throw an exception that can be caught
CTRE is not designed to save you from bad regular expressions, I consider them a program. And in some cases especially greedy expressions can lead to such issue. Consider modifying them into possessive / lazy expressions.
a+
... greedy
a+?
... lazy
a++
... possessive
CTRE is not designed to save you from bad regular expressions, I consider them a program. And in some cases especially greedy expressions can lead to such issue. Consider modifying them into possessive / lazy expressions.
a+
... greedya+?
... lazya++
... possessive
it that because it is done in compile time ??
Because using non-system stack would mean it needs to allocate for each evaluation, which would be expensive. Problem of your expression is it's exploding in complexity O(n^2) and for each path take it's storing its matches. mostly because .*
in your expression takes everything including double-quotes inside of other double-quotes. And because it's greedy matching algorithm, it tries every possible match starting with longest. So it needs to store possible way for each character between start of the cycle and end of the subject.
Easiest way how to avoid this complexity (don't get me wrong, but this creates problem for other engines too) is to replace .*
inside quotes with lazy one .*?
Lazy tries all paths too, but starts with shortest, so it needs only to store last successful match.
The output is wrong , it returns the value as
TEXT TEXT TEXT TEXT TEXT-TEXT-THAT TEXT.TEXT TEXT TEXT + I I II * I II + I NVM_>U=SOMEMORETEXT#&PhK&2.
When it should return it as
TEXT TEXT TEXT TEXT TEXT-TEXT-THAT TEXT.TEXT TEXT TEXT + I I II * I II + I NVM_>U=SOMEMORETEXT#&PhK&2 ""TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT""
Because using non-system stack would mean it needs to allocate for each evaluation, which would be expensive. Problem of your expression is it's exploding in complexity O(n^2) and for each path take it's storing its matches. mostly because
.*
in your expression takes everything including double-quotes inside of other double-quotes. And because it's greedy matching algorithm, it tries every possible match starting with longest. So it needs to store possible way for each character between start of the cycle and end of the subject.Easiest way how to avoid this complexity (don't get me wrong, but this creates problem for other engines too) is to replace
.*
inside quotes with lazy one.*?
Lazy tries all paths too, but starts with shortest, so it needs only to store last successful match.
Okay I'll try this
The output is wrong , it returns the value as
TEXT TEXT TEXT TEXT TEXT-TEXT-THAT TEXT.TEXT TEXT TEXT + I I II * I II + I NVM_>U=SOMEMORETEXT#&PhK&2.
When it should return it as
TEXT TEXT TEXT TEXT TEXT-TEXT-THAT TEXT.TEXT TEXT TEXT + I I II * I II + I NVM_>U=SOMEMORETEXT#&PhK&2 ""TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT""
Where to you expect it to stop? .*
will go thru whole input, it doesn't count pairs of double-quotes. What input would you expect for:
key="""""""""""""""""""
?
Because using non-system stack would mean it needs to allocate for each evaluation, which would be expensive. Problem of your expression is it's exploding in complexity O(n^2) and for each path take it's storing its matches. mostly because
.*
in your expression takes everything including double-quotes inside of other double-quotes. And because it's greedy matching algorithm, it tries every possible match starting with longest. So it needs to store possible way for each character between start of the cycle and end of the subject. Easiest way how to avoid this complexity (don't get me wrong, but this creates problem for other engines too) is to replace.*
inside quotes with lazy one.*?
Lazy tries all paths too, but starts with shortest, so it needs only to store last successful match.Okay I'll try this
Hey this works , It returns the right output for me :)
The output is wrong , it returns the value as TEXT TEXT TEXT TEXT TEXT-TEXT-THAT TEXT.TEXT TEXT TEXT + I I II I II + I NVM_>U=SOMEMORETEXT#&PhK&2. When it should return it as TEXT TEXT TEXT TEXT TEXT-TEXT-THAT TEXT.TEXT TEXT TEXT + I I II I II + I NVM_>U=SOMEMORETEXT#&PhK&2 ""TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT""
Where to you expect it to stop?
.*
will go thru whole input, it doesn't count pairs of double-quotes. What output would you expect for:
key="""""""""""""""""""
?
Nevermind
I guess I'm gonna have to be more careful with regex while using ctre :|
This applies to every back-tracking regex engine. Only engines immune against this are non-backtracking, and they are not able to provide other features.
This Code Fails miserably :