mbj / mutant

Automated code reviews via mutation testing - semantic code coverage.

Add mutation `/.+?/` → `/.+/` #597

Open tjchambers opened 8 years ago

tjchambers commented 8 years ago

This one could be debatable. I don't "think" I saw it in any of the previously suggested enhancements.

dkubb commented 8 years ago

I think this is a reasonable mutation. You should only use .+? if there's a case where you need it to be non-greedy, and you should have to provide a test case to validate it, so I agree with this.

tjchambers commented 8 years ago

Thanks for the vote of confidence. I have a lot of these where I "know" I need to be non-greedy. So this will ensure I have the proper test cases.

mbj commented 8 years ago

Is greediness more primitive than non-greediness?

tjchambers commented 8 years ago

@mbj Is that a philosophical question?

I need to test to avoid greediness: I have multiple pairs of "bookends" in a string. So this mutation would prove I have adequately tested.
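A minimal sketch of the "bookends" case in Ruby. The square-bracket delimiters and the input string are made up for illustration; neither comes from the thread.

```ruby
# Hypothetical input with two bracketed "bookend" pairs.
input = 'before [first] middle [second] after'

reluctant = /\[(.+?)\]/ # stops at the nearest closing bracket
greedy    = /\[(.+)\]/  # runs on to the last closing bracket

reluctant_captures = input.scan(reluctant).flatten
greedy_captures    = input.scan(greedy).flatten

p reluctant_captures # => ["first", "second"]
p greedy_captures    # => ["first] middle [second"]
```

A test that asserts on the two separate captures would fail once `.+?` is mutated to `.+`, which is what would kill the proposed mutation.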

mbj commented 8 years ago

@tjchambers No, not a philosophical question. It's a question of how to apply the axiom. To me, "greediness is more powerful than non-greediness".

dkubb commented 8 years ago

Is greediness more primitive than non-greediness?

I think so. It does require more syntax to enable non-greedy matching. I feel like if you're writing a regexp and you're using something like .+, you need to make a deliberate choice to apply limits to its default behaviour.

I would consider unnecessary usage of .+? in a regexp to be a poorly written regexp.

tjchambers commented 8 years ago

@mbj Actually, I think this could potentially work as a mutation both ways. I may have some places where greediness is warranted and I have used non-greedy test data without considering it, but in that case the behavior is the same. It is only when converting from non-greedy to greedy that I would expect a failure. So it may be a useful bidirectional mutation?

I need to think more on that.
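To illustrate the point about test data, here is a rough Ruby sketch. The method names, the angle-bracket delimiters, and the inputs are all hypothetical; `extract_mutant` stands in for what the proposed mutation would produce.

```ruby
# Hypothetical method under test, using a non-greedy quantifier.
def extract(input)
  input[/<(.+?)>/, 1]
end

# The same method after the proposed `.+?` -> `.+` mutation.
def extract_mutant(input)
  input[/<(.+)>/, 1]
end

single_pair = 'x <a> y'       # only one closing delimiter
two_pairs   = 'x <a> y <b> z' # two closing delimiters

# A test input kills the mutant only if original and mutant disagree on it.
single_pair_kills = extract(single_pair) != extract_mutant(single_pair)
two_pairs_kills   = extract(two_pairs) != extract_mutant(two_pairs)

p single_pair_kills # => false
p two_pairs_kills   # => true
```

With only a single pair in the test data the two quantifiers are indistinguishable, so the mutant survives. The same holds in the other direction: such data cannot tell a greedy original from a non-greedy mutant either.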

mbj commented 8 years ago

I think so. It does require more syntax to enable non-greedy matching.

I think we should focus on the semantics, not the syntax to apply the axiom?

dkubb commented 8 years ago

I think so. It does require more syntax to enable non-greedy matching.

I think we should focus on the semantics, not the syntax to apply the axiom?

@mbj Interesting. I'm thinking more about this and maybe you could be right. I found a few articles to support the axiom:

In general I never liked non-greedy matching because I also dislike greedy matches using .+ and .*, for the same reason: lack of precision. Other than in a one-off script, it's fairly rare that you (I) ever want to match (literally) every single possible character in production code. When I write those I might use them as a placeholder while I flesh out some other part of the regexp, but leaving them in the code permanently feels so sloppy to me. It's like I'm writing "don't care" in big bold letters in the code.

I guess if you need to use them for whatever reason, and you need greedy matching, you probably should have a test case to validate them; otherwise a non-greedy match is usually going to be more efficient, like atomic grouping is too.
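For readers unfamiliar with atomic grouping, a small Ruby sketch of what it changes. The patterns and subject string are made up for illustration, and the sketch shows the no-backtracking behaviour rather than measuring any efficiency difference.

```ruby
plain  = /a+ab/     # a+ may give back characters so that "ab" can match
atomic = /(?>a+)ab/ # a+ keeps everything it consumed; no backtracking into the group

subject = 'aaab'

plain_matches  = plain.match?(subject)
atomic_matches = atomic.match?(subject)

p plain_matches  # => true
p atomic_matches # => false
```

Because the atomic group never retries shorter matches of `a+`, the engine has fewer alternatives to explore, at the cost of rejecting strings the plain pattern would accept.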

dkubb commented 8 years ago

Thinking about this even further, if we consider a regexp to be a representation of a (possibly infinite) set of all possible strings that a regexp could match, then it becomes even more clear what kinds of regexp mutations we should prefer:

(I thought of possibly a third rule, which I can't explain yet; I have a somewhat fuzzy understanding of it as the set still being infinite but having less variance. Maybe the set would be more easily described since it would have fewer infinite parts, or parts which repeat in a more regular pattern. Maybe there's even a fourth rule which allows for something "less infinite", which I'd guess doesn't have a mathematical definition, but I hope we all have an intuitive sense of what I mean by this.)

In addition to these I'm sure we could figure out more rules, but I think this is a decent start and matches the rules we apply to code in general.

Applying these rules to this specific problem, it becomes fairly clear that we should prefer non-greedy matches over greedy matches.

backus commented 8 years ago

Are we talking generically about greedy vs. non-greedy matching here or specifically with respect to the use of the dot? The two regular expressions in question parse like so:

2.3.0 :002 > Mutant::AST::Regexp.to_ast(Mutant::AST::Regexp.parse(/.+/))
 => s(:regexp_root_expression,
  s(:regexp_greedy_one_or_more, 1, -1,
    s(:regexp_dot_meta)))

2.3.0 :003 > Mutant::AST::Regexp.to_ast(Mutant::AST::Regexp.parse(/.+?/))
 => s(:regexp_root_expression,
  s(:regexp_reluctant_one_or_more, 1, -1,
    s(:regexp_dot_meta)))

So really we are talking about regexp_greedy_one_or_more and regexp_reluctant_one_or_more.

mbj commented 8 years ago

@dkubb OT: IMO focusing on minimizing semantics (not terminal syntax) is also what we do successfully to determine the availability of Ruby mutations. There are some mutations that encode fewer semantics in "more" characters of code.