Open mmarton opened 2 years ago
I think the new change is correct as it's more accurately counting the conditions in the IF that would evaluate it to TRUE or FALSE. Keen to hear what @MarkBaker thinks.
There are a number of ways of writing this code:
$variable1ToTest = $variable1ThatCanBeNullOrFloat ?? 0.0;
if (($variable1ToTest !== 0.0) {
$context->addViolation('description');
}
would be another way of expressing the same code logic using an intermediate assignment; and which clearly should increase the complexity by two - once for the null coalescence
expression, and once for the if
statement. Internally, PHP is doing just this, though the intermediate variable is kept hidden. Combining the two into a single line like if (($variable1ThatCanBeNullOrFloat ?? 0.0) !== 0.0) {...}
doesn't reduce that complexity, it just "hides" it: the null coalescence
is still evaluated with two possible paths, and the if
condition is still tested with its two paths.
I believe that the complexity is still reflected accurately here - as least as accurately as Cyclomatic Complexity can be measured - in reflecting both the null coalescence
expression and the if
condition.
The CYC count isn't 100% accurate: coding structures have evolved since Henry McCabe's original 1976 paper on the subject.
There is some argument on whether "Decisions involving compound predicates like those found in high-level languages like IF cond1 AND cond2 THEN ...
should be counted in terms of predicate variables involved, i.e. in this example one should count two decision points, because at machine level it is equivalent to IF cond1 THEN IF cond2 THEN ....
" (quoting from Wikipedia). This would be the case for your if ($variable1ThatCanBeNullOrFloat !== null && $variable1ThatCanBeNullOrFloat !== 0.0)
alternative... however, this takes no account of lazy evaluation for ||
within an if
evaluation... it's probably more pertinent to measuring NPath Complexity than the more simplistic Cyclomatic Complexity.
Nor does the CYC calculation make any allowance for early returns (McCabe's definition of complexity is based one one entry point, one exit point within the code), or jumps (goto
). And if I had my way nested ternaries should increment the complexity logarithmically, rather than simply by one for each level of nesting.
At the end of the day, Cyclomatic Complexity is a guideline that we can use to assess the maintenance risks of our code, not a hard and fast rule. Even the warning and error levels that we apply are just guidelines based on general principles of what is considered complex.
If it's any consolation; the most complex method in my most popular OSS library has a CYC of over 21 million. I've tried refactoring it, but any refactoring has a significant adverse affect on performance. That CYC makes it a pain to maintain; but a 3x performance overhead would make it impossible to run. I choose to live with that maintenance risk for the benefit of performance.
Okay, i can accept that argument, but still weird that some operators count and some doesn't. Anyway, since you changed the calculation making it harder to stay inside the recommendations, wouldn't it make sense to also increase the warning/error limit?
And another theoretical question:
Shouldn't optional chaining counted too?
return $object->getX()?->getY()
is technically return $object->getX() !== null ? $object->getX()->getY() : null
Anyway, since you changed the calculation making it harder to stay inside the recommendations, wouldn't it make sense to also increase the warning/error limit?
This is probably a call for Greg to make.
In McCabe's original paper, he refers to as an upper threshold of 10 as a “reasonable, but not magical, upper limit”. But a subsequent presentation that he gave on the topic ('Software Quality Metrics to Identify Risk') and published as a NIST advisory used the following table to assess risk based on complexity:
CYC | Risk |
---|---|
1-10 | A simple program without much risk |
11-20 | More complex, moderate risk |
21-50 | Complex, high risk program |
51+ | Untestable program, very high risk |
Steve McConnell in his book "Code Complete" was stricter:
CYC | Risk |
---|---|
1-5 | The routine is probably fine |
6-10 | Start to think about was to simplify the routine |
11+ | Break part of the routine into a second routine, and call it from the first routine |
More recently, Phil Koopman seems to suggest thresholds of 10-15, and 30-50 (so potentially 15 and 30) although he's thinking in terms of embedded systems
Personally, I'd have no problem with setting the thresholds at 15 and 30 rather than 10 and 20; to make allowance for more modern coding structures and expressions.
Shouldn't optional chaining counted too?
return $object->getX()?->getY()
is technicallyreturn $object->getX() !== null ? $object->getX()->getY() : null
It probably should, because it's an early break from the full expression, one that I'd forgotten while I was looking at what was missing from the sniff.
There's other omissions too:
An expression like;
$file = fopen($filename, 'r')
or exit("unable to open file ($filename)");
or the use of break
or continue
while in a loop.
The CYC check is fairly simplistic in that it simply looks for certain tokens inside a function/method, and increments the CYC value by 1 each time it finds one of those tokens.
The two cases that I mentioned above couldn't be counted in the same way because they're dependent on the context, not simply the presence of the tokens in the code. break
can also occur in the cases of a switch
statement, and shouldn't be counted when it's in that context, but only when the context is exiting from a loop. The or
token can also occur when combining conditions in an if
statement, so it shouldn't increment the CYC value in that context.
Rewriting the sniff itself to take that context check into account would make it a lot more complicated.
Other possible omissions are subject to debate. In a switch
or match
statement, some people would argue that each case
increments CYC by one, regardless of the number of conditions being tested in that case; others argue that a case
with multiple conditions should increment for each condition. e.g.
return match(strtolower(substr($monthName, 0, 3))){
'apr', 'jun', 'sep', 'nov' => 30,
'jan', 'mar', 'may', 'jul', 'aug', 'oct', 'dec' => 31,
'feb' => is_leap_year($year) ? 29 : 28,
default => throw new InvalidArgumentException("Invalid month"),
}
should increment CYC by 13 rather than by 4 (+1 for the leap year ternary)
A throw
is another case that sees some argument about whether it should increment CYC as well.
Getting back to your actual question though, I'd say that yes, the nullsafe operator (?->
) should increase CYC by 1. I'll look to submit a PR for that.
brake and continue in a loop are (almost) always inside of an if statement, so that would be double counting imo
Normally; but you could get very dirty code like
$data = [
'Hello',
'World',
'Terminate',
'Goodbye',
];
foreach ($data as $entry) {
switch ($entry) {
case 'Terminate': break 2;
default: echo $entry, PHP_EOL;
}
}
Yes, and I can use goto and eval() too :) But I don't think there is an common part of the ones who are doing that and who are caring about code quality :D
@gsherwood Any opinion on changing the Warning and Error thresholds for CYC?
@MarkBaker The thresholds can be customized per project via the project custom ruleset: https://github.com/squizlabs/PHP_CodeSniffer/wiki/Customisable-Sniff-Properties#genericmetricscyclomaticcomplexity
@MarkBaker The thresholds can be customized per project via the project custom ruleset: https://github.com/squizlabs/PHP_CodeSniffer/wiki/Customisable-Sniff-Properties#genericmetricscyclomaticcomplexity
yes, I can, and I'm doing it right now in my projects. But the question here is should the recommended default values go up with the recent changes in the calculation. And I think they should.
Describe the bug
This might not be a bug, I'm just curious of your opinion
After https://github.com/squizlabs/PHP_CodeSniffer/issues/3469 multiple part of my code is reporting too much cyclomatic complexity. i have something similar:
Until now one of this statements was +1 to complexity, not it's +2 because the null coalescing. But it's still 2 options here. stepping into the if or stepping it over.
Yes I can rewrite it to
if ($variable1ThatCanBeNullOrFloat !== null && $variable1ThatCanBeNullOrFloat !== 0.0)
but it's a lot more character and I like to spare my keyboard.what do you think?