isocpp / CppCoreGuidelines

The C++ Core Guidelines are a set of tried-and-true guidelines, rules, and best practices about coding in C++
http://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines
Other
42.46k stars 5.43k forks source link

ES.45 - Magic numbers in math operations #1861

Open carlosgalvezp opened 2 years ago

carlosgalvezp commented 2 years ago

Hi!

I have a question about ES.45. The way it's written, it bans any use of magic numbers (except a positive list of exceptions for the most common numbers, like 0, 1, etc). While I think the rule makes complete sense and I fully agree with the provided example, I believe it's detrimental when writing code that implements math, for example implementing a scientific paper in code:

// From paper X, Equation 2)
double x = 0.123 * y + 0.456 * z + 0.5 * w;

Versus:

// From paper X, Equation 2)
double x = kConstant0123 * y + kConstant0456 * z + kHalf * w;

Readability is greatly reduced in this case, plus makes reviewing the code much harder. The code is easier to review if it visually looks as similar as possible to the original source in the paper.

A solution is to add these magic constants to a list of "accepted" magic constants, but it feels that list can easily grow and defeat the purpose.

What do you think? Would it make sense to document this use case and offer some sensible advice?

Thanks!

peno2 commented 2 years ago

I think the rule is ok as it is. In your case, to not reduce readability, I suggest writing with short variable names, for example as follows:

// From paper X, Equation 2)
constexpr auto k1{0.123}; // Param xyz  in paper X, eq 1.1.
constexpr auto k2{0.456}; // Param xyzy in paper X, eq 1.2.
constexpr auto k3{0.5};   // Param qyzy in paper X, eq 1.3.

const double x{k1 * y + k2 * z + k3 * w};
MikeGitb commented 2 years ago

Can you give a few concrete examples of magic constants in scientific papers that don't have a name? Offhand I can only think of things related to cryptography or bit twiddling hacks. In physics, every (?) constant has a name, in math they don't use numbers except maybe pi and e ;) (of course I'm greatly simplifing and generalizing here).

carlosgalvezp commented 2 years ago

For example, let's say I want to implement constexpr approximate trigonometric functions based on Taylor series expansion:

https://en.wikipedia.org/wiki/Taylor_series

Take tanh for example, there's a term 17/315 there. As a reviewer of such code, it would be easier if the number was written directly in code instead of creating a constant for it. If I'm stuck in C++11 for reasons I can't implement the formula as a loop in a constexpr function.

Or perhaps let me call it an "engineering paper" instead of "science" paper :), where e.g. some float parameters are calculated empirically after doing some physical experimentation, which leads to a formula that uses those parameters and the paper doesn't call them "k1", "k2" and so on - since they don't know about magic numbers. I like the suggestion by @peno2 even though it does introduce an extra burden on code reviewers.

In general, the problem is naming constants with meaningless/redundant names just for the sake of it. Just like useless comments are regarded as code smell:

// Update the state - duh!
state.update();

so are useless constants (IMHO):

float getDegrees (float x ) 
{ 
    constexpr float k180  = 180.0F;
    return x * k180 / kPi; 
}
shaneasd commented 2 years ago

https://en.wikipedia.org/wiki/Luma_(video) gives the equation for luma from RGB as Y=0.2126 R + 0.7152G + 0.0722B This is one of many examples that calculate some sort of human perceived brightness from three color channels. You could write this as

double RedWeight = 0.02126; etc
Y = RedWeight * R + ...

but I don't think that gains you anything in terms of readability or maintainability.

If I want to calculate the volume or moment of inertia of a shape often I can look up an equation. e.g. the volume of a sphere is 4/3 pi r^3. I'm not sure what you would call that 4/3 constant (yes you'd need to do a floating point division) but I find it hard to imagine it being more readable as such.

MikeGitb commented 2 years ago

For example, let's say I want to implement constexpr approximate trigonometric functions based on Taylor series expansion

 constexpr double tanh_taylor_factors[]={....};

Or perhaps let me call it an "engineering paper" instead of "science" paper :), where e.g. some float parameters are calculated empirically after doing some physical experimentation

I have a lot of experience with that and we always put such a number into a constant. With appropriate explanation that describes exactly where that number comes from.

the equation for luma from RGB as [...]

Personally, I'd probably write that similar to

constexpr Vec3D luma_weights{0.0126, ...}; //source: ....
auto luma = luma_weights * rgb;

But I see your point.

e.g. the volume of a sphere is 4/3 pi r^3. I'm not sure what you would call that 4/3 constant

That's imho a good example.

MikeGitb commented 2 years ago

float getDegrees (float x ) { constexpr float k180 = 180.0F; return x * k180 / kPi; }

Also a good example. Albeit, if I were to use a constant it would be

constexpr double deg_per_rad = 180/pi;
carlosgalvezp commented 2 years ago

constexpr double tanh_taylor_factors[]={....};

Not so easy :)

All in all, this is compliant code:

// math.h
namespace detail
{
constexpr std::size_t kNrTaylorCoeffsTanh = 3;
constexpr std::array<double, kNrTaylorCoeffsTanh> kTaylorCoeffsTanh = {...};
constexpr std::size_t k0 = 0;
constexpr std::size_t k1 = 1;
constexpr std::size_t k2 = 2;
}  // namespace detail

inline constexpr double tanh(double x)
{
    return detail::kTaylorCoeffsTanh[detail::k0] * ... + 
              detail::kTaylorCoeffsTanh[detail::k1] * ... + 
              detail::kTaylorCoeffsTanh[detail::k2] * ...;
}

Sure, technically it's absolutely possible to comply with the rule. But I think at some point we need to take a step back and reflect about what we are trying to achieve instead of blindly following the rule. The rule is about making code readable. IMHO the above code is way less readable and harder to review and maintain than the original code with the magic numbers plugged into the formula.

Let alone the potential for re-defining detail::k1 and the like (re-used for similar constants in different TUs), violating the ODR. Constants without a real meaning that are given an arbitrary name pollute the namespaces and will eventually be duplicated.

BenBE commented 2 years ago

I think there are two main aspects with the magic numbers that should be kept in mind when applying the rules for avoiding them:

  1. DRY
  2. Ensure descriptiveness and documentation.

For calculating the volume of some shape like a sphere or other shape I'd likely would let the magic numbers slip, when that formula is properly documented with information on where those magic numbers arise from. This can either be an inline proof of the formula or some external link where such derivation and proof can be retrieved from.

Similarly for the Taylor series example I think it's quite valid to include those constants inline, IFF used exactly once AND there is documentation available nearby that explains where those numbers come from and how to get them. Splitting out those constants just to have them used once kinda defeats the purpose and overall reduces the legibility of the code; actually an aspect that when reviewing such a snippet of code would make me reject it in a review …

That aside I think in the case of the Taylor series it might be worth to implement not a static set of magic numbers, but instead implement their derivation for the particular series. Combining that approach with some simple templates could even be used to define the accuracy of the series' approximation.

Overall you likely should refrain from converting a magic number problem into a naming hazard …

MikeGitb commented 2 years ago

Are you deliberately trying to interpret the rules in the worst possible way and make fun of it?

Also, I was under the impression that the core guidelines aren't really concerned with outdated standards?

carlosgalvezp commented 2 years ago

I'm applying the rules just as they are written and implemented by clang-tidy, for a rather reasonable use case. Our codebase has many similar instances.

I want to highlight the issues that arise from applying this rule in the context of the whole CppCoreGuidelines instead of in isolation. I have great respect for them and enable all of them in clang-tidy - which sometimes can lead to complicated situations like this one.

You are right that the guidelines aren't limited to a given standard. In C++14 the constants can be put inside the function to reduce their scope. It still increases the line count significantly though.

carlosgalvezp commented 2 years ago

@MikeGitb Or are you suggesting that the following is allowed by ES.45?

std::array<double, 7> coefficients {...};
double x = coefficients[5];

Are those 2 magic numbers (7, 5) allowed?

interpret the rules in the worst possible way

Can the rule be interpreted in different ways, good and bad? IMO the rule is clear and has only 1 blanket interpretation, which is what is implemented in static analyzers (open-source and proprietary) and leads to the ugly code I posted above. It's not at all fun. That's why I'm bringing here real-world use cases that I think are worth considering and providing useful advice.

carlosgalvezp commented 2 years ago

One more thing that I learnt recently - the introduction of "common utility constants" to comply with this rule can lead to very subtle, undetected ODR violations (which is Undefined Behavior) in C++14 if not done carefully:

https://stackoverflow.com/questions/69105602/possible-odr-violations-when-using-a-constexpr-variable-in-the-definition-of-an

beinhaerter commented 2 years ago

Please note that the rule says "Avoid “magic constants”", not "Don't use "magic constants"". For example defining k0/k1/k2 for the indices 0/1/2 in your example looks like overkill. Maybe the enforcement of ES.45 is a bit too strict/too noisy.

Also the guidelines say "The rules are not precise to the point where a person (or machine) can follow them without thinking."

Your stackoverflow link is interesting. It might be worth opening an extra issue here for that problem.

carlosgalvezp commented 2 years ago

"The rules are not precise to the point where a person (or machine) can follow them without thinking."

Thanks, I missed that. I think this is one of those rules that are very hard to enforce automatically and rely heavily on code review (unlike others that are pretty straightforward). As a suggestion I'd add a paragraph acknowledging this together with examples of where this rule does not make sense (like array indices, as you point out). A similar example is engineering/scientific papers that refer to matrix elements e.g. M_31 and we refer to that in code as m(3, 1).

That's all I feedback I had, I appreciate the discussions! I personally can just silence warnings where it makes sense and move on. I'm more concerned about junior developers that might blindly follow the guidelines and clang-tidy warnings leading to actually worse code sometimes. Having such a reflection in the documentation can help them better understand the rule and make more informed decisions.

MikeGitb commented 2 years ago

I'm applying the rules just as they are written and implemented by clang-tidy, for a rather reasonable use case. Our codebase has many similar instances.

I want to highlight the issues that arise from applying this rule in the context of the whole CppCoreGuidelines instead of in isolation. I have great respect for them and enable all of them in clang-tidy - which sometimes can lead to complicated situations like this one.

It should be ovbious to everyone, that constexpr std::size_t kFive =5 has zero benefit. It neither conveys any additional semantics, nor can be changed later during refactoring (kFive = 4, hopefully wouldn't pass code review). So, if one eally needs x[kFive] and not x[column_index_of_velocity] then x[5] is obviously the better choice and if a tool warns about it, then that's a place where I'd put in a suppression (similarly to when the tool would warn me about the 4.0/3.0 when calculating the volume of a sphere).

Comming back to the tanh example, if you have 7 coefficients, what I would actually do is:

double tanh(double x){
    constexpr std::array<double,7> taylor_coeff ={1,2,3,4,5,6,7};
    double px=1;
    double acc=0;
    for(double c : coeff){
        acc += c*px;
        px  *= x;
    }
    return acc;
}

the broader point I'm trying to make here is that if it seems that I need multiple consecutive index constants I first check, if a loop wouldn't be more appropriate solution anyway (EDIT: Just to be clear: This isn't always the case).

On a sidenote: We do allow c-style arrays in our codebase, because historically std::array brought too many drawbacks (partial lack of constexpr, no autodeduction of the size, compiletime and debug overhead) for not enough benefit. What I'm advocating for when adopting guidelines is to follow them as strictly as possible, but also be aware that they can't possible cover every possible scenario, so there need to be escape hatches in place (i.e. don't let the letter of the guideline overrule common sense,).

MikeGitb commented 2 years ago

Can the rule be interpreted in different ways, good and bad?

This might be a bit off topic and I guess the major confusion has been clarified by @beinhaerter already, but: The CppGuidelines don't have a formal framework (not even in the sense that misra-c or autosar have), tend to give rather general advice and are written by fallable humans in their spare time. As a result, yes a lot of the rules/guidelines leave some/a lot room for interpretation, sometimes contadict each other or even have plain bugs in the wording. I admit however, that good and bad are not so much applicable to the interpretation itself, but rather the outcome.

In this particular case, both "avoid" and "magic constant" are imho open to interpretation (IIRC, Autosar has a much stricter wording). The enforcement section is less ambiguous, but it also contains ther term "and others" and anyway can't be taken at face value, because it also wouldn't allow constexpr double foo = 0.001; which still has "a litteral in code". So the clang tidy implementers obviously already did interpret the rule to more align with the spirit of the rule.

Btw.: An advice I've given a junior programmer in the past was more or less: "If 5 is a magic constant, so is constexpr int five = 5, even if the tool doesn't flag it. So either find a better name, leave it be or make a step back and see if you should have written the function in a different way to begin with".

bgloyer commented 2 years ago

I think the rule is good but agree that the enforcement should be relaxed in cases where a number represents itself and is impossible to change. For example a triagle always has 3 points.

    if(polygon.size() == 3) // good, 3 is representing itself and cannot change
    {
        Triangle t{polygon};
        // .. . .
    }

    constexpr int three {3};
    if(polygon.size() == three) // bad
    {
        Triangle t{polygon};
        // .. . .
    }

    constexpr int num_points_in_triangle {3};
    if(polygon.size() == num_points_in_triangle) // bad
    {
        Triangle t{polygon};
        // .. . .
    }

I have seen things like this introduced because automated tools are flaging 3 as a magic value. Under time pressure it is sometimes faster to 'fix' it than flag false positives.

I think the main point of the rule is to catch thinigs like this which is obviously bad:

    int status = something();
    if(status == 3) // bad, unclear what 3 means
    {
        // .. . .
    }

Maybe add something like this and the sphere calculation as examples to the rule?

jwakely commented 2 years ago

One more thing that I learnt recently - the introduction of "common utility constants" to comply with this rule can lead to very subtle, undetected ODR violations (which is Undefined Behavior) in C++14 if not done carefully:

https://stackoverflow.com/questions/69105602/possible-odr-violations-when-using-a-constexpr-variable-in-the-definition-of-an

In practice this is a non-issue. Such variables aren't odr-used in most cases, and if they are (e.g. by binding then to a reference parameter of std::max) then everything works as expected. The only way it can go wrong is if the inline function / function template actually depends on the address of the constant. Don't do that, it would be stupid. Nobody should care about &pi, only the constant's value.

This is not something you should waste time worrying about for sensible uses of arithmetic constants.

If you're still worried and can't make the constants local variables because you're stuck with C++11, you can write a simple constexpr function to return the constant instead of using a global variable.

hsutter commented 2 years ago

Editors call: We think the current rule is reasonable, but we're open to extending the Enforcement to be more flexible/tasteful than allowing only 0, 1, and nullptr. Any suggestions?

BenjamenMeyer commented 2 years ago

I think the rule is good but agree that the enforcement should be relaxed in cases where a number represents itself and is impossible to change. For example a triagle always has 3 points.

    if(polygon.size() == 3) // good, 3 is representing itself and cannot change
    {
        Triangle t{polygon};
        // .. . .
    }

Chances are, if you're doing that then an enum would be better than 3 as you're likely representing multiple types of things, so

enum OBJECT_TYPES {
  TRIANGLE = 3,
  RECTANGLE= 4,
};
...
     if(polygon.size() == TRIANGLE)
     {
         Triangle t{polygon};
         // .. . .
     }
carlosgalvezp commented 2 years ago

Any suggestions?

should refrain from converting a magic number problem into a naming hazard …

"If 5 is a magic constant, so is constexpr int five = 5, even if the tool doesn't flag it. So either find a better name, leave it be or make a step back and see if you should have written the function in a different way to begin with".

Examples: volume of the sphere, array indices, array sizes.

if(polygon.size() == TRIANGLE)

I disagree with the semantics of this code, comparing a "size" with a "object_type". But this is a bit off topic.

jwakely commented 2 years ago

Clang-tidy has a default allowed list of like [0, 1, 2, 3, 4, 100]. Those numbers are often used in 4x4 matrices (size and indexing). Would it make sense to port that list over to here?

No. A "hardcoded" list in the guidelines is just going to cause more problems. When people need to use some number that isn't in the Holy List of Acceptable Numbers the argument will be "look, the guidelines show exactly which numbers it's OK to use as literal constants, all others are forbidden!" It will also make it harder for analyzers like clang-tidy to update their own list based on user experience ("why are you changing this list so it doesn't match the very explicit list in the guidelines?")

The guidelines should simply be expressed in terms that encourage people to use their brains, write better code, and avoid magic numbers with unclear provenance ("why is this list 31 elements long? If we change it here, is there some other part of the code that needs to change to match?")

The guidelines cannot be too prescriptivist, and cannot be a substitute for using your own judgement about what makes sense in a given situation.

carlosgalvezp commented 2 years ago

Fair enough :)

How would you suggest to work with this rule, in practice? Would you say this rule is meant to enforced automatically? All static analyzers I know of will flag all literals (with some hardcoded exceptions). Local suppressions are possible, but can add quite some noise depending on the context.

BenjamenMeyer commented 2 years ago

if(polygon.size() == TRIANGLE)

I disagree with the semantics of this code, comparing a "size" with a "object_type". But this is a bit off topic.

semantics could possibly be better, but the point still stands

carlosgalvezp commented 2 years ago

This is an example formula from a scientific paper that we recently run into. It contains magic numbers (2 and 6) and our tools will flag them as magic numbers.

Screenshot from 2022-01-17 10-38-00

BenjamenMeyer commented 2 years ago

This is an example formula from a scientific paper that we recently run into. It contains magic numbers (2 and 6) and our tools will flag them as magic numbers.

Screenshot from 2022-01-17 10-38-00

And it should; though a something like a # disable:flag <reason why> should be sufficient for such cases; a good example of why linters shouldn't be absolute and must be configurable and have the ability to be overridden inline.

shaneasd commented 2 years ago

Although I acknowledge this is a problem with no good solution, I'd like to point out that having to suppress warnings inline can make the code probably harder to read and definitely harder to write.

BenjamenMeyer commented 2 years ago

Although I acknowledge this is a problem with no good solution, I'd like to point out that having to suppress warnings inline can make the code probably harder to read and definitely harder to write.

It's the trade-off with applying tools like linters as you have to communicate with the tool at times; but the impact to readability is negligible. It's common to see this pattern in the Python world.

hsutter commented 2 years ago

Editors call: What is magical depends on the domain, so it is not possible to define what is magical independently of context. An enforcement rule has to take that into account and include some optionality.

Perhaps change this:

Enforcement Flag literals in code. Give a pass to 0, 1, nullptr, \n, "", and others on a positive list.

to something like this?

Enforcement Flag literals not used as constructor arguments.

@jwakely, would that be better or worse?

jwakely commented 2 years ago

Works for me.

dfrib commented 2 years ago

Please note that the rule says "Avoid “magic constants”", not "Don't use "magic constants"". For example defining k0/k1/k2 for the indices 0/1/2 in your example looks like overkill. Maybe the enforcement of ES.45 is a bit too strict/too noisy.

Also the guidelines say "The rules are not precise to the point where a person (or machine) can follow them without thinking."

For some context (for future readers), in the safety critical domain rules[1] such as "avoid", non-precise by nature (but very useful nonetheless, "apply common sense") are typically either removed or, more frequently, upgraded into "shall not", which is the particular reason for our internal discussion on ES.45. We unfortunately see many junior devs writing code which satisfies the tool but which arguably only makes the code worse (kFive) without adhering to the actual underlying advice of the rule (kFive is still a magic number).

Education and mentoring can of course make things better, but in general we struggle in the safety critical domain particularly with rules originated as "good advice; use reasonably" when they are upgraded to strict requirements. Our context-specific project deviations can open up for more lenient rules, but comes with both a compliance burden(/risk) and with some code noise, as pointed out by @shaneasd. I'm keen to see how MISRA will tackle this particular rule in their upcoming standards.

[1] E.g. AUTOSAR C++14 Guidelines, rule A5-1-1.