ES.20: it may encourage prorgammers to write unsafe code

isocpp / CppCoreGuidelines

The C++ Core Guidelines are a set of tried-and-true guidelines, rules, and best practices about coding in C++

http://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines

Other

42.48k stars 5.43k forks source link

ES.20: it may encourage prorgammers to write unsafe code #1847

Closed akrzemi1 closed 2 years ago

akrzemi1 commented 2 years ago

ES.22 ("Don’t declare a variable until you have a value to initialize it with") is a very good rule. The only excuse for not applying it is when you have to deal with an externally imposed interfce (e.g. from a third party library) which chooses another solution for initialization: where you have to provide a reference to an object in order for it to be initialized. For instance:

int i;
std::cin >> i; // i must exist to be written to

This exceptional case is covered by ES.20 ("Always initialize an object"). It is unfortunate that ES.20 is ordered before ES.22. Because ES.22 gives superior advice, and ES.20 only gives a patch solution for the cases where ES.22 cannot be applied.

My recommendaiton would be to order ES.22 before ES.20 to highlight the relationship between them. Another suggestion would be to mention in ES.20 that it should be applied only if ES.22 cannot be applied.

But there is more. For the cases where you cannot apply ES.22, the advice in ES.20 is easily misread as an encouragement to write unsafe code. However, this cannot be quickly explained in a GitHub issue report. I have written a post that describes the real life situation where a programmer received a static analyzer warning, and had an opportunity to fix the bug, but instead they concealed the bug because this is what ES.20 told them to do. Here it is: https://akrzemi1.wordpress.com/2018/11/22/treating-symptoms-instead-of-the-cause/.

ES.20 claims that people leave variables uninitialized for performance reasons, and that this is wrong. But this looks like a strawman argument for me. I agree that some people do it for performance. But my motivating reason for leaving variables uninitialized -- in situations where I cannot apply ES.22 -- is safety: it is better to have a bug in the code reflected through undefined behavior (because then I can use a UB sanitizer to detect it) than to have a bug not reflected by anything (and therefore concealed).

BenjamenMeyer commented 2 years ago

For the cases where you cannot apply ES.22, the advice in ES.20 is easily misread
as an encouragement to write unsafe code.

So ES.20 says to initialize everything while ES.22 encourages delaying creating variables until a value is known to use them with...I'd hardly say that ES.20 encourages unsafe code. Unsafe code would be leaving the values uninitialized.

To use your own example:

int i;
std::cin >> i;

You already failed ES.20 since this should be:

int i = 0;
std::cin >> i;

More importantly, ES.20 probably refers more to objects where ES.22 refers to object instances or variables.

Also, leaving variables uninitialized is more unsafe than initializing to a known value. Why? It can pull in old memory fragments can cause unintended behavior.

akrzemi1 commented 2 years ago

Oh dear, I think you misunderstood my point. The example with std::cin is not to show any good practice. it is to demonstrate a library interface that prvents the applicaiton of ES.22.

I never intended to apply ES.20, because I find it wrong in the cases where ES.22 cannot be applied.

Note that the opinion "leaving variables uninitialized is more unsafe than initializing to a known value" is your private, and it is not clear at all if it helps. It may help mitigate the damage caused by a program that contains a bug, but it does not remove the bug, and in fact makes the bug harder to find.

The opinion above is good when one doesn't use tools like UB-sanitizers or static analyzers. If one does, the advice to innitialize any variable to any value is likely harmful.

The following articles describe why this is the case, in detail. https://akrzemi1.wordpress.com/2016/12/12/concealing-bugs/ https://akrzemi1.wordpress.com/2018/11/22/treating-symptoms-instead-of-the-cause/

BenjamenMeyer commented 2 years ago

Oh dear, I think you misunderstood my point. The example with std::cin is not to show any good practice. it is to demonstrate a library interface that prvents the applicaiton of ES.22. I never intended to apply ES.20, because I find it wrong in the cases where ES.22 cannot be applied.

If you're going to quote an example, then apply the things being discussed.

Note that the opinion "leaving variables uninitialized is more unsafe than initializing to a known value" is your private, and it is not clear at all if it helps. It may help mitigate the damage caused by a program that contains a bug, but it does not remove the bug, and in fact makes the bug harder to find.

The opinion above is good when one doesn't use tools like UB-sanitizers or static analyzers. If one does, the advice to innitialize any variable to any value is likely harmful.

The following articles describe why this is the case, in detail. https://akrzemi1.wordpress.com/2016/12/12/concealing-bugs/ https://akrzemi1.wordpress.com/2018/11/22/treating-symptoms-instead-of-the-cause/

Your blogs are wrong. You're discussing logic errors and completely missing other errors.

In your first article, your bug is that you used the wrong variable - you even blame the IDE. The bug is not related to not initializing the variable (foundName), and removing the initialization will not necessary make it more evident that you did that since it will contain some random value. In fact, if you used VC++ in debug mode it'd automatically get assigned to 0xCCCCCCCC behind the scenes while being left to a random value in release mode. (Personally I argue that VC++'s behavior in that respect is itself wrong but for other reasons.)

In your second article, you blame the initialization of v on the logic bug in your if-block. Indeed, returning the random uninitialized value could actually create a number of other bugs and security issues with your software, which is why the static analyzers warn about uninitialized values.

In neither case is failing to initialize the value actually going to help you find your logic bug. Rather, assigning a default, known value is actually used to find those logic bugs since you'll see your known value (0xCCCCCCCC in VC++'s default case, but really it should be something explicitly known and relevant.)

hsutter commented 2 years ago

Andrzej is making a correct point in this thread. (I haven't re-read the blog posts.)

The opinion of the guidelines editors collectively is to endorse initialization at declaration as the best consensus advice given the current C++ language -- but we already carve out exceptions to that because we know it is impractical to follow always. For example, this violation of that rule is explicitly endorsed as "okay" by the guidelines:

char buf[BUF_SIZE];  // uninitialized, but "good" by the Guidelines
LoadData( &buf[0], BUF_SIZE );

The reason this violation of the rule is considered okay is because:

because of buf's size, it's known to be potentially expensive to initialize buf with dummy values just to satisfy the rule when those values will be immediately overwritten (optimizers are bad at eliminating these useless dead writes, and not because of lack of effort); and
for arrays it is often the case that there are not program-meaningful values to initialize buf with in the declaration statement, but you have to call another function to populate the values.

And an additional advantage of leaving buf uninitialized is that tools can detect and diagnose errors like the following, which can be introduced accidentally especially during maintenance:

char buf[BUF_SIZE];
DoSomethingWith( buf[0] ); // EEK!
LoadData( buf, BUF_SIZE );

We really want our (ideally static analysis) tools to diagnose the "EEK!" line, which is using uninitialized data. The Guidelines do not recommend forcing that line to use initialized data by requiring buf to be initialized at declaration -- if it did, that would hide the error from analyzers, so that tools would not be able to diagnose it because it would no longer be use of uninitialized data, it would be use of initialized-but-dummy-programing-unmeaningful data but the tool doesn't know about that.

So initialize-at-declaration when there are not yet program-meaningful values available hides bugs.

Which of course brings us back to Andrzej's point, which is that this code:

int i;
std::cout >> i;

is no different. However, here the Guidelines do require forced dummy initialization of i, which forces hiding errors like:

int i = 0;
DoSomethingWith( i );  // EEK! but hidden, not diagnosable by tools
std::cout >> i;

There's no guarantee that 0 is a valid value of i, because in this case presumably the valid value of i is whatever is supplied by the user.

Summarizing: For both the buf array and the i integer, anytime we do not have a program-meaningful value at the point of declaration ( because we have to call a function to populate a meaningful value), if we force initialization with a dummy value we:

force adding useless dead writes that are hard for optimizers to remove (for little variables like i we just live with that, for arrays like buf the issue is bad enough that we have to care)
hide bugs by interfering with static analysis tools and sanitizers (this is not theoretical, it regularly hides real bugs).

Yet currently the Guidelines require these downsides for i, but not for buf.

BenjamenMeyer commented 2 years ago

There's no guarantee that 0 is a valid value of i, because in this case presumably the valid value of i is whatever is supplied by the user.

It's for the programmer to decide what a good initialization value is - it may be zero, it may not be; but it's context sensitive. So i should be initialized or documented as to why it shouldn't be (in the case of using it for random data for seed values in a security context).

I can certainly agree that buf can be skipped; though generally objects should be initializing themselves via Constructors, and a memory buffer (like above) can easily be initialized quickly with memset or bzero and other mechanisms.

All-in-all, i shouldn't qualify like buf would.

akrzemi1 commented 2 years ago

@hsutter Thanks. So, I acknowledge that guideline in ES.20 is a tradeoff. And as it is the case with trade-offs, they cannot be perfect. However, there appears to be a substantial difference between ES.22 and ES.20: the former cannot ever do harm, whereas the latter can. Maybe these rules should be tagged with an information like this. Static analyzers try to be cautious not to produce false positive warnings. I would be upset to see the static analyzer warn me about ES.20 by default.

A solution that I would find more appropriate would be to have such dummy value initializaiton hidden under a macro:

int i DUMMY_INITIALIZE_WITH(-1);
std::cout >> i;

For the purpose of doing static analysis, I would define the macro to do nothing, and for the release builds, I would use the value. Obviously, a solution like this cannot be recommended in design guidelines.

However, I would recommend that the rule is updated with some explanation that it can have a negative side effects of impeding bug detection, even for small scalar types. How does this work? Should I file a pull request, or is it enugh to file suggestions through an issue?

akrzemi1 commented 2 years ago

@BenjamenMeyer It looks like we agree on one thing. In order to be able to diagnose a bug (which is not "forgetten to provide the value upon initialization" but "forgotten to assign value at the point where I was supposed to have it") -- as opposed to only minimizing the damage of a bug -- the state of being uninitialized needs to be represented as something distinct from any "normal" value that we could possibly use in a correct program.

This observation alone would improve ES.20. Because ES.20 only tells you to provide any initial value, and people often "implement" this advice by using a default value:

int i {};

And this usually has the opposite effect: the default value is the one that occurs most ofte in the correct program, so it is a bad candidate for a "special" value that indicates a bug. The advice that whoever initializes the variable should pick the value that uniquely represents a missing value is not satisfactory, as it assumes that such a value even exists in a given context.

On the other hand, note that static analysis that uses symbolic evaluation does exactly this. Because it is not confined to storing values within objects of type int, it can afford to represent the "uninitialized" state as a distinct state from any of the 2^32 states represented by type int; therefore, the information about missing initialization is not lost, and this works for any type, in any context.

Maybe ES.20 should explicitly say that it chooses to favor the mitigation of the consequences of bugs in the released code over assisting the tools in the process of detecting bugs statically.

BenjamenMeyer commented 2 years ago

@BenjamenMeyer It looks like we agree on one thing. In order to be able to diagnose a bug (which is not "forgetten to provide the value upon initialization" but "forgotten to assign value at the point where I was supposed to have it")

Well, you still fail to identify the actual bug - which was not either of those things. It was a logic bug because you used the wrong variable (in one case) or wrote the wrong if-block (in the other). Neither were issues with the variable you are blaming or its assignment, and any variable in scope that would have matched the type would have caused the same issue. Stop blaming the variable - it has zero to do with the bugs you observed other than that it was used.

And I would 100% expect a static analyzer to point out when variables were not initialized or used before they were initialized and expect that developers would have to add comments for the static analyzer to ignore the warnings when failure to initialize was purposeful and expected - such cases should also be extremely rare.

A static analyzer won't find your logic error because it's exactly that - a logic error which requires someone to review.

EDIT: Updated a line from switch case to if-block; missed it in my review after re-reviewing the related article.

On the other hand, note that static analysis that uses symbolic evaluation does exactly this. Because it is not confined to storing values within objects of type int, it can afford to represent the "uninitialized" state as a distinct state from any of the 2^32 states represented by type int; therefore, the information about missing initialization is not lost, and this works for any type, in any context.

A static analyzer will find that a value was not initialized because it looks for that value being initialized in the static code. Basic types such as int do not have any state that is not initialized in and of themselves as they are explicitly not objects. This is not C# or Java were you have the Int object type that wraps the basic raw int type; there is not state to the basic types, only their values.

akrzemi1 commented 2 years ago

Well, you still fail to identify the actual bug - which was not either of those things. It was a logic bug because you used the wrong variable (in one case) or wrote the wrong if-block (in the other). Neither were issues with the variable you are blaming or its assignment, and any variable in scope that would have matched the type would have caused the same issue. Stop blaming the variable - it has zero to do with the bugs you observed other than that it was used.

And I would 100% expect a static analyzer to point out when variables were not initialized or used before they were initialized and expect that developers would have to add comments for the static analyzer to ignore the warnings when failure to initialize was purposeful and expected - such cases should also be extremely rare.

A static analyzer won't find your logic error because it's exactly that - a logic error which requires someone to review.

I think that there is a misunderstanding here. I have a bug in the code which is "reading the wrong condition in the if-statement". A static analyzer cannot detect this directly (because it can't possibly know what is the right condition). But it can indirectly observe a symptom of my bug. The bug report from the static analyzer I would expect it, "Hey, I detected that a variable is read before being initialized. Programmer, go and check the code there, because you must have some bug close to this location." Now, a cautious developer will inspect the code, and observe that a wrong condition is checked. They will fix the bug. In contrast, a not-so-cautious developer, when they see the same warning, will mechanically apply rule ES.20 (initialize the int variable with some dummy value), and not even bother to see if there is any rel bug in the surrounding code.

My concern is that ES.20 encourages the programmers to be not-so-cautious.

BenjamenMeyer commented 2 years ago

To make this conversation easier, here's the code in question taken from https://akrzemi1.wordpress.com/2018/11/22/treating-symptoms-instead-of-the-cause/:

int make_i(bool c1, bool c2)
{
  int v; // no value specified
  if (c1)        fill_1(v);
  if (!c1 && c2) fill_2(v);
  if (!c1 && c2) fill_3(v); // <- BUG
  return v;
}

I think that there is a misunderstanding here. I have a bug in the code which is "reading the wrong condition in the if-statement". A static analyzer cannot detect this directly (because it can't possibly know what is the right condition). But it can indirectly observe a symptom of my bug. The bug report from the static analyzer I would expect it, "Hey, I detected that a variable is read before being initialized. Programmer, go and check the code there, because you must have some bug close to this location." Now, a cautious developer will inspect the code, and observe that a wrong condition is checked. They will fix the bug. In contrast, a not-so-cautious developer, when they see the same warning, will mechanically apply rule ES.20 (initialize the int variable with some dummy value), and not even bother to see if there is any rel bug in the surrounding code.

Yes, you have a problem with the code. Initializing v at the start of the function actually decreases your problems. Your problem is a logic problem. If you intend that v should only be initialized by one of those if blocks then you need to define a means of detecting that v did not get set and use that. By not setting a default value you leave open a potential security leak when the if-blocks fail to operate and set the value.

Additionally, your last two if-blocks have the same condition. Changing v to be uninitialized will not resolve your bug; it won't do anything for you bug.

If, on the other hand, you had:

const int MY_DEFAULT_VALUE = 0xCCCCCCCC;
int make_i(bool c1, bool c2)
{
  int v = MY_DEFAULT_VALUE;
  if (c1)        fill_1(v);
  if (!c1 && c2) fill_2(v);
  if (!c1 && c2) fill_3(v); // <- BUG
  assert(v != MY_DEFAULT_VALUE);
  return v;
}

Then you'd have discovered your issue - assuming of course MY_DEFAULT_VALUE is not a possible value from any of the other functions - but that can only be determined by a developer, not static analysis tools.

The problem I have with this entire discussion is that you're attempting to solve one problem - finding bad logic - by creating another problem - data leaks via uninitialized variables. When your code gets hacked, it's these exact kinds of things that crackers look for to find memory information to make bigger exploits. Further, they don't resolve your actual problem.

Now suppose you had the following:

int make_i(bool c1, bool c2)
{
  int j = 5;
  int v = 0; // no value specified
  /* some other code that makes us of `v` or j` */
  if (c1)        fill_1(v);
  if (!c1 && c2) fill_2(v);
  if (!c1 && c2) fill_3(v); // <- BUG
  return v;
}

You can't uninitialize v because it's used earlier, so your argument that uninitializing v would find your logic bug (or make it easier) is therefore invalid. Further, suppose you used j instead of v on one of those lines - again, it's used earlier on and thus makes your other point invalid.

The problem is not the variables or their initialization status. They are a sideshow to the actual issue - that the third if-block (in the code above) didn't have the right logic; nothing else matters. How you diagnose that does, but blaming it on variable initialization isn't going to help you at all.

What will help is writing better logic:

int make_i(bool c1, bool c2)
{
  int v = 0;
  if (c1)        fill_1(v);
  else if (!c1 && c2) fill_2(v);
  else if (!c1 && c2) fill_3(v); // <- BUG -- static analyzer will report this is unreachable!
  return v;
}

By merely using nested if-blocks the static analyzer will now complain that you can't reach the line of code where the bug is.

However, you also don't do anything with c2 when c1 is true; which might mean another bug - one can't tell since there's no documentation. And in fact you also have an issue where someone doing some minor edits might cause another issue since there's no braces

NOTE: I know - all kinds of arguments over whether braces should be there or not; I favor long term maintainability and being as clear and explicit as possible to in order to reduce errors. I'm just pointing out the potential for bugs during maintenance.

So the best version of this would be:

int make_i(bool c1, bool c2)
{
  int v = 0;
  if (c1 /* and ignore state of c2 */) {
     fill_1(v);
  } else if (!c1 && c2) {
     fill_2(v);
  } else if (!c1 && c2) {
    fill_3(v); // <- BUG -- static analyzer will report this is unreachable!
  } 
  return v;
}

akrzemi1 commented 2 years ago

Yes, you have a problem with the code. Initializing v at the start of the function actually decreases your problems. Your problem is a logic problem. If you intend that v should only be initialized by one of those if blocks then you need to define a means of detecting that v did not get set and use that.

Then, your advice is that whenever I cannot initialize my object with a meaningful value, I have to devise a runtime machinery that verifies the effects of my initialization. Now we are not trading a single assignment of meaningless value, we are trading even more redundant code. And clearly, you are offering a different advice than ES.20.

By not setting a default value you leave open a potential security leak when the if-blocks fail to operate and set the value.

True. On the other hand, I do not see the situation much better when I allow my function to return with the value that I never intended to return. This might be subject to an exploit as well.

Then you'd have discovered your issue - assuming of course MY_DEFAULT_VALUE is not a possible value from any of the other functions - but that can only be determined by a developer, not static analysis tools.

Often, I will not be able to tell if there is any spare value that I can use this way. On the other hand during symbolic analysis you do not assign values to ints: you just as questions if an uninitialized value can be propagated to the point where the variable is read.

The problem I have with this entire discussion is that you're attempting to solve one problem - finding bad logic - by creating another problem - data leaks via uninitialized variables. When your code gets hacked, it's these exact kinds of things that crackers look for to find memory information to make bigger exploits. Further, they don't resolve your actual problem.

I actually agree with most of the above. I see it as a trade off.
You say: "leave the bug undetected, but try to reduce the scope of the damage". I say: "risk greater damage, but buy a chance (not a guarantee) to detect the bug before the program is even run." In the end you might be right, that your solution offers more benefits than mine, but the trade off is not obvious. My request is that ES.20 should acknowledge that it is a controversial trade-off.

Note again that hackers can also exploit the fact that parts of the functions return values in an uncontrolled way. This is what ES.20 proposes. There may be no "safe default" in a given context.

You can't uninitialize v because it's used earlier, so your argument that uninitializing v would find your logic bug (or make it easier) is therefore invalid. Further, suppose you used j instead of v on one of those lines - again, it's used earlier on and thus makes your other point invalid.

My point is that I get a chance to detect the bug (and be able to correct it), not a guarantee that it will be detected. On the other hand, your example also demonstrates how your suggested alternative (with 0xcccccccc and assert()) doesn't work in this case.

The problem is not the variables or their initialization status. They are a sideshow to the actual issue - that the third if-block (in the code above) didn't have the right logic; nothing else matters. How you diagnose that does, but blaming it on variable initialization isn't going to help you at all.

The third if-statement is indeed the source of the problem. Leaving variables uninitialized, when I have no meaningful value to initialize them with, gives me an opportunity to detect the bug statically and fix it. In contrast, ES.20 doesn't give ma a chance to detect or fix the bug. It only tries to minimize -- but not remove -- the damage. I am not "blaming" the variable initialization (whatever that would mean), but I see the connection:

Leaving a variable uninitialized is a potential to create UB. But it is not UB on its own. It is not a bug. It will never lead to a UB n a correct program. It can only lead to UB in a program that contains a bug.
When I additionally plant a bug, I get UB.
Static analyzers are good at tracking UB: they have a chance to detect this one. If they do, it will uncover the bug that I have (the third if-statement), and I can fix it. The fix will be to change the condition in the third if-statement, but I will still leave the declaration of variable v without the initializer, in hope that this will help detect other bugs.

Having this uninitialized v (in cases where ES.22 is not applicable) is not a bug. It is not even bug-prone. It simply magnifies the consequences of a bug if there is one in the vicinity.

By merely using nested if-blocks the static analyzer will now complain that you can't reach the line of code where the bug is. You are correct, of course. But what conclusion should we draw from this observation? If I apply this (good) advice, my bug is detected, and I can correct it. However, I will still leave v without any dummy value in the declaration, in hope that it may help detect other bugs in the future.

Finally, let me reiterate. I acknowledge your point of view. I acknowledge that there may be far more environments where the advice in ES.20 helps than the environments where ES.20 actually does harm. My request with this issue is that the description in ES.20 should acknowledge that there are environments/situations where this advice actually does harm. There are rues, lie ES.22 which are universally good, and there are rules like ES.20 that are a result of the trade-off:they give you something, but at the same time they take away something from you.

BenBE commented 2 years ago

To make this conversation easier, here's the code in question taken from https://akrzemi1.wordpress.com/2018/11/22/treating-symptoms-instead-of-the-cause/:
int make_i(bool c1, bool c2)
{
  int v; // no value specified
  if (c1)        fill_1(v);
  if (!c1 && c2) fill_2(v);
  if (!c1 && c2) fill_3(v); // <- BUG
  return v;
}

A more DRY solution to your problem would actually be:

int make_i(bool c1, bool c2)
{
  int v; // no value specified
  if (c1) {
    fill_1(v);
  } else if (c2) {
    fill_2(v);
  } else {
    fill_3(v);
  }
  return v;
}

Oops, sorry. I accidentally fixed the possibility to introduce the original problem ;-)

Nonetheless, initializing v in this context would still be preferred and if there's no suitable initial value, you really should reconsider the API design. Apart from this, there should be a reasonable initial value and if that value cannot be something that can easily worked out (like if 0 is a sensible value that could be used for error cases) then you might need to resort to checking initialization with an assert. This is dirty in any way and the fault is not with initializing or not, but with overall API design of the called functions (although not initializing v should be a clear sign for caution in any code review).

The problem with this approach is that fill_1, fill_2, or fill_3 could for any reason miss to assign a value to v (due to a bug or intentional) and thus, despite the code being free of logic errors, still misbehaving and causing an uninitialized value returned from this function.

Long story short: Initialize your variables. And if not possible, pester whoever mal-designed the API you need use until that issue is fixed …

akrzemi1 commented 2 years ago

Long story short: Initialize your variables. And if not possible, pester whoever mal-designed the API you need use until that issue is fixed …

I agree with this advice. Now, if we map in onto CppCoreGuidelines, you are paraphrasing ES.22 (the superior one).

BenBE commented 2 years ago

Long story short: Initialize your variables. And if not possible, pester whoever mal-designed the API you need use until that issue is fixed …

I agree with this advice. Now, if we map in onto CppCoreGuidelines, you are paraphrasing ES.22 (the superior one).

Reading this again in detail, It's actually a combination of ES.20, ES.21, ES.22 and ES.5: So basically I'd suggest shortening the points of the rules as follows:

ES.20: Leave no variable uninitialized
ES.22: Don't declare a variable unless you have an initial value for it
ES.21: Initialize just right before first use / Minimize code between initialization and use

(Maybe reorder those three rules as shown in this list)

Additionally the reasoning behind ES.20 should explain the pitfalls of having a variable uninitialized. This may involve some wording that mentions the above API design as a pitfall and strongly advising to refactor such code when encountered.

With ES.22 the focus should be on in particular pointing out to avoid default construction and that "immediate writing to a buffer" doesn't free you from having that buffer initialized first (Security Implications). There may be a side node that for performance it can be necessary to skip the initialization of the buffer, but that such uses should be documented accordingly (read: Make them rare exceptions and not the norm).

Finally in ES.21 the wording can be improved by pointing out that elaborate initialization schemes like with the second bad example from ES.22 reduce clarity of the source and unnecessarily widen the lifetime of a variable. Refer to ES.28 for such situations if necessary for initialization or re-ordering of code when the intermediate code is unrelated.

akrzemi1 commented 2 years ago

@BenBE Note that what you propose does not address my concern expressed in this GitHub Issue. I'll try to summarize the concern. Rule ES.20 -- if we exclude cases covered in rule ES.22 -- makes a significant engineering trade-off. The trade off is between:

Increase the chance of finding programmer bugs statically, so that they are removed from the code before it is shipped.
Minimize the damage caused by a shipped runtime bug.

ES.20 favors option 2. This trade-off is far from being obvious. For instance, we are often taught that bugs are better handled at compile-time than at run-time. I do not argue with this trade-off decision. I request that it be documented in ES.20. Otherwise, I am not even comfortable recommending these rules in my workplace. they give an appearance of not being well thought over. I am not saying that they are not well thought over, I am merely saying that they give this appearance. Documenting the trade-off, would reduce this bad appearance: this would acknowledge that the authors see a non-obvious trade-off, have considered the alternatives, and concluded that this is the best option.

Note that other rules, like ES.22, do not have any trade-off to make: they are a win-win situation.

And you do not have to show me how I can make my example better. I also know how to make it better. Interestingly, the fix that you propose:

int make_i(bool c1, bool c2)
{
  int v; // no value specified
  if (c1) {
    fill_1(v);
  } else if (c2) {
    fill_2(v);
  } else {
    fill_3(v);
  }
  return v;
}

Does not require initializing variable v to a dummy value. (Which seems to acknowledge that there always are rules superior to ES.20.) The dummy assignment never fixes a bug. It only reduces the consequences of the bug (at the cost of making it harder to find the bug statically). Note that this is not the case for rules like ES.22. ES.22 genuinely prevents bugs. The reordering suggested in ES.22 can change the status of programmer bugs from runtime to compile-time errors.

So, please consider my request:

Document that ES.20 makes a non-obvious engineering trade-off.
Order ES.22 before ES.20 in order to reflect that ES.22 is superior.

BenBE commented 2 years ago

Interestingly, the fix that you propose:
int make_i(bool c1, bool c2)
{
  int v; // no value specified
  if (c1) {
    fill_1(v);
  } else if (c2) {
    fill_2(v);
  } else {
    fill_3(v);
  }
  return v;
}
Does not require initializing variable v to a dummy value. (Which seems to acknowledge that there always are rules superior to ES.20.)

Actually I'd argue that passing a reference to an uninitialized variable to a function is just asking for trouble in disguise. Thus as I mentioned this example fix might seem innocent and correct, but relies on the called functions to be well-behaved. The only situation I'd accept this kind of initialization was if C++ had an enforced means of an out specifier like Pascal dialects usually have, which tell the compiler that this variable must be assigned inside the function that is called.

Lacking such mechanism you're prone to cause UB without the compiler noticing, if the variable is not initialized. But I have absolutely no objections documenting the caveat you mention that initialization of this kind (initial value overwritten by actual initializer) is prone to hiding bugs in your program logic. Although IMO that overall pattern is flawed and should be cleansed whenever encountered …

The dummy assignment never fixes a bug. It only reduces the consequences of the bug (at the cost of making it harder to find the bug statically). The UB in this case may lead to information disclosure or random code execution bugs. The initialized version reduces this risk significantly.

So, please consider my request:

Document that ES.20 makes a non-obvious engineering trade-off.

Order ES.22 before ES.20 in order to reflect that ES.22 is superior.

I'd favor option 1., i.e. document the pitfalls with ES.20, although patterns that cause trouble with ES.20 should be avoided anyway.

BenjamenMeyer commented 2 years ago

@BenBE Note that what you propose does not address my concern expressed in this GitHub Issue. I'll try to summarize the concern. Rule ES.20 -- if we exclude cases covered in rule ES.22 -- makes a significant engineering trade-off. The trade off is between:

We have shown using your example several solutions, and I think the overall discussion here shows that ES.20 and ES.22 are not an issue as written. Honestly, I'd be in favor of just closing this since the proposed issues would:

lead to possible security vulnerabilities
potentially cause additional issues
is bad practice to leave variables uninitialized

overall, it's better to actually solve the problem correctly, not address symptoms (perceived or otherwise). I'd very much argue that your "it can't be initialized" is a perceived symptom/cause when it's actually a logic/architecture issue.

So, please consider my request:

Document that ES.20 makes a non-obvious engineering trade-off.

The non-obvious engineering trade-off isn't in requiring the variable to be initialized but in the poorly layed out code.

Order ES.22 before ES.20 in order to reflect that ES.22 is superior.

The ordering is fine as is since this has nothing to do with the problem you're actually trying to address.

MikeGitb commented 2 years ago

True. On the other hand, I do not see the situation much better when I allow my function to return with the value that I never intended to return. This might be subject to an exploit as well.

Just a nitpick on this particular argument (doesn't necessarily invalidate or outweigh other arguments). I see a hughe difference between the two cases: In the uninitialized case, there is a chance that you are disclosing data from your process or that the attacker can even control the content. Worst case: this is supposed to be a pointer to a write buffer for external data, and now the attacker has a chance to write data to arbitrary address in memory. If you initialize it to a n nullptr, your program will crash on most platforms.

Similar with ints: I can think of a lot of situations, where an unintended value of zero is harmless, whereas a very big number (either by random chamce or again, attacker controlled) can cause further trouble (think of array sizes, inddexes, or computations that overflow).

So while it may not be safe to return a value initialized variable, my personal impression is that its much safer than random or potentially even attacker controlled values.

bgloyer commented 2 years ago

I agree. ES.20 can cause bugs to be hidden in a small number of cases.

The were also good discussions in issues https://github.com/isocpp/CppCoreGuidelines/issues/131 and https://github.com/isocpp/CppCoreGuidelines/issues/1003.

BenjamenMeyer commented 2 years ago

I agree. ES.20 can cause bugs to be hidden in a small number of cases.

The were also good discussions in issues #131 and #1003.

Summary: Reading through those, they both deal with complex initialization - #131 added information about complex initialization and #1003 was ruled to have already been under the existing language. Further like here the example in #1003 could be fixed by other means (e.g lambda, additional function).

gdr-at-ms commented 2 years ago

It is not clear to me that the numbering of the rules is to suggest priority. I agree that ES.20 could mention ES.22 and vice versa.

I am also highly skeptical that rules that seek to complexify or be "optimal" in all situations -- e.g. "definite assignment before use" leads to better programming practice and maintainability. The current rules are good enough and cross-reference as originally suggested by Andrzej can help people in doubt.

hsutter commented 2 years ago

Editors call: We understand the rules are not fully consistent and can hide uninit bugs, but they are correct for the large majority of cases, and we feel this is what we want the advice to be for today's C++.