`VariableInitialization` rule should be stricter around records

Chadehoc commented 11 months ago

Prerequisites

[X] This bug is in SonarDelphi, not SonarQube or my Delphi code.
[X] This bug has not already been reported.

SonarDelphi version

1.0.0

SonarQube version

No response

Issue description

This rule is particuliarly important, because the Delphi compiler itself misses so many cases on uninitialized variables, and this causes nasty real-world bugs. I hoped you could do better. (Indeed, if you offer this rule, it is because you are aware of the compiler's failures, so it is meant to do better!)

But no, here is a very basic program that passes the scan.

Steps to reproduce

Run the scanner on the provided mini project. The uninitialized variable in Main is not detected.

It should be done either using default: LThing := Default(TThing), or using a constructor (if one was defined).

I report it as a bug because this case is really a basic one.

Minimal Delphi code exhibiting the issue

program BugReport;

{$APPTYPE CONSOLE}

type
  TThing = record
    FThing: Integer;
    procedure SetThing(AVal: Integer);
    function GetThing: Integer;
  end;

function TThing.GetThing: Integer;
begin
  Result := FThing;
end;

procedure TThing.SetThing(AVal: Integer);
begin
  FThing := AVal;
end;

procedure Main;
var
  LThing: TThing;
begin
  WriteLn(LThing.GetThing); // used but not initialized, random output
  LThing.SetThing(0);
  WriteLn(LThing.GetThing);
end;

begin
  Main;
  ReadLn;
end.

zaneduffield commented 11 months ago

Hi! Thanks for taking the time to report.

While it would be nice to detect cases like these, I don't think it's quite as simple as it seems.

Here's (roughly) what the rule would have to do to catch this case:

Identify that the function in question returns the value of a field (without initialising it itself)
Identify that the field is not initialised before calling the function

I think we could improve the rule to catch this specific case pretty easily (it already does 2.), but more general cases involving conditional logic will continue to elude it until we have some kind of data-flow analysis.

In any case, this is probably more of a (welcome) "rule improvement" issue than a bug.

What do you think @Cirras?

zaneduffield commented 11 months ago

Given the rule already knows whether fields have been initialised (in unconditional cases), it could perhaps raise an issue on every function call on the uninitialised record? An implementation with less false positives would be one that knows which functions access which fields.

Cirras commented 11 months ago

The VariableInitialization rule has robust support for records, and can recognize usages of uninitialized (or even partially initialized) record values.

But there's a meaningful (and somewhat intentional) deficiency here.

The current heuristic

For any record method call, assume that the record is fully initialized afterwards.

And so, the rule gets very dumbed-down around method calls on a record.

Here's an example as to why...

program Example;

{$APPTYPE CONSOLE}

type
  TThing = record
    FThing: Integer;
    procedure Reset;
    procedure SetThing(AVal: Integer);
    function GetThing: Integer;
  end;

procedure TThing.Reset;
begin
  FThing := 0;
end;

function TThing.GetThing: Integer;
begin
  Result := FThing;
end;

procedure TThing.SetThing(AVal: Integer);
begin
  FThing := AVal;
end;

procedure Example1;
var
  LThing: TThing;
begin
  LThing.Reset; // An issue here would be a false-positive.
  WriteLn(LThing.GetThing);
end;

procedure Example1;
var
  LThing: TThing;
begin
  LThing.SetThing(0); // Calling a setter on an uninitialized record would also be a false-positive.
  WriteLn(LThing.GetThing);
end;

begin
  Main;
  ReadLn;
end.

In a perfect world...

As @zaneduffield mentioned:

We could...

Identify that the function in question returns the value of a field (without initialising it itself)

But can we, actually?

At the moment, no. We can't traverse a method call in the semantic model and analyze the variable assignments and method calls within it.

Importantly, the analyzer does not currently implement:

data-flow analysis
symbolic execution

These would be a substantial time investment to implement, but would also significantly improve the capabilities of the analysis engine in cases like this.

A better way?

I agree that the rule may be too gentle in cases like these.

It makes some significant assumptions, thinking of the record as a dumb data container (without getter/setter methods) that might optionally have a method to default-initialize it in some specific way.

We could change the heuristic from any method call to any constructor call. This would eliminate false-negatives around getters and other method calls, but introduce false-positives around "init" methods and setter methods.

So it's a trade-off. Thoughts are welcome.

Chadehoc commented 11 months ago

Thumbs up for your reactivity!

This rule is the most important to my eyes. Other rules make sense and contribute to code standards, but this one could spot real bugs in a huge, old code base. Sure, the real problem is the deficiency of the Delphi compiler, but we can expect nothing from Embarcadero in this.

The counter-example you give seems a code-smell to me. This variable is not initialized. As it is written, Reset is meant to reset an existing value, not initialize one. The Reset function can fail to evolve if a field is added to the record, for example. In this example, only the lack of an initialization bug can be told by reading the code.

Compare it to another potential case which could be thought a "false positive": initialization is done under a condition, and use is done later under the same condition. At that very moment, there is no bug, but I claim there is an uninitialized variable, and a future bug. There should be at least an else LVar := Default(T).

Initialization is structural (and as such, is normally detected by a compiler). A constructor is meant to initialize, a normal function isn't. (I would even personally not trust a constructor, and demand that the result should be initialized in the constructor.)

I would expect that initializing a record variable is done only by:

Assigning it: to Default(T), to a constructor, to a constant, to another initialized variable...
Getting it out from an OUT argument (not VAR, which expects an initialized variable)
Within a constructor, by calling another constructor.

I admit it is debatable whether to apply this rule in a constructor, as it should be meant to initialize. But doing so can detect bugs (constructors not actually fully initializing their result), so I would argue for it.

More generally, there could be customizable exceptions to those rules, for example "methods named Reset or Clear or Initialize should be trusted to initialize properly".

Other related rules I would welcome:

A VAR argument should be initialized before the function being called
An OUT argument should not (unless the assigned value was used, which means the variable may just have been reused).
In the called function, an OUT argument should be initialized, and should not be used until initialized.

As for these, I know that internally, OUT is handled as VAR by the compiler, which is another failure leading to bugs. Nobody should rely on that!

I didn't look yet to custom rules, as I was just trying out. If something could be done with custom rules, I could try that.

Cirras commented 11 months ago

Hi @Chadehoc,

The counter-example you give seems a code-smell to me. This variable is not initialized. As it is written, Reset is meant to reset an existing value, not initialize one. The Reset function can fail to evolve if a field is added to the record, for example. In this example, only the lack of an initialization bug can be told by reading the code.

I see your point. 👍 Arguably, any method which fully initializes a value type a constructor and should change to become one.

Initialization is structural (and as such, is normally detected by a compiler). A constructor is meant to initialize, a normal function isn't. (I would even personally not trust a constructor, and demand that the result should be initialized in the constructor.)

I believe that we can handle the "not trusting a constructor" case by having a separate rule: Records should be fully initialized

A rule that ensures a record is fully initialized in any constructor or Initialize (for managed records)

Feature request welcome!

I would expect that initializing a record variable is done only by:

Assigning it: to Default(T), to a constructor, to a constant, to another initialized variable...

Agreed.

Getting it out from an OUT argument (not VAR, which expects an initialized variable)

I have to disagree on this one. Due to the similarity of the behavior between out and var, it's common for them to be used interchangeably and without much care/diligence. With all of this confusion around the behavior, some codebases even go so far as to ban the usage of out - this very analyzer once had a rule to that effect.

So I think we should stick with the current logic of treating both var as out as initializing the record.

Within a constructor, by calling another constructor.

Agreed.

More generally, there could be customizable exceptions to those rules, for example "methods named Reset or Clear or Initialize should be trusted to initialize properly".

Maybe, but I'd lean more towards a "one true solution" of just having them turn the method into a constructor. Something that would convince me otherwise is if any popular libraries using this type of Reset/Initialize pattern.

Other related rules I would welcome:

A VAR argument should be initialized before the function being called

Feature request welcome - sounds useful!

An OUT argument should not (unless the assigned value was used, which means the variable may just have been reused).

Not as sure of the value with this one.

In the called function, an OUT argument should be initialized, and should not be used until initialized.

This is an oversight in the VariableInitialization rule - it really should be treating out parameters as if they're uninitialized variables. I'll raise a new issue.

Summary

The "method call" heuristic in this rule should instead become "constructor call"
VariableInitialization should be treating out parameters as uninitialized
Feature request welcome for:
- Records should be fully initialized
- 'var' arguments should be initialized

What do you think?

Chadehoc commented 11 months ago

Summary

* The "method call" heuristic in this rule should instead become "constructor call"

* `VariableInitialization` should be treating `out` parameters as uninitialized

* Feature request welcome for:

  * `Records should be fully initialized`
  * `'var' arguments should be initialized`

What do you think?

Good! I will gladly make feature requests.

Only two remarks:

So I think we should stick with the current logic of treating both var as out as initializing the record.

Ok, because it will then be detected by the future rule "'var' arguments should be initialized before the call":

For the current rule, var will be considered as initializing as well as out (but in my opinion, this is a workaround around a Delphi compiler deficiency having been abusively relied on by some code bases).
But if one activates the future rule, initializing with var will trigger the latter, and will be detected. Nice, because it really has the potential to cause bugs.

More generally, there could be customizable exceptions to those rules, for example "methods named Reset or Clear or Initialize should be trusted to initialize properly".

Maybe, but I'd lean more towards a "one true solution" of just having them turn the method into a constructor.

I lean toward this myself, too, and wouldn't use that. But thinking of existing code bases, it was a way to let them manage this rule. After all, if you accept initialization by var for this reason, initialization by other "locally accepted" methods comes in the same category.

Cirras commented 11 months ago

I'm changing this to a rule improvement issue - to make VariableInitilization stricter around records. In other words, to swap out the "method call" heuristic for a "constructor call" heuristic.

integrated-application-development / sonar-delphi