integrated-application-development / sonar-delphi

Delphi language plugin for SonarQube
GNU Lesser General Public License v3.0
104 stars 17 forks source link

File-level scan failure on number literal prefixed with ampersand #173

Closed fourls closed 9 months ago

fourls commented 9 months ago

Prerequisites

SonarDelphi version

1.1.0

SonarQube version

No response

Issue description

Delphi's grammar supports the addition of an ampersand immediately before an integer or floating point literal:

const
  MyInt = &123456;
  MyDouble = &123.456;

SonarDelphi currently does not support this, resulting in a scan failure when encountering the RTL unit System.Beacon, which uses this grammar feature seemingly pointlessly in a couple of literals.

Some observations:

Steps to reproduce

Scan the provided file, or any file that includes System.Beacon.

Minimal Delphi code exhibiting the issue

unit MyUnit;

interface

implementation

const
  MyInt = &123456;
  MyDouble = &123.456;

begin
  WriteLn(MyInt);
  WriteLn(MyDouble);

end.
Cirras commented 9 months ago

Wow, this makes very little sense. I guess that the optional ampersand is bound quite tightly to the numeric literal tokens and should be lexed into them.

fourls commented 9 months ago

Yes, that would make sense.

Another observed behaviour: any number of ampersands can be placed here, so the following is valid code:

const
  MyInt = &&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&12345;
Cirras commented 9 months ago

Distressingly, this is also allowed for identifiers.

program Foo;

{$APPTYPE CONSOLE}

procedure &Bar; begin end;
procedure &&Bar; begin end;
procedure &&&Bar; begin end;
procedure &&&&Bar; begin end;

begin
  // Calls &Bar:
  Bar;

  // Calls what you would expect:
  &Bar;
  &&Bar;

  // [Error] E2003 Undeclared identifier: '&&Ba'
  // &&&Bar;

  // [Error] E2003 Undeclared identifier: '&&&Ba'
  // &&&&Bar;
end.

And there's some annoyingly inconsistent behavior where you can declare things but not reference them.

fourls commented 9 months ago

I think I encountered this a while back while investigating for #22, and basically decided that it's a totally undocumented sequence of tokens that the compiler uses internally (e.g. for implementing internal operators, in the case of the double ampersand). Even double ampersand has problems where you can't reference them in some situations.

Here's some more fun stuff:

unit MyUnit;

interface

implementation

const
  MyConst = 12345;

begin
  // Using just &&:

  // E2003 Undeclared identifier: '&MyConst'
  WriteLn(&&MyConst);
  // E2003 Undeclared identifier: '& MyCons'
  WriteLn(&& MyConst);
  // E2003 Undeclared identifier: '&  MyCon'
  WriteLn(&&  MyConst);
  // E2003 Undeclared identifier: '&   MyCo'
  WriteLn(&&   MyConst);

  // Some more advanced ones:

  // E2003 Undeclared identifier: '&&&MyCons'
  WriteLn(&&&&MyConst);
  // E2003 Undeclared identifier: '&&&MyCo'
  WriteLn(&&&&  MyConst);
  // E2003 Undeclared identifier: '&&&&&  MyC'
  WriteLn(&&&&&&  MyConst);
  // E2003 Undeclared identifier: '&&&&&  MyC'
  WriteLn(&&&&&&  MyConst);
end.
zaneduffield commented 9 months ago

It's probably not a good idea to try to replicate the compiler's exact behaviour in all of these buggy corner cases, especially not without any kind of documentation on the matter.

I think it's probably fine to just add support for what's used in the standard library: the single ampersand before a numeric literal.