llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
28.69k stars 11.87k forks source link

`DeclRefExpr::getEndLoc()` returns the same `SourceLocation` as `DeclRefExpr::getStartLoc()` #101764

Open delcypher opened 2 months ago

delcypher commented 2 months ago

Problem

DeclRefExpr::getEndLoc() returns the same SourceLocation as DeclRefExpr::getStartLoc(). This means calling DeclRefExpr::getSourceRange() doesn't return a useful range for diagnostic emission.

I would have expected DeclRefExpr::getEndLoc() to return a SourceLocation that points to the last character of the identifier in the source code.

Cause

Calling DeclRefExpr::getSourceRange() on a DeclRefExpr effectively returns a 1 character range because it uses

  /// This implementation is used when a class doesn't provide a custom
  /// implementation of getSourceRange.  Overload resolution should pick it over
  /// the implementation above because it's more specialized according to
  /// function template partial ordering.
  template <class S>
  SourceRange getSourceRangeImpl(const Stmt *stmt,
                                 SourceRange (Stmt::*v)() const) {
    return SourceRange(static_cast<const S *>(stmt)->getBeginLoc(),
                       static_cast<const S *>(stmt)->getEndLoc());
  }

and DeclRefExpr::getEndLoc() is implemented as

SourceLocation DeclRefExpr::getEndLoc() const {
  if (hasExplicitTemplateArgs())
    return getRAngleLoc();
  return getNameInfo().getEndLoc();
}

This effectively returns DeclRefExpr::getLocation() when DeclarationNameInfo::Name.getNameKind() is DeclarationName::Identifier because of ...

  DeclarationNameInfo getNameInfo() const {
    return DeclarationNameInfo(getDecl()->getDeclName(), getLocation(), DNLoc);
  }

  SourceLocation getLocation() const { return DeclRefExprBits.Loc; }

DeclRefExpr::getLocation() effectively returns the same location as DeclRefExpr::getStartLoc().

Fixing

We might be able to fix this by doing something like...

diff --git a/clang/lib/AST/DeclarationName.cpp b/clang/lib/AST/DeclarationName.cpp
index a3ac5551e0cc..4544c28c59db 100644
--- a/clang/lib/AST/DeclarationName.cpp
+++ b/clang/lib/AST/DeclarationName.cpp
@@ -497,7 +497,9 @@ void DeclarationNameInfo::printName(raw_ostream &OS, PrintingPolicy Policy) cons

 SourceLocation DeclarationNameInfo::getEndLocPrivate() const {
   switch (Name.getNameKind()) {
-  case DeclarationName::Identifier:
+  case DeclarationName::Identifier: {
+    return NameLoc.getLocWithOffset(Name.getAsIdentifierInfo()->getLength());
+  }
   case DeclarationName::CXXDeductionGuideName:
     return NameLoc;

Not sure if this really works in the presence of macros.

llvmbot commented 2 months ago

@llvm/issue-subscribers-bug

Author: Dan Liew (delcypher)

**Problem** `DeclRefExpr::getEndLoc()` returns the same `SourceLocation` as `DeclRefExpr::getStartLoc()`. This means calling `DeclRefExpr::getSourceRange()` doesn't return a useful range for diagnostic emission. I would have expected `DeclRefExpr::getEndLoc()` to return a SourceLocation that points to the last character of the identifier in the source code. **Cause** Calling `DeclRefExpr::getSourceRange()` on a `DeclRefExpr` effectively returns a 1 character range because it uses ``` /// This implementation is used when a class doesn't provide a custom /// implementation of getSourceRange. Overload resolution should pick it over /// the implementation above because it's more specialized according to /// function template partial ordering. template <class S> SourceRange getSourceRangeImpl(const Stmt *stmt, SourceRange (Stmt::*v)() const) { return SourceRange(static_cast<const S *>(stmt)->getBeginLoc(), static_cast<const S *>(stmt)->getEndLoc()); } ``` and `DeclRefExpr::getEndLoc()` is implemented as ``` SourceLocation DeclRefExpr::getEndLoc() const { if (hasExplicitTemplateArgs()) return getRAngleLoc(); return getNameInfo().getEndLoc(); } ``` This effectively returns `DeclRefExpr::getLocation()` when `DeclarationNameInfo::Name.getNameKind()` is `DeclarationName::Identifier` because of ... ``` DeclarationNameInfo getNameInfo() const { return DeclarationNameInfo(getDecl()->getDeclName(), getLocation(), DNLoc); } SourceLocation getLocation() const { return DeclRefExprBits.Loc; } ``` `DeclRefExpr::getLocation()` effectively returns the same location as `DeclRefExpr::getStartLoc()`. **Fixing** We might be able to fix this by doing something like... ``` diff --git a/clang/lib/AST/DeclarationName.cpp b/clang/lib/AST/DeclarationName.cpp index a3ac5551e0cc..4544c28c59db 100644 --- a/clang/lib/AST/DeclarationName.cpp +++ b/clang/lib/AST/DeclarationName.cpp @@ -497,7 +497,9 @@ void DeclarationNameInfo::printName(raw_ostream &OS, PrintingPolicy Policy) cons SourceLocation DeclarationNameInfo::getEndLocPrivate() const { switch (Name.getNameKind()) { - case DeclarationName::Identifier: + case DeclarationName::Identifier: { + return NameLoc.getLocWithOffset(Name.getAsIdentifierInfo()->getLength()); + } case DeclarationName::CXXDeductionGuideName: return NameLoc; ``` Not sure if this really works in the presence of macros.
delcypher commented 2 months ago

Hmm actually this might be a misunderstanding on my part. According to https://clang.llvm.org/docs/InternalsManual.html#sourcerange-and-charsourcerange . The begin and end sourcelocations in a range are supposed to point at the beginning of their respective tokens.

tbaederr commented 2 months ago

https://github.com/llvm/llvm-project/blob/cad835266ea12dbb3b602d9ae9ae479a550c47b4/clang/lib/Frontend/TextDiagnostic.cpp#L1101-L1102

This is what clang does