Closed martin-fink closed 3 years ago
Thanks, @martin-fink
Would it be possible to include the C source with varargs usage / declaration demonstrating the issue addressed in this PR?
I can include the C source, but (in this minimal example at least) the issue only comes up with clang 11 and older versions (this is how I noticed the issue in the first place -- on Debian buster, the default clang version is 7).
https://godbolt.org/z/KbT5E7vs6 shows the compiler output for clang 11.0.1 and clang trunk.
On line 33 of the output, older clang versions use r8b
as a temporary register, while newer versions use dl
. Both are valid, but just one triggers this issue. This is why I decided to include the generated assembly instead of the C source.
How should I proceed here?
Thanks @martin-fink , for the very helpful analysis of the issue. Much appreciated. Sorry for the delay in reviewing this PR.
Since call instruction is not a block terminator, the CFG of this binary is as follows (I presume your diagram depicts call as a block terminator to highlight the issue)
bb.0 -> [bb.1]
bb.1-> [bb.2, bb.3]
bb.2 -> [bb.1]
where r8
is defined in bb.2
- but after the call instruction for which the vararg discovery in being done. Note that we have already determined that r8
is not defined in bb.2
at the site of call instruction MI
by calling getPhysRegDefiningInstInBlock()
with call instruction specified as program point argument (2nd arg). Hence, we look to see if it is in the liveout set of all its predecessors. This is accomplished by calling the same function getPhysRegDefiningInstInBlock()
with nullptr
as program point argument (2nd arg) indicating the end of block.
bb.2
is reachable via predecessor walk from its predecessor bb.1
. r8
is in liveout set of bb.2
but is not defined at the program point of call instruction whose varargs are being discovered. It is incorrect to check for r8
being in liveout set of bb.2
and contradicts the earlier determination that r8
is not live at the call instruction. This bug is fixed by correctly indicating that bb.2
as visited before starting each of the predecessor the walk of bb.2
to find a reaching definition of r8
.
The following patch should fix this bug.
diff --git a/X86/X86MachineInstructionRaiser.cpp b/X86/X86MachineInstructionRaiser.cpp
index 28c24ea..2a42780 100644
--- a/X86/X86MachineInstructionRaiser.cpp
+++ b/X86/X86MachineInstructionRaiser.cpp
@@ -4510,6 +4509,8 @@ bool X86MachineInstructionRaiser::raiseCallMachineInstr(
// No blocks visited in this walk up the predecessor P
BitVector BlockVisited(MF.getNumBlockIDs(), false);
+ // CurMBB has already been visited. Mark it so.
+ BlockVisited.set(CurMBB->getNumber());
// Start at predecessor P
WorkList.push_back(P);
@@ -4536,10 +4537,7 @@ bool X86MachineInstructionRaiser::raiseCallMachineInstr(
WorkList.push_back(P);
}
}
- } else if (PredMBB->getNumber() == CurMBB->getNumber())
- // This is a loop. Simply increment ReachDefPredEdgeCount to
- // indicate that we have a reaching def.
- ReachDefPredEdgeCount++;
+ }
}
}
// If there is a reaching def on all predecessor edges then consider
Let me know of any feedback you might have.
Yes, that fixes this bug with even fewer modifications. Thanks! I've applied the patch and force-pushed a new commit with it.
Replace the iterative algorithm to check if a register is defined for a block or all of its predecessors with a recursive one. This fixes a bug where registers would be incorrectly identified as arguments even if they would not be defined in all predecessors.
Let's take the following function as an example:
In
bb.2
, we reach a vararg function call and need to discover potential argument registers. We find a definition forrdi
,rsi
,rdx
andrcx
in the same block. We then check ifr8
is defined here, which is not the case. After that, we checkbb.1
, which does not definer8
either. We addentry
andbb.3
to the work list, one of which definesr8
. In the old code,ReachDefPredEdgeCount
would now be set to 1. This would lead toReachDefPredEdgeCount > (unsigned)0) && (ReachDefPredEdgeCount == CurMBB->pred_size()
being evaluated to true, markingr8
as an argument register. This is incorrect, as we would need a definition ofr8
inentry
as well for this to be correct.I've replaced this iterative algorithm with a recursive one, which checks the following for the passed
MBB
:MBB
defines the register, returntrue
We also need to make sure that loops are handled properly in this recursive method, which we do by passing a
BitVector
of already visitedMBB
s. If we encounter a visitedMBB
, we just returntrue
. I think this is how the old code would handle such loops.