microsoft / llvm-mctoll

llvm-mctoll
Other
816 stars 125 forks source link

[X86-64] Fix argument discovery for vararg functions #118

Closed martin-fink closed 3 years ago

martin-fink commented 3 years ago

Replace the iterative algorithm to check if a register is defined for a block or all of its predecessors with a recursive one. This fixes a bug where registers would be incorrectly identified as arguments even if they would not be defined in all predecessors.

Let's take the following function as an example: Untitled Diagram

In bb.2, we reach a vararg function call and need to discover potential argument registers. We find a definition for rdi, rsi, rdx and rcx in the same block. We then check if r8 is defined here, which is not the case. After that, we check bb.1, which does not define r8 either. We add entry and bb.3 to the work list, one of which defines r8. In the old code, ReachDefPredEdgeCount would now be set to 1. This would lead to ReachDefPredEdgeCount > (unsigned)0) && (ReachDefPredEdgeCount == CurMBB->pred_size() being evaluated to true, marking r8 as an argument register. This is incorrect, as we would need a definition of r8 in entry as well for this to be correct.

I've replaced this iterative algorithm with a recursive one, which checks the following for the passed MBB:

We also need to make sure that loops are handled properly in this recursive method, which we do by passing a BitVector of already visited MBBs. If we encounter a visited MBB, we just return true. I think this is how the old code would handle such loops.

bharadwajy commented 3 years ago

Thanks, @martin-fink

Would it be possible to include the C source with varargs usage / declaration demonstrating the issue addressed in this PR?

martin-fink commented 3 years ago

I can include the C source, but (in this minimal example at least) the issue only comes up with clang 11 and older versions (this is how I noticed the issue in the first place -- on Debian buster, the default clang version is 7).

https://godbolt.org/z/KbT5E7vs6 shows the compiler output for clang 11.0.1 and clang trunk. On line 33 of the output, older clang versions use r8b as a temporary register, while newer versions use dl. Both are valid, but just one triggers this issue. This is why I decided to include the generated assembly instead of the C source.

How should I proceed here?

bharadwajy commented 3 years ago

Thanks @martin-fink , for the very helpful analysis of the issue. Much appreciated. Sorry for the delay in reviewing this PR.

Since call instruction is not a block terminator, the CFG of this binary is as follows (I presume your diagram depicts call as a block terminator to highlight the issue)

 bb.0 -> [bb.1]
 bb.1-> [bb.2, bb.3]
 bb.2 -> [bb.1] 

where r8 is defined in bb.2 - but after the call instruction for which the vararg discovery in being done. Note that we have already determined that r8 is not defined in bb.2 at the site of call instruction MI by calling getPhysRegDefiningInstInBlock() with call instruction specified as program point argument (2nd arg). Hence, we look to see if it is in the liveout set of all its predecessors. This is accomplished by calling the same function getPhysRegDefiningInstInBlock() with nullptr as program point argument (2nd arg) indicating the end of block.

bb.2 is reachable via predecessor walk from its predecessor bb.1. r8 is in liveout set of bb.2 but is not defined at the program point of call instruction whose varargs are being discovered. It is incorrect to check for r8 being in liveout set of bb.2 and contradicts the earlier determination that r8 is not live at the call instruction. This bug is fixed by correctly indicating that bb.2 as visited before starting each of the predecessor the walk of bb.2 to find a reaching definition of r8.

The following patch should fix this bug.

diff --git a/X86/X86MachineInstructionRaiser.cpp b/X86/X86MachineInstructionRaiser.cpp
index 28c24ea..2a42780 100644
--- a/X86/X86MachineInstructionRaiser.cpp
+++ b/X86/X86MachineInstructionRaiser.cpp
@@ -4510,6 +4509,8 @@ bool X86MachineInstructionRaiser::raiseCallMachineInstr(
             // No blocks visited in this walk up the predecessor P
             BitVector BlockVisited(MF.getNumBlockIDs(), false);

+            // CurMBB has already been visited. Mark it so.
+            BlockVisited.set(CurMBB->getNumber());
             // Start at predecessor P
             WorkList.push_back(P);

@@ -4536,10 +4537,7 @@ bool X86MachineInstructionRaiser::raiseCallMachineInstr(
                       WorkList.push_back(P);
                   }
                 }
-              } else if (PredMBB->getNumber() == CurMBB->getNumber())
-                // This is a loop. Simply increment ReachDefPredEdgeCount to
-                // indicate that we have a reaching def.
-                ReachDefPredEdgeCount++;
+              }
             }
           }
           // If there is a reaching def on all predecessor edges then consider

Let me know of any feedback you might have.

martin-fink commented 3 years ago

Yes, that fixes this bug with even fewer modifications. Thanks! I've applied the patch and force-pushed a new commit with it.