Merging inlined llvm.dbg.value creates broken metadata

uweigand commented 6 years ago


Bugzilla Link	36410
Resolution	FIXED
Resolved on	Mar 15, 2018 11:29
Version	trunk
OS	Linux
Attachments	Test case - run with "opt -simplifycfg"
CC	@adrian-prantl,@dwblaikie,@pogo59,@vedantk

Extended Description

Running the attached test case through "opt -simplifycfg" results in:

mismatched subprogram between llvm.dbg.value variable and !dbg attachment LLVM ERROR: Broken module found, compilation aborted!

The problem seems to be that (after inlining) the test case contains two llvm.dbg.value statements that were inlined from a subroutine:

call void @llvm.dbg.value(metadata i64 %vala, metadata !8, metadata !DIExpression()), !dbg !12

call void @llvm.dbg.value(metadata i64 %valb, metadata !8, metadata !DIExpression()), !dbg !13

!4 = distinct !DISubprogram(name: "callee", scope: !2, file: !2, line: 1, type: !5, isLocal: false, isDefinition: true, scopeLine: 1, isOptimized: false, unit: !1, variables: !7)

!10 = distinct !DISubprogram(name: "caller", scope: !2, file: !2, line: 5, type: !5, isLocal: false, isDefinition: true, scopeLine: 5, isOptimized: false, unit: !1, variables: !3) !11 = distinct !DILocation(line: 6, scope: !10) !12 = !DILocation(line: 2, scope: !4, inlinedAt: !11) !13 = !DILocation(line: 3, scope: !4, inlinedAt: !11)

These two statements are now being merged as part of a SimplifyCFG operation. The resulting llvm.dbg.value instruction gets a new !dbg metadata as part of an applyMergedLocation call. This is now:

call void @llvm.dbg.value(metadata i64 %vala, metadata !7, metadata !DIExpression()), !dbg !11

!11 = !DILocation(line: 0, scope: !4)

The numbers are off since all the metadata was renamed. But the important change is that this is now no longer an "inline" location, but a location that points directly to some line in "caller". But the variable identified by the llvm.dbg.value of course remains a local variable of the inlined "callee" function.

The module verifier now aborts because a llvm.dbg.value statement for a variable in "callee" has a location in "caller".

Reading the code in applyMergedLocation / getMergedLocation and the comments in https://reviews.llvm.org/D39628 this behavior seems to have been deliberate? But at least for llvm.dbg.value statements this doesn't look appropriate.

I'm not sure exactly how to fix that, so any advice would be appreciated.

adrian-prantl commented 6 years ago

Thanks for your perseverance!

uweigand commented 6 years ago

Fixed in r327622.

uweigand commented 6 years ago

If we want to go with the improved general algorithm, my proposed patch for that (including test cases) is now available on Phabricator here: https://reviews.llvm.org/D43687

uweigand commented 6 years ago

So I guess we need to decide: do you want special treatment of debug info intrinsics, or should the general algorithm be such that debug info intrinsics work automatically?

In the first case, I'd suggest committing something like my "Alternative fix." now so the immediate bug is resolved. The general algorithm can then be improved later in whatever way we want.

In the second case, we need to agree that the general algorithm needs to have the required property so that debug info intrinsics work. If we can agree on that, I can work on such an implementation as well ...

Which way should I proceed?

adrian-prantl commented 6 years ago

Right, I forgot about the debug intrinsics again. When mergeDebugLocations gets called on two llvm.dbg intrinsics, and they do not share the same scope and inlinedAt field, then they describe two different variables (or different inlined instances of the same abstract variable) and therefor should both be kept. If they only differ in source line, like in the example you posted earlier then it is fine to merge them because no information is lost.

uweigand commented 6 years ago

The approach in comment #26 would not solve the original problem, though. This was about the requirement that debug intrinsics must carry a location refering to the same scope the variable tracked by the debug intrinsic is defined in.

Adrian's original suggestion in comment #18 does solve the original problem, since it has the property that if the two locations to be merged have the same scope (which will be true for two debug intrinsics tracking the same variable), the resulting merged location is guaranteed to also have that same scope.

The new approach does not have that property, however. If we go with that, we're back to needing special handling for debug intrinsics along the lines of one of first first two proposed patches.

dwblaikie commented 6 years ago

... I would've thought this should be handled by treating the scope as {scope, inlinedAt} pair

I think that would be sufficient, yes. Realistically, if you have both a scope and an inlinedAt mismatch you might as well give up and generate an artificial location right away.

I'm not sure I follow why that would be?

I would guess this would happen if you were merging instructions from two inlined calls:

return x(y());

they will have different scopes and inlinedAts (To ensure separate inlinings are tracked correctly, the DILocation used by an inlinedAt is marked 'distinct' - so even if you don't have column info on, the instruction from x will have a different inlinedAt to the instruction from y).

adrian-prantl commented 6 years ago

... I would've thought this should be handled by treating the scope as {scope, inlinedAt} pair

I think that would be sufficient, yes. Realistically, if you have both a scope and an inlinedAt mismatch you might as well give up and generate an artificial location right away.

dwblaikie commented 6 years ago

I'm not sure I'm totally following this, but I would've thought this should be handled by treating the scope as {scope, inlinedAt} pair. Looking for the nearest common parent with the same {scope, inlinedAt} pair? (walking scope parents until there's no more, then jumping the inlinedAt edge, then walking scope parents again, etc, until you run out of both)

adrian-prantl commented 6 years ago

Thanks! When you are preparing the patch, could you perhaps add a unit test that illustrates a couple of examples of how the algorithm works? (See unittests/IR?MetadataTest for some examples on how to cons up DILocations).

uweigand commented 6 years ago

Hmmm.. looking at the patch:

// Find closest common scope for LocA and LocB.

DILocalScope *CommonScope = nullptr;

SmallPtrSet<DILocalScope *, 5> ScopesA;

for (const DILocation *L = LocA; L; L = L->getInlinedAt())

Shouldn't this be for (const DIScope *S = L->getScope(); L; L = L->getParent())

Ah, I misunderstood what you meant, then! Yes, if we do it that way, then the scope and inline information can be handled completely independently.

I'll update the patch and move the discussion to Phabricator ...

dwblaikie commented 6 years ago

Whenever you send this out - I'd really love it if you could write up a summary in the patch description (easier to read than metadata test cases unfortunately) of all the cases you've considered & the behavior under those conditions, etc.

adrian-prantl commented 6 years ago

Thanks!

A "current function scope" isn't really known here, unless I'm missing something. I've been using LocA->getInlinedAtScope() as the "fallback" scope -- not sure what the best thing is if this differs from LocB->getInlinedAtScope().

That walks all the way back into the current function, that's fine.

After finding the common scope, I'm only searching for a common "inlined-at" location starting from the location in the inline chains for both LocA and LocB that use this common scope -- it probably would be confusing to use any deeper chains here, or we may end up with a scope "inlined at" some callee ...

It would be nice if

merge(!DILocation(line: 3, scope: !1, inlinedAt: !2), !DILocation(line: 4, scope: !DIScope(scope: !1), inlinedAt: !2)

would result in

!DILocation(line: 0, scope: !1, inlinedAt: !2))

without loosing the inlinedAt location.

Hmmm.. looking at the patch:

// Find closest common scope for LocA and LocB.
DILocalScope *CommonScope = nullptr;
SmallPtrSet<DILocalScope *, 5> ScopesA;
for (const DILocation *L = LocA; L; L = L->getInlinedAt())

Shouldn't this be for (const DIScope *S = L->getScope(); L; L = L->getParent())

ScopesA.insert(L->getScope());
for (const DILocation *L = LocB; L; L = L->getInlinedAt()) same here
if (ScopesA.count(L->getScope())) {
CommonScope = L->getScope(); break;
}

There may also be the case that the common scope is already equal to the current function scope. In this case, I think we should let "inlined-at" legitimately remain nullptr, and not make up a dummy location for it.

If the common scope is the current function scope and one or both locations have an inlinedAt field then the function is recursive. I agree that in this case the inlinedAt field isn't needed to appease the Verifier and it also doesn't add any useful information to the debug info.

vedantk commented 6 years ago

Could you move this to reviews.llvm.org to make it easier to review?

uweigand commented 6 years ago

Improved algorithm

uweigand commented 6 years ago

OK, I've attempted to implement an improved algorithm along those lines. A couple of points where I had to make additional assumptions:

A "current function scope" isn't really known here, unless I'm missing something. I've been using LocA->getInlinedAtScope() as the "fallback" scope -- not sure what the best thing is if this differs from LocB->getInlinedAtScope().
After finding the common scope, I'm only searching for a common "inlined-at" location starting from the location in the inline chains for both LocA and LocB that use this common scope -- it probably would be confusing to use any deeper chains here, or we may end up with a scope "inlined at" some callee ...
There may also be the case that the common scope is already equal to the current function scope. In this case, I think we should let "inlined-at" legitimately remain nullptr, and not make up a dummy location for it.

This doesn't have any test regressions, and still fixes my test case.

adrian-prantl commented 6 years ago

if we're merging two copies of the same instruction from a function that was inlined twice at different lines in the same caller, right?

That's a good point.

I think the improved algorithm should search for both a common scope and a common inlinedAt location:

if (findClosestCommonScope(A, B)) Result.scope = findClosestCommonScope(A, B); else { // No common scope exists, make up a new compiler-generated location. // At this point the inlinedAt field will be meaningless as well. Result = DILocation(line: 0, scope: CurFnScope, inlinedAt: null); return Result; }

if (findClosestCommonInlinedAt(A, B)) Result.inlinedAt = findClosestCommonInlinedAt(A, B); else { // No common inlinedAt exists, mark as inlined at a // compiler-generated location in the current function. Result.inlinedAt = DILocation(0, CurFnScope, nullptr); }

uweigand commented 6 years ago

Looking at the code, I think that getMergedLocation() is implemented incorrectly. It looks like it always "unrolls" one level of inlining; which it shouldn't. It should be doing this operation on the DIScopes, not on the InlinedAt locations.

Interesting. I'm wondering: if we do that to find the merged scope, what should the "InlinedAt" field of the resulting merged location be set to, in case the InlinedAt fields of the two locations with that scope in the chains for LocA and LocB respectively differ?

This doesn't happen in my test case, but I guess it could happen if we're merging two copies of the same instruction from a function that was inlined twice at different lines in the same caller, right?

adrian-prantl commented 6 years ago

Looking at the code, I think that getMergedLocation() is implemented incorrectly. It looks like it always "unrolls" one level of inlining; which it shouldn't. It should be doing this operation on the DIScopes, not on the InlinedAt locations.

adrian-prantl commented 6 years ago

Since the line number is going to be ignored by the backend anyway, using line 0 is fine. Thanks for bearing with me.

I'm just confused why DILocation::getMergedLocation() doesn't behave correctly in this case already. I thought it was supposed to pick the first common parent scope of both instructions:

/// different files/lines the location is ambiguous and can't be /// represented in a single line entry. In this case, no location /// should be set, unless the merged instruction is a call, which we will /// set the merged debug location as line 0 of the nearest common scope /// where 2 locations are inlined from. This only applies to Instruction;

Does it not recognize the dbg.value as a call, is there a bug in getMergedLocation or am I misunderstanding something?

uweigand commented 6 years ago

Alternative fix

uweigand commented 6 years ago

I fully agree with you. I the arguments are identical and the only difference is the line number, pick one arbitrarily is the way to go.

OK, agreed. This is basically exactly what my original patch does, except it sets the line number to 0 instead of picking one arbitrarily.

So we could just go with that patch. In the alternative, a simple way to implement what you suggest (just pick one line number) would be to simply not call applyMergedLocation if the instruction is a dbg.value. Then we'd just keep the line number of the instruction that was moved. I'll attach an alternative patch implementing this version shortly.

Which way do you think would be better?

adrian-prantl commented 6 years ago

At this point, the dbg.value statements are only even attempted to merge if their arguments are identical. So I don't see when it would make really sense to keep both.

I fully agree with you. I the arguments are identical and the only difference is the line number, pick one arbitrarily is the way to go.

uweigand commented 6 years ago

I just had another look at the testcase: I'm assuming that either %vala or %valb will be eliminated either by this pass or soon thereafter. We probably want to keep both dbg.values so that the one describing the value that isn't eliminated survives.

Well, yes, this is exactly what this pass does. In fact, at the time we even look at the dbg.value statements, they already refer both to %vala.

The HoistThenElseCodeToIf logic goes through the then/else block sequentially from the start. We start out with:

init: %v9 = icmp eq i64 %flag, 0 br i1 %v9, label %a, label %b

a: ; preds = %init %vala = load i64, i64* %ptr, align 8 call void @llvm.dbg.value(metadata i64 %vala, metadata !8, metadata !DIExpression()), !dbg !12 br label %test.exit

b: ; preds = %init %valb = load i64, i64* %ptr, align 8 call void @llvm.dbg.value(metadata i64 %valb, metadata !8, metadata !DIExpression()), !dbg !13 %valbmasked = and i64 %valb, 1 br label %test.exit

The HoistThenElseCodeToIf routine then moves the %vala load up to the init block, deletes the %valb load, and changes all remaining users of %valb to use %vala instead:

init: %v9 = icmp eq i64 %flag, 0 %vala = load i64, i64* %ptr, align 8 br i1 %v9, label %a, label %b

a: ; preds = %init call void @llvm.dbg.value(metadata i64 %vala, metadata !8, metadata !DIExpression()), !dbg !12 br label %test.exit

b: ; preds = %init call void @llvm.dbg.value(metadata i64 %vala, metadata !8, metadata !DIExpression()), !dbg !13 %valbmasked = and i64 %vala, 1 br label %test.exit

Now we look at the (identical except for the !dbg) dbg.value statements, and the code again moves the first one up to the init block and deletes the second one. The only problem is that it comes up with an invalid !dbg during the move ...

At this point, the dbg.value statements are only even attempted to merge if their arguments are identical. So I don't see when it would make really sense to keep both.

adrian-prantl commented 6 years ago

You are correct that in this case there is no benefit in keeping both dbg.values since they describe the same variable. The line number of the !dbg location is effectively unused, but as you noticed the scope and the inlinedAt field are important.

I just had another look at the testcase: I'm assuming that either %vala or %valb will be eliminated either by this pass or soon thereafter. We probably want to keep both dbg.values so that the one describing the value that isn't eliminated survives.

uweigand commented 6 years ago

Thanks for your patch. I think that the correct fix is to either drop the dbg.value or — perhaps even better — keep both dbg.values in the code that invokes the mergeDebugLocation.

As far as I can tell, there is just the one caller of applyMergedLocation where the instruction can possibly be a dbg.value (HoistThenElseCodeToIf), so I guess it should be fine to handle this case in that caller.

Just dropping the dbg.value there looks straightforward, the code is already dropping dbg.value statements for other reasons (e.g. because they don't match up with another dbg.value on the other side). But the dbg.value does seem to provide information that would be needlessly lost ...

I'm wondering what the point of keeping both dbg.values would be; they already must agree in both the variable they set and the value they set it to, so they'd only differ in the line number. Would later LLVM passes actually be able to do anything useful with two consecutive dbg.value statements that agree in everything but the line number (that they couldn't do if we had just one dbg.value with zero line number)?

In any case, if we wanted to keep both, we'd have to change HoistThenElseCodeToIf to -just for this case- move both statements from their respective then/else block to the preceding if block, instead of moving one and dropping the other. Is this what you are suggesting?

adrian-prantl commented 6 years ago

Thanks for your patch. I think that the correct fix is to either drop the dbg.value or — perhaps even better — keep both dbg.values in the code that invokes the mergeDebugLocation.

uweigand commented 6 years ago

Proposed fix

uweigand commented 6 years ago

So before rev. 314694, the compiler would simply leave the location of one of the two merged llvm.dbg.value statements unchanged. That commit changed that with the argument that since we've merged two statement originating from two different source locations, using just one of those is wrong and gives a bad debugging experience.

Instead, after the above revision, the compiler now creates a new location with "line 0", which is to be interpreted as "unknown location". But that location still has scope information. For some reason, the scope is set to where the merged statements are inlined into (if the original locations had inline information).

Now, it appears to me that in the case of llvm.dbg.value statements, it still makes sense to set the merged statement to a "line 0" unknown location. However, the scope information really needs to be set to the scope where the variable is defined. And given that we started out with two llvm.dbg.value statements for the same variable, they both must have had that same location originally anyway. Therefore it seems to me the easiest fix might be to special-case llvm.dbg.value statements and create a new location with "line 0" but the same scope as before.

Does this make sense? I've got a patch that does that, and it seems to fix the problem for me ...

uweigand commented 6 years ago

Reading through r317524 again, I can't find anything that's not NFC, but I could have missed something! Were you able to identify it as the problem commit via bisection?

No, this commit itself isn't the problem, it was just the latest change to the function. I was just hoping for some background of why the logic is the way it is (both before and after this particular change) ...

The actual commit that first makes this test case fail is rev. 314694, which was discussed as https://reviews.llvm.org/D37877

uweigand commented 6 years ago

Reading through r317524 again, I can't find anything that's not NFC, but I could have missed something! Were you able to identify it as the problem commit via bisection?

No, this commit itself isn't the problem, it was just the latest change to the function. I was just hoping for some background of why the logic is the way it is (both before and after this particular change) ...

uweigand commented 6 years ago

Pre-inline test case - run with "opt -inline -simplifycfg"

vedantk commented 6 years ago

Reading through r317524 again, I can't find anything that's not NFC, but I could have missed something! Were you able to identify it as the problem commit via bisection?

vedantk commented 6 years ago

Could you share the pre-inlining test case which reproduces with opt -inline -simplifycfg?

Relatedly, if reverting rL317524 fixes the bug, let's revert right away as the change was supposed to be NFC.

llvm / llvm-project

Merging inlined llvm.dbg.value creates broken metadata #35758

Extended Description