llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
28.87k stars 11.92k forks source link

Output S_OBJNAME and S_COMPILE3 symbols #33608

Open llvmbot opened 7 years ago

llvmbot commented 7 years ago
Bugzilla Link 34260
Version trunk
OS Windows NT
Reporter LLVM Bugzilla Contributor
CC @dwblaikie,@rnk

Extended Description

MSVC outputs an S_OBJNAME and S_COMPILE3 CodeView record into each module's .debug$S section. clang-cl is not outputting these, which could have unknown effects on the linker. We should output these.

llvmbot commented 7 years ago

Definitely agree there. It's why I've spent so much time already on llvm-pdbutil, and every time I do encounter a bug, while using the tools to find / fix it, I'm also simultaneously working on improving the tool to help me find / fix this particular bug faster, in hopes that these same improvements will save me time down the road.

So I think we are in agreement on the need for good tooling.

dwblaikie commented 7 years ago

The point is though, that I would rather spend 10 minutes just implementing it than 10 days trying to track down a bug caused as a result of not implementing it.

My perspective here is that I'd like to see the debugging tools & techniques improved so it doesn't take 10 days. Because there will be these bugs (perhaps less of them) even if we do take the conservative "try to emit as many things like MSVC as possible" so having efficient ways of investigating them (by hacking up output to match MSVC as much as possible on an as-needed basis, for example) in any case.

llvmbot commented 7 years ago

unless it's extremely difficult / cumbersome to do so.

(Or unless we're gaining something tangible by not doing so). An example of this is a few weeks ago when I added the DataCRC fields to the linker module symbols. We don't know what this does. Maybe it's related to source server or symbol servers. Maybe it's related to source file mismatch detection. Or it could be related to Edit & Continue or some other thing that we don't support. The point is though, that I would rather spend 10 minutes just implementing it than 10 days trying to track down a bug caused as a result of not implementing it.

llvmbot commented 7 years ago

It's pretty easy to hack LLVM to emit canned codeview records and experiment with them to see if adding these records actually does anything.

That's where I disagree. It's easy to see if adding them does anything obvious. But the obvious things are not the ones I'm concerned about. I'm concerned about the ones like we had a few weeks ago where whether or not local variables work depends on a version number being Microsoft-specific.

Given what I've learned from that episode, I'm not really prepared to make any assumptions about the side effects of having or not having certain things. For all I know not having the symbol will break attach to process. Or ETW traces will have lower fidelity. Or WoW64 debugging. Or any other number of things which we can't even imagine right now because it seems completely unrelated. And we can save ourselves a lot of headache and eliminate a lot of variables if we just match their output unless it's extremely difficult / cumbersome to do so.

rnk commented 7 years ago

I think this is kind of the nature of the beast we're dealing with, and we will save a lot of headaches long term by not trying too hard to fight it

I don't agree, it's important to find a reason to bother emitting S_OBJNAME etc. It's pretty easy to hack LLVM to emit canned codeview records and experiment with them to see if adding these records actually does anything. If they don't do anything, there's no reason to burden the rest of LLVM and clang with passing down the object file name, the compile command line, or whatever these obscure records need.

For example, MSVC emits LF_VFTABLE records describing vtables. Clang doesn't, because it's burdensome to implement, and it only exists to power LTCG, which clang implements differently.

llvmbot commented 7 years ago

For the OBJNAME symbol and how it interoperates with LTO, we will need to run MSVC's linker with LTO and see what the output PDBs look like. I imagine that on the compiler side there's no difference. It probably still outputs the S_OBJNAME symbol into each compiland. But it's not clear what the linker does with them.

llvmbot commented 7 years ago

I wasn't so much thinking of that kind of testing - but "I have mysterious MSVC debugger behaving weirdly with Clang compared to MSVC & how do I isolate the problem", which seemed to be the scenario you were describing.

So more thinking along the lines of how two quickly narrow down the significant different in a relatively reliable way. I imagine this will be necessary/useful even if LLVM conservatively tries to emit MSVC-like output in places of uncertainty.

llvm-pdbutil has a "diff" mode where it can take two PDBs and output the differences. I've been using that for some of this kind of testing, but things get more difficult when you start looking at differences between symbol and type records, because there are multiple orderings that are equivalent, so it's kind of a graph isomorphism problem. I plan to continue improving it, but all the easy work is done there, so now it's to the hard stuff.

In those cases I'm especially interested in not producing things we don't know why they're needed - I think it's really valuable/important to know why certain bits are being produced or we'll end up producing all sorts of oddities without any documentation as to why they're needed, how to approach removing/modifying them, etc.

It's definitely important, but there's only so much we can do. We're limited by the fact that the format is not well documented, and if there's something that needs to be set, then all we can say is that it needs to be set. If we took the approach you propose then we wouldn't even have working PDBs today because there's still plenty of stuff in the PDB that we're writing but don't know what it's for or how it's used. And I don't think we ever will know. In a lot of cases I imagine the answer is "technically the debugger doesn't actually need that piece of information, but the current implementation assumes it's there, so...."

I think this is kind of the nature of the beast we're dealing with, and we will save a lot of headaches long term by not trying too hard to fight it

dwblaikie commented 7 years ago

There's a combination of things we can do.

We've discussed enhancing debginfo-tests to script Microsoft command line debugger (cdb). That will go a long way, but it will take some work to set up.

I wasn't so much thinking of that kind of testing - but "I have mysterious MSVC debugger behaving weirdly with Clang compared to MSVC & how do I isolate the problem", which seemed to be the scenario you were describing.

So more thinking along the lines of how two quickly narrow down the significant different in a relatively reliable way. I imagine this will be necessary/useful even if LLVM conservatively tries to emit MSVC-like output in places of uncertainty.

I'm more talking about the case where there's something that we don't know what it is and could have surprising and mysterious side effects. In those cases (such as the one described in this bug) I think we should try to stay as close as possible to what MS outputs

In those cases I'm especially interested in not producing things we don't know why they're needed - I think it's really valuable/important to know why certain bits are being produced or we'll end up producing all sorts of oddities without any documentation as to why they're needed, how to approach removing/modifying them, etc.

That's why I'm pushing back a bit here (& I do similarly with other changes to debug info where they're similarly motivated by uncertainty*) - I think it's really valuable/important to err on the side of not behaving in a certain way until there's a use case/motivation that can be documented to explain why it is the way it is.

(see, for example, the discussion around OBJNAME and LTO - what name is useful, and why? Probably not the LTO pseudo-object that contains IR, but the real object that the backend/linker-driven compile produces? But how do we know what to do here if we don't know why we're doing it?)

llvmbot commented 7 years ago

There's a combination of things we can do.

We've discussed enhancing debginfo-tests to script Microsoft command line debugger (cdb). That will go a long way, but it will take some work to set up.

We can also work with MS when we have questions to better understand certain things, which we are actively doing.

I agree that we shouldn't produce bit for bit identical debug info. This isn't really possible anyway just because we generate different code and the debug info has to contain addresses. But I think we should try to match the types of records they output, for the sole reason that their tools are understandably better tested when they get debug info that looks like it could have been generated by their compiler.

Sometimes even that isn't enough though. See bug 34312. That's a case where we did the right thing and got burned for it. Don't get me wrong, I don't think we should stop trying to output higher fidelity debug info, , but I'm more talking about the case where there's something that we don't know what it is and could have surprising and mysterious side effects. In those cases (such as the one described in this bug) I think we should try to stay as close as possible to what MS outputs

dwblaikie commented 7 years ago

Alternatively, bit-for-bit identical output from LLVM for MSVC's debug info doesn't seem like a viable end-goal either (& removes many opportunities for LLVM to do better that MSVC by omitting debug info more aggressively, etc).

Are there ways debugging tools to reduce test cases and isolate differences in output could be improved to help speed up debugging of debug info behavior differences to make divergences less costly?

llvmbot commented 7 years ago

Having to go through this kind of exercise every time we see an obscure bug (which we now know is going to happen)

^ Insert "does not seem like a scalable approach long-term" at the end of this phrase

llvmbot commented 7 years ago

We've encountered real scenarios where only after several weeks of investigation did we find that something was failing because of an obscure field that we didn't think we needed (specifically, a Microsoft-specific version number).

In this particular instance, the symptoms of the problem had nothing to do with a version number and there was no indication that a version number was in any way related.

Having to go through this kind of exercise every time we see an obscure bug (which we now know is going to happen), I think we need to take a different approach and aim for full binary compatibility unless we can demonstrate that it is not needed.

dwblaikie commented 7 years ago

I thought the general approach being taken was only to implement features with demonstrated need? Perhaps I missed some nuance in how feature compatibility/completeness is being approached for clang-cl?

llvmbot commented 7 years ago

s/S_COMPILE3/S_BUILDINFO/