microsoft / microsoft-pdb

Information from Microsoft about the PDB format. We'll try to keep this up to date. Just trying to help the CLANG/LLVM community get onto Windows.

Global Symbols and their segs #19

Open 8thMage opened 8 years ago

8thMage commented 8 years ago

Hello, in the struct:

    typedef struct DATASYM32 {
        unsigned short  reclen;     // Record length
        unsigned short  rectyp;     // S_LDATA32, S_GDATA32, S_LMANDATA, S_GMANDATA
        CV_typ_t        typind;     // Type index, or Metadata token if a managed symbol
        CV_uoff32_t     off;
        unsigned short  seg;
        unsigned char   name[1];    // Length-prefixed name
    } DATASYM32;

that is used for global data, there is an unsigned short seg field. From looking at it and at the PE file, it appears that (seg - 1) is the index of the PE section that the offset off is relative to when locating the global. The question is: what about seg = 0? It seems to be used for numerous things, including __ImageBase with off = 0, and it looks like the offset is then simply relative to the base virtual address of the PE image.

My questions are:

  1. Am I correct that seg - 1 is the index of the section that the offset is relative to when looking up the global?
  2. Does seg == 0 mean the offset is relative to the virtual address of the PE image?
  3. Why are there numerous globals with seg = 0 and off = 0 that are not recognized when typed into Visual Studio, such as __arct_country_count and _wpgmptr?

Thanks, The 8th mage

8thMage commented 8 years ago

This issue is still open, although I think I wrote the questions in an organized way. If you don't understand them, please write back.

gwicksted commented 7 years ago

@8thMage you're not the only one! I ran into this as well. In fact, as @skochinsky's comment (below) shows, it is not limited to DATASYM32.

Disclaimer: my experience is limited to a personal project, a clean-room x86 disassembler, PE/COFF loader, C/C++ demangler, and PDB reader written in 100% managed C# code, so it is likely to have different errors.

I can say with confidence that it exists this way in the PDB file itself and is not a software bug introduced after the fact.

As you stated, the seg index 0 does appear to correspond to the .text section (which is usually the 0th section descriptor and begins at the base image address).

Edit: terminology and erroneous assumptions corrected

skochinsky commented 7 years ago

From the "VC5.0 Symbolic Debug Information" document (emphasis mine):

Logical segments

When the linker emits address information about a symbol, it is done in a segment:offset format. The segment is a logical segment index assigned by the linker and the offset is the offset from the beginning of the logical segment. The physical address is assigned by the operating system when the program is loaded.

For PE formatted executables, the segment field is interpreted as the PE section number.

gwicksted commented 7 years ago

@skochinsky Thank you for confirming this! I can confirm this is the case with my test data. So:

int section = symbol.segment > 0 ? symbol.segment - 1 : 0;

It appears to be accurate thus far.
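For anyone who wants the same mapping in C, here is a minimal sketch under the assumptions discussed in this thread: seg is a 1-based PE section number, and off is added to that section's VirtualAddress to get an RVA (an offset from the image base). The names seg_off_to_rva and section_rvas are hypothetical and do not come from the microsoft-pdb sources. Unlike the one-liner above, this sketch does not map seg == 0 to the first section; since the meaning of seg == 0 is still an open question here, it is reported as unresolved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of microsoft-pdb.
 * section_rvas[i] is assumed to hold the VirtualAddress field of the
 * i-th PE section header (0-based), as read by the caller.
 * On success, stores the symbol's RVA in *rva_out and returns true. */
static bool seg_off_to_rva(const uint32_t *section_rvas, size_t section_count,
                           uint16_t seg, uint32_t off, uint32_t *rva_out)
{
    assert(section_rvas != NULL && rva_out != NULL);

    /* Assumption: seg is a 1-based section number. seg == 0 and
     * out-of-range values are left unresolved rather than guessed. */
    if (seg == 0 || seg > section_count) {
        return false;
    }

    *rva_out = section_rvas[seg - 1] + off;
    return true;
}
```

With sections starting at RVAs 0x1000, 0x5000 and 0x8000, a symbol with seg = 1 and off = 0x10 resolves to RVA 0x1010, while seg = 0 returns false so the caller can decide how to treat it.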

madcodescience commented 6 years ago

When you look at the segments created by the Windows PE loader, the first segment contains the MZ/PE Header and lies at ImageBase. That would make the ".text" section actually the second segment (usually at ImageBase + 0x1000).
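To picture that layout, here is a rough C sketch that classifies an RVA as falling in the header region or in one of the sections. It assumes the headers occupy RVAs from 0 up to SizeOfHeaders and that each section covers VirtualAddress up to VirtualAddress + VirtualSize. The struct section_range and the function region_of_rva are hypothetical simplifications for illustration, not real PE or PDB API names, and whether seg = 0 in a PDB actually refers to the header region is not confirmed anywhere in this thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of one PE section header. */
struct section_range {
    uint32_t virtual_address; /* RVA where the section starts */
    uint32_t virtual_size;    /* size of the section in memory */
};

/* Returns 0 if rva lies in the header region at the start of the image,
 * the 1-based section number if it lies inside a section,
 * or -1 if it falls in neither (for example, alignment padding). */
static int region_of_rva(uint32_t rva, uint32_t size_of_headers,
                         const struct section_range *sections, size_t count)
{
    if (rva < size_of_headers) {
        return 0;
    }
    for (size_t i = 0; i < count; i++) {
        uint32_t start = sections[i].virtual_address;
        /* rva - start cannot underflow because rva >= start is checked first. */
        if (rva >= start && rva - start < sections[i].virtual_size) {
            return (int)(i + 1);
        }
    }
    return -1;
}
```

Under this view, RVA 0 (where __ImageBase would sit) lands in region 0, and an RVA of 0x1000 lands in section 1 when the first section starts there.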