Closed npetroni closed 7 years ago
Just pushed a commit that should fix this, could you try it on your PDB collection? Seemed to work with the ntkrnlmp.pdb I had sitting around.
Thanks! It looks like this version parses, but all of the unnamed tags reduce to '<unnamed-tag>'
. Hopefully that's an easy fix. Then I'd be happy to run it across my collection.
OK, I've fixed up the unnamed bit. More bytes/string madness :(
Cool, getting closer :). Looks like enum keys for some numbers are off (b'\xff'
vs -1
).
Example (old on top, new on bottom):
< 'InterfaceType' : [ 0x12c, ['Enumeration', dict(target = 'long', choices = {0: 'Internal', 1: 'Isa', 2: 'Eisa', 3: 'MicroChannel', 4: 'TurboChannel', 5: 'PCIBus', 6: 'VMEBus', 7: 'NuBus', 8: 'PCMCIABus', 9: 'CBus', 10: 'MPIBus', 11: 'MPSABus', 12: 'ProcessorInternal', 13: 'InternalPowerBus', 14: 'PNPISABus', 15: 'PNPBus', 16: 'Vmcs', 17: 'ACPIBus', 18: 'MaximumInterfaceType', -1: 'InterfaceTypeUndefined'})]],
---
> 'InterfaceType' : [ 0x12c, ['Enumeration', dict(target = 'long', choices = {0: 'Internal', 1: 'Isa', 2: 'Eisa', 3: 'MicroChannel', 4: 'TurboChannel', 5: 'PCIBus', 6: 'VMEBus', 7: 'NuBus', 8: 'PCMCIABus', 9: 'CBus', 10: 'MPIBus', 11: 'MPSABus', 12: 'ProcessorInternal', 13: 'InternalPowerBus', 14: 'PNPISABus', 15: 'PNPBus', 16: 'Vmcs', 17: 'ACPIBus', 18: 'MaximumInterfaceType', b'\xff': 'InterfaceTypeUndefined'})]],
Interestingly, it looks like there was something similarly odd even in the old version (old on top, new on bottom):
< 'State' : [ 0x20, ['Enumeration', dict(target = 'long', choices = {0: 'LdrModulesPlaceHolder', 1: 'LdrModulesMapping', 2: 'LdrModulesMapped', 3: 'LdrModulesWaitingForDependencies', 4: 'LdrModulesSnapping', 5: 'LdrModulesSnapped', 6: 'LdrModulesCondensed', 7: 'LdrModulesReadyToInit', 8: 'LdrModulesInitializing', 9: 'LdrModulesReadyToRun', '\xfb': 'LdrModulesMerged', '\xfd': 'LdrModulesSnapError', '\xfc': 'LdrModulesInitError', -1: 'LdrModulesUnloading', '\xfe': 'LdrModulesUnloaded'})]],
---
> 'State' : [ 0x20, ['Enumeration', dict(target = 'long', choices = {0: 'LdrModulesPlaceHolder', b'\xfc': 'LdrModulesInitError', 2: 'LdrModulesMapped', 3: 'LdrModulesWaitingForDependencies', 4: 'LdrModulesSnapping', 5: 'LdrModulesSnapped', 6: 'LdrModulesCondensed', 1: 'LdrModulesMapping', 8: 'LdrModulesInitializing', 9: 'LdrModulesReadyToRun', b'\xfe': 'LdrModulesUnloaded', b'\xff': 'LdrModulesUnloading', b'\xfb': 'LdrModulesMerged', 7: 'LdrModulesReadyToInit', b'\xfd': 'LdrModulesSnapError'})]],
Do we expect the keys '\xfd'
through '\xfe'
to be numbers? Several of the mainline Volatility profiles have this exact feature, so it either works fine or is not critical...
I've committed a patch that should fix this. It changes the interpretation of 'char' to be a signed 8-bit integer rather than a string of length 1. That should also cover the \xfd
and \xfe
cases too.
I also noticed and fixed a bug in how array sizes were calculated, which was causing some arrays to claim they had a negative number of elements (you can see these in the Volatility profiles), so you will see a small change to some of the Volatility profiles (for the better!).
Sweet, both fixes look great. Nice catch on the arrays.
I ran it over my corpus and found that some enumerations are missing in the new version. Here's a snippet from an example PDB, ntkrnlmp.pdb/C0F42DF3DB264AC4AD62602E3961A1E72/ntkrnlmp.pdb
(old on top, new on bottom):
< 'InterfaceType' : [ 0xdc, ['Enumeration', dict(target = 'long', choices = {0: 'Internal', 1: 'Isa', 2: 'Eisa', 3: 'MicroChannel', 4: 'TurboChannel', 5: 'PCIBus', 6: 'VMEBus', 7: 'NuBus', 8: 'PCMCIABus', 9: 'CBus', 10: 'MPIBus', 11: 'MPSABus', 12: 'ProcessorInternal', 13: 'InternalPowerBus', 14: 'PNPISABus', 15: 'PNPBus', 16: 'Vmcs', 17: 'MaximumInterfaceType', -1: 'InterfaceTypeUndefined'})]],
---
> 'InterfaceType' : [ 0xdc, ['Enumeration', dict(target = 'long', choices = {})]],
2582c2582
< 'ChildInterfaceType' : [ 0xe4, ['Enumeration', dict(target = 'long', choices = {0: 'Internal', 1: 'Isa', 2: 'Eisa', 3: 'MicroChannel', 4: 'TurboChannel', 5: 'PCIBus', 6: 'VMEBus', 7: 'NuBus', 8: 'PCMCIABus', 9: 'CBus', 10: 'MPIBus', 11: 'MPSABus', 12: 'ProcessorInternal', 13: 'InternalPowerBus', 14: 'PNPISABus', 15: 'PNPBus', 16: 'Vmcs', 17: 'MaximumInterfaceType', -1: 'InterfaceTypeUndefined'})]],
---
> 'ChildInterfaceType' : [ 0xe4, ['Enumeration', dict(target = 'long', choices = {})]],
2616c2616
< 'AllocationType' : [ 0x8, ['Enumeration', dict(target = 'long', choices = {0: 'ArbiterRequestLegacyReported', 1: 'ArbiterRequestHalReported', 2: 'ArbiterRequestLegacyAssigned', 3: 'ArbiterRequestPnpDetected', 4: 'ArbiterRequestPnpEnumerated', -1: 'ArbiterRequestUndefined'})]],
---
> 'AllocationType' : [ 0x8, ['Enumeration', dict(target = 'long', choices = {})]],
2627c2627
Everything else looks good so far but I'll do a more thorough check tomorrow.
Well this looks like an exciting bug (or I'm doing something really stupid). It seems like there's something nondeterministic going on. Over 10 runs it seems to succeed around half the time:
for i in {1..10}; do
pdb_tpi_vtypes.py ntkrnlmp.pdb | grep ArbiterRequestLegacyReported >/dev/null && echo "Found" || echo "Not found"
done
Produces:
Found
Found
Found
Found
Found
Not found
Not found
Not found
Found
Not found
All I can think of is that there might be some code in there that implicitly depends on dictionary ordering, which seems to be random as of Python3. But I don't know immediately where that would be happening; I'll have to take a closer look tomorrow.
The issue is how forward references are merged.
When it's decoded "correctly" :
'_PNP_RESOURCE_REQUEST' : [ 0x28, {
Container:
leaf_type = 'LF_MEMBER'
fldattr = Container:
noconstruct = False
noinherit = False
pseudo = False
mprop = 'MTvanilla'
access = 'public'
compgenx = False
index = Container:
length = 38
tpi_idx = 5647
leaf_type = 'LF_ENUM'
count = 6
prop = Container:
fwdref = False
opcast = False
opassign = False
cnested = False
isnested = False
ovlops = False
ctor = False
packed = False
reserved = 0
scoped = False
utype = 'T_INT4'
fieldlist = Container:
length = 206
tpi_idx = 5646
leaf_type = 'LF_FIELDLIST'
substructs = [
Container:
leaf_type = 'LF_ENUMERATE'
fldattr = Container:
noconstruct = False
noinherit = False
pseudo = False
mprop = 'MTvanilla'
access = 'public'
compgenx = False
name = 'ArbiterRequestUndefined'
enum_value = -1
Container:
leaf_type = 'LF_ENUMERATE'
fldattr = Container:
noconstruct = False
noinherit = False
pseudo = False
mprop = 'MTvanilla'
access = 'public'
compgenx = False
name = 'ArbiterRequestLegacyReported'
enum_value = 0
Container:
leaf_type = 'LF_ENUMERATE'
fldattr = Container:
noconstruct = False
noinherit = False
pseudo = False
mprop = 'MTvanilla'
access = 'public'
compgenx = False
name = 'ArbiterRequestHalReported'
enum_value = 1
Container:
leaf_type = 'LF_ENUMERATE'
fldattr = Container:
noconstruct = False
noinherit = False
pseudo = False
mprop = 'MTvanilla'
access = 'public'
compgenx = False
name = 'ArbiterRequestLegacyAssigned'
enum_value = 2
Container:
leaf_type = 'LF_ENUMERATE'
fldattr = Container:
noconstruct = False
noinherit = False
pseudo = False
mprop = 'MTvanilla'
access = 'public'
compgenx = False
name = 'ArbiterRequestPnpDetected'
enum_value = 3
Container:
leaf_type = 'LF_ENUMERATE'
fldattr = Container:
noconstruct = False
noinherit = False
pseudo = False
mprop = 'MTvanilla'
access = 'public'
compgenx = False
name = 'ArbiterRequestPnpEnumerated'
enum_value = 4
]
name = '_ARBITER_REQUEST_SOURCE'
name = 'AllocationType'
offset = 8
'AllocationType' : [ 0x8, ['Enumeration', dict(target = 'long', choices = {0: 'ArbiterRequestLegacyReported', 1: 'ArbiterRequestHalReported', 2: 'ArbiterRequestLegacyAssigned', 3: 'ArbiterRequestPnpDetected', 4: 'ArbiterRequestPnpEnumerated', -1: 'ArbiterRequestUndefined'})]],
and "incorrectly" :
'_PNP_RESOURCE_REQUEST' : [ 0x28, {
Container:
leaf_type = 'LF_MEMBER'
fldattr = Container:
noconstruct = False
noinherit = False
pseudo = False
mprop = 'MTvanilla'
access = 'public'
compgenx = False
index = Container:
length = 38
tpi_idx = 5647
leaf_type = 'LF_ENUM'
count = 6
prop = Container:
fwdref = False
opcast = False
opassign = False
cnested = False
isnested = False
ovlops = False
ctor = False
packed = False
reserved = 0
scoped = False
utype = 'T_INT4'
fieldlist = Container:
length = 206
tpi_idx = 5646
leaf_type = 'LF_FIELDLIST'
substructs = []
name = '_ARBITER_REQUEST_SOURCE'
name = 'AllocationType'
offset = 8
'AllocationType' : [ 0x8, ['Enumeration', dict(target = 'long', choices = {})]],
The only difference in the metadata is the tpi_index
: 5647
(bad) or 5646
(good).
Microsoft cvdump
return the following informations :
cvdump.exe -t F:\Symbols\ntkrnlmp.pdb\C0F42DF3DB264AC4AD62602E3961A1E72\ntkrnlmp.pdb | Select-String _PNP_RESOURCE_REQUEST -Context 5
Element type : 0x1607
0x160c : Length = 42, Leaf = 0x1505 LF_STRUCTURE
# members = 0, field list type 0x0000, FORWARD REF,
Derivation list type 0x0000, VT shape type 0x0000
> Size = 0, class name = _PNP_RESOURCE_REQUEST, UDT(0x00001611)
0x160d : Length = 10, Leaf = 0x1002 LF_POINTER
Pointer (NEAR32), Size: 0
Element type : 0x160C
member name = 'Status'
0x1611 : Length = 42, Leaf = 0x1505 LF_STRUCTURE
# members = 10, field list type 0x1610,
Derivation list type 0x0000, VT shape type 0x0000
> Size = 40, class name = _PNP_RESOURCE_REQUEST, UDT(0x00001611)
0x1612 : Length = 38, Leaf = 0x1505 LF_STRUCTURE
# members = 0, field list type 0x0000, FORWARD REF,
Derivation list type 0x0000, VT shape type 0x0000
Size = 0, class name = _IO_RESOURCE_LIST, UDT(0x00001d56)
I'm pretty sure sometimes we return the forward reference instead of the real structure. And there is probably a off-by-one error somwhere since (0x1610 == 5648
, 0x1611 == 5649
).
Ok scratch what I've just said : this is a construct
bug (or the way we use it) and the way we handle the -1
enum value.
Minimal regress test :
enum_list = b'\x02\x15\x03\x00\x00\x80\xffArbiterRequestUndefined\x00\xf1'
print(lfFieldList.parse(enum_list))
(lin_env_3) lucasg@DESKTOP-822EGME:/mnt/f/Dev/pdbparse$ python pdbparse/tpi.py
Container:
substructs = []
(lin_env_3) lucasg@DESKTOP-822EGME:/mnt/f/Dev/pdbparse$ python pdbparse/tpi.py
Container:
substructs = [
Container:
leaf_type = 'LF_ENUMERATE'
substructs_data_ptr = 2
test = None
type_info = Container:
fldattr = Container:
noconstruct = False
noinherit = False
pseudo = False
mprop = 'MTvanilla'
access = 'public'
compgenx = False
enum_value_data_ptr = 4
test = None
value = Container:
value_or_type = 32768
name_or_val = Container:
value = -1
name = 'ArbiterRequestUndefined'
]
The result is not deterministic and coherent with @moyix observations.
Okay found the culprit :
LF_NUMERIC = 0x8000,
LF_CHAR = 0x8000,
LF_SHORT = 0x8001,
LF_USHORT = 0x8002,
I copied the leaf types values from microsoft pdb cvinfo.h
, without realising there were duplicates for macros uses. If it's not an issue in C, it is one for construct.Enum
. Since the val
switch rely on 4 supported values (LF_CHAR
, LF_SHORT
, LF_USHORT
, LF_LONG
,LF_ULONG
) sometimes, based on the dict ordering, it would select LF_NUMERIC
instead of LF_CHAR
and return an empty enum list since the parsing fail (no default parser defined).
Here the diff that solve the issue :
diff --git a/pdbparse/tpi.py b/pdbparse/tpi.py index b3d32d5..d4f92a1 100755
--- a/pdbparse/tpi.py
+++ b/pdbparse/tpi.py
@@ -617,10 +617,10 @@ leaf_type = Enum(ULInt16("leaf_type"),
LF_MATRIX = 0x151c,
LF_VFTABLE = 0x151d, # a virtual function table
- LF_ENDOFLEAFRECORD = 0x151d,
+ # LF_ENDOFLEAFRECORD = 0x151d,
LF_TYPE_LAST = 0x151d + 1, # one greater than the last type record
- LF_TYPE_MAX = (0x151d + 1) - 1,
+ # LF_TYPE_MAX = (LF_TYPE_LAST) - 1,
LF_FUNC_ID = 0x1601, # global func ID
LF_MFUNC_ID = 0x1602, # member func ID
@@ -635,9 +635,9 @@ leaf_type = Enum(ULInt16("leaf_type"),
# only generated by linker
LF_ID_LAST = 0x1607 + 1, # one greater than the last ID record
- LF_ID_MAX = (0x1607 - 1) - 1,
+ # LF_ID_MAX = (LF_ID_MAX) - 1,
- LF_NUMERIC = 0x8000,
+ # LF_NUMERIC = 0x8000,
LF_CHAR = 0x8000,
LF_SHORT = 0x8001,
LF_USHORT = 0x8002,
Since I had to heavily operate on my local version of pdbparse
in order to track and debug this issue, @moyix I would appreciate if you could commit this fix from your side.
L.
That fix looks good! I don't believe anything is using LF_ENDOFLEAFRECORD
, LF_TYPE_MAX
, LF_ID_MAX
, or LF_NUMERIC
, so this should be okay. The first three are presumably just used for array sizes in C, and the last one is presumably used as a bitmask to check if something is a numeric type. So there might be an argument for keeping LF_NUMERIC
around as a separate global constant somewhere, but unless someone complains let's leave it out for now.
Thanks for tracking this down!
@npetroni, hopefully this is the last of the compatibility issues affecting Volatility profiles :)
Based on my testing, everything looks good as compared to the old version. Thanks for the quick turnaround on these. I think we can close this issue.
I did turn up a few problems that exist in both the old and new versions, but they aren't impacting anyone AFAIK. They only manifest themselves on a small number of the PDBs that I have, none of which are important to me at the moment.
Great! Thanks for all your help testing.
examples/pdb_tpi_vtypes.py
produces output that can't be consumed by Python. Partial output: