Closed ilch1 closed 4 years ago
I created the following example C structure:
struct outer_struct {
struct {
int field1;
int field2;
struct {
int field3;
int field4;
};
};
int field5;
};
The generated DWARF information (dwarfdump output) is:
0x00000086: DW_TAG_structure_type
DW_AT_name ("outer_struct")
DW_AT_byte_size (0x14)
DW_AT_decl_file ("/Users/ilya/git/github.com/volatilityfoundation/dwarf2json/test/anonymous_types.c")
DW_AT_decl_line (3)
0x0000008e: DW_TAG_member
DW_AT_type (0x00000096 "structure ")
DW_AT_decl_file ("/Users/ilya/git/github.com/volatilityfoundation/dwarf2json/test/anonymous_types.c")
DW_AT_decl_line (4)
DW_AT_data_member_location (0x00)
0x00000096: DW_TAG_structure_type
DW_AT_byte_size (0x10)
DW_AT_decl_file ("/Users/ilya/git/github.com/volatilityfoundation/dwarf2json/test/anonymous_types.c")
DW_AT_decl_line (4)
0x0000009a: DW_TAG_member
DW_AT_name ("field1")
DW_AT_type (0x0000006e "int")
DW_AT_decl_file ("/Users/ilya/git/github.com/volatilityfoundation/dwarf2json/test/anonymous_types.c")
DW_AT_decl_line (5)
DW_AT_data_member_location (0x00)
0x000000a6: DW_TAG_member
DW_AT_name ("field2")
DW_AT_type (0x0000006e "int")
DW_AT_decl_file ("/Users/ilya/git/github.com/volatilityfoundation/dwarf2json/test/anonymous_types.c")
DW_AT_decl_line (6)
DW_AT_data_member_location (0x04)
0x000000b2: DW_TAG_member
DW_AT_type (0x000000ba "structure ")
DW_AT_decl_file ("/Users/ilya/git/github.com/volatilityfoundation/dwarf2json/test/anonymous_types.c")
DW_AT_decl_line (7)
DW_AT_data_member_location (0x08)
0x000000ba: DW_TAG_structure_type
DW_AT_byte_size (0x08)
DW_AT_decl_file ("/Users/ilya/git/github.com/volatilityfoundation/dwarf2json/test/anonymous_types.c")
DW_AT_decl_line (7)
0x000000be: DW_TAG_member
DW_AT_name ("field3")
DW_AT_type (0x0000006e "int")
DW_AT_decl_file ("/Users/ilya/git/github.com/volatilityfoundation/dwarf2json/test/anonymous_types.c")
DW_AT_decl_line (8)
DW_AT_data_member_location (0x00)
0x000000ca: DW_TAG_member
DW_AT_name ("field4")
DW_AT_type (0x0000006e "int")
DW_AT_decl_file ("/Users/ilya/git/github.com/volatilityfoundation/dwarf2json/test/anonymous_types.c")
DW_AT_decl_line (9)
DW_AT_data_member_location (0x04)
The DWARF information is processed iteratively by dwarf2json. Thus, the definition of an anonymous structure may not be known when it is referenced. In the example above, the definitions of the anonymous structures embedded in outer_struct have not been processed when they are first encountered. In order to collapse anonymous structures, the processing would need to be made recursive, which is not trivial. Another option is to make the processing multi-pass, where in the 2nd pass the anonymous structure references are replaced by the flattened instance.
We should discuss if this is better solved on the consumer side. It looks like gdb/lldb solve it that way. The fact that the structure is anonymous (does not have an identifier) could be preserved by dwarf2json and used by the consumer to correctly expose the fields contained by the anonymous structure.
I'd be ok doing that, but at the moment the JSON has no strictly defined means of indicating whether a member is anonymous or not. We have the DWARF generator producing unnamed_field_<id>; for pdbconv we hark back to the previous volatility and refer to both anonymous and unnamed (which in the Windows world are seemingly different) as __anonymous_<id> or __unnamed_<id> respectively. Which means we either need to include an additional field in the schema (entirely doable) or we keep the format as defined and do the condensing in a second pass (slightly lossy in terms of data). It sounds like adding a field would be useful, but I'm interested why we a) haven't run into this before and b) haven't run into this on Windows yet? :S
Definitely something we can discuss further at the next meeting...
Can you paste the current dwarf2json output for the above example?
Can you summarize the algorithm(s) used by lldb and gdb for this scenario?
> Can you paste the current dwarf2json output for the above example?
"outer_struct": {
"size": 20,
"fields": {
"field5": {
"type": {
"kind": "base",
"name": "int"
},
"offset": 16
},
"unnamed_field_0": {
"type": {
"kind": "struct",
"name": "unnamed_a35d783f54979948"
},
"offset": 0
}
},
"kind": "struct"
},
"unnamed_a35d783f54979948": {
"size": 16,
"fields": {
"field1": {
"type": {
"kind": "base",
"name": "int"
},
"offset": 0
},
"field2": {
"type": {
"kind": "base",
"name": "int"
},
"offset": 4
},
"unnamed_field_8": {
"type": {
"kind": "struct",
"name": "unnamed_e43b13834081c6ac"
},
"offset": 8
}
},
"kind": "struct"
},
"unnamed_e43b13834081c6ac": {
"size": 8,
"fields": {
"field3": {
"type": {
"kind": "base",
"name": "int"
},
"offset": 0
},
"field4": {
"type": {
"kind": "base",
"name": "int"
},
"offset": 4
}
},
"kind": "struct"
}
Here is an example of the output with the anonymous field:
"outer_struct": {
"size": 20,
"fields": {
"field5": {
"type": {
"kind": "base",
"name": "int"
},
"offset": 16
},
"unnamed_field_0": {
"type": {
"kind": "struct",
"name": "unnamed_a35d783f54979948"
},
"offset": 0,
"anonymous": true
}
},
"kind": "struct"
},
"unnamed_a35d783f54979948": {
"size": 16,
"fields": {
"field1": {
"type": {
"kind": "base",
"name": "int"
},
"offset": 0
},
"field2": {
"type": {
"kind": "base",
"name": "int"
},
"offset": 4
},
"unnamed_field_8": {
"type": {
"kind": "struct",
"name": "unnamed_e43b13834081c6ac"
},
"offset": 8,
"anonymous": true
}
},
"kind": "struct"
},
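To make the consumer-side idea concrete, here is a minimal Python sketch of how a consumer might use the proposed "anonymous" flag to expose nested fields at their absolute offsets. It is only an illustration, not volatility3's actual implementation: the function name `flatten_fields` is hypothetical, and the third struct (`unnamed_e43b13834081c6ac`) is copied from the earlier, unflagged dump because the flagged example stops after the second struct.

```python
INT = {"kind": "base", "name": "int"}

# Type table mirroring the JSON in this thread (sizes and "kind" kept for fidelity).
TYPES = {
    "outer_struct": {"size": 20, "kind": "struct", "fields": {
        "field5": {"type": INT, "offset": 16},
        "unnamed_field_0": {
            "type": {"kind": "struct", "name": "unnamed_a35d783f54979948"},
            "offset": 0, "anonymous": True},
    }},
    "unnamed_a35d783f54979948": {"size": 16, "kind": "struct", "fields": {
        "field1": {"type": INT, "offset": 0},
        "field2": {"type": INT, "offset": 4},
        "unnamed_field_8": {
            "type": {"kind": "struct", "name": "unnamed_e43b13834081c6ac"},
            "offset": 8, "anonymous": True},
    }},
    "unnamed_e43b13834081c6ac": {"size": 8, "kind": "struct", "fields": {
        "field3": {"type": INT, "offset": 0},
        "field4": {"type": INT, "offset": 4},
    }},
}


def flatten_fields(types, type_name, base_offset=0):
    """Return {field_name: absolute_offset}, expanding anonymous members.

    Hypothetical helper: a member flagged "anonymous" is replaced by the
    fields of the struct it refers to, shifted by the member's own offset.
    """
    flat = {}
    for name, field in types[type_name]["fields"].items():
        offset = base_offset + field["offset"]
        if field.get("anonymous"):
            flat.update(flatten_fields(types, field["type"]["name"], offset))
        else:
            flat[name] = offset
    return flat


if __name__ == "__main__":
    for name, offset in sorted(flatten_fields(TYPES, "outer_struct").items(),
                               key=lambda item: item[1]):
        print(f"{offset:2d}  {name}")
```

Because the lookup happens after the whole type table has been loaded, the order in which the producer emitted the structs no longer matters, which is the main attraction of doing this on the consumer side.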
Cool, that looks like what I expected, now we just need to check if I made the schema correctly and if the code to back it up works... ;) I think it was mm_struct that was the key example?
The latest commit fixes the metadata to be compatible with schema 6.2.0 in the issue151-flatten-anonymous branch of volatility3.
The new metadata for Linux will look like:
"metadata": {
"linux": {
"elf_symbols": true,
"elf_buildid": "130921b08a47907e6701bc7fc1a0253b00aab68b",
"dwarf_symbols": true,
"dwarf_types": true,
"dwarf_buildid": "130921b08a47907e6701bc7fc1a0253b00aab68b"
},
"producer": {
"name": "dwarf2json",
"version": "0.6.0"
},
"format": "6.2.0"
},
The new metadata for Mac will look like:
"metadata": {
"mac": {
"macho_symbols": true,
"macho_uuid": "C8FBE733-0FE1-3C84-AC87-2085A51904EF",
"dwarf_types": true,
"dwarf_symbols": true,
"dwarf_uuid": "C8FBE733-0FE1-3C84-AC87-2085A51904EF"
},
"producer": {
"name": "dwarf2json",
"version": "0.6.0"
},
"format": "6.2.0"
},
Cool, would these be better in a namespace (so something like:
"mac": {
"macho": {
"symbols": true,
"uuid": "C8FBE733-0FE1-3C84-AC87-2085A51904EF"
},
"dwarf": {
"types": true,
"symbols": true,
"uuid": "C8FBE733-0FE1-3C84-AC87-2085A51904EF"
}
}
and obviously the same for linux)?
Also, are there other fields we'd want to add, or are we happy that these are the main ones we'll need (as in, can I block off additional properties to the elf/macho/dwarf groups or should I leave them open to add additional fields)? I'd prefer to have everything well defined, but happy to leave it to you guys to make a decision... :)
Thanks for the schema change, it looked good. I've pushed up some additional schema changes that codify the examples you provided but using sub-namespaces. Should be easy to change them if there's a good reason not to use the hierarchy, but as is hopefully it'll be straightforward to make the change in dwarf2json.
Also, a question about the macho uuid and the dwarf uuid, is there ever a time they can be different values and/or one could be not present whilst the other one is? I'm just wondering whether storing it twice is beneficial or could lead to inconsistencies (if they're never supposed to be different, but in the file, they are)?
> Cool, would these be better in a namespace (so something like:
> "mac": { "macho": { "symbols": true, "uuid": "C8FBE733-0FE1-3C84-AC87-2085A51904EF" }, "dwarf": { "types": true, "symbols": true, "uuid": "C8FBE733-0FE1-3C84-AC87-2085A51904EF" } }
> and obviously the same for linux)?
I like the suggestion of using hierarchical namespaces.
> Also, are there other fields we'd want to add, or are we happy that these are the main ones we'll need (as in, can I block off additional properties to the elf/macho/dwarf groups or should I leave them open to add additional fields)? I'd prefer to have everything well defined, but happy to leave it to you guys to make a decision... :)
I'm not sure. I'd like to discuss this schema with @npetroni, and then I'll get back to you.
> Also, a question about the macho uuid and the dwarf uuid, is there ever a time they can be different values and/or one could be not present whilst the other one is? I'm just wondering whether storing it twice is beneficial or could lead to inconsistencies (if they're never supposed to be different, but in the file, they are)?
Yes, a user can select a macho file that does not match the dwarf file (they were compiled separately or from different source), in which case the UUID values would be different. In fact, capturing that in the symbols metadata would be helpful in debugging any potential issues because of the mismatch. The same idea applies to linux elf files.
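As a rough illustration of how the mismatch could be surfaced when debugging, here is a small Python sketch. It assumes the namespaced layout suggested earlier in the thread (which may not be the final schema); the function name `compare_uuids` and the second UUID in the usage example are made up for the example.

```python
def compare_uuids(metadata, os_key="mac", binary_key="macho"):
    """Classify the binary/DWARF UUID pair as 'match', 'mismatch' or 'incomplete'.

    Assumes a layout like {"mac": {"macho": {"uuid": ...}, "dwarf": {"uuid": ...}}}.
    """
    section = metadata.get(os_key, {})
    binary_uuid = section.get(binary_key, {}).get("uuid")
    dwarf_uuid = section.get("dwarf", {}).get("uuid")
    if binary_uuid is None or dwarf_uuid is None:
        return "incomplete"  # one of the two sources was not supplied
    # UUIDs are hex strings, so compare case-insensitively.
    if binary_uuid.lower() == dwarf_uuid.lower():
        return "match"
    return "mismatch"


if __name__ == "__main__":
    same = {"mac": {
        "macho": {"symbols": True, "uuid": "C8FBE733-0FE1-3C84-AC87-2085A51904EF"},
        "dwarf": {"types": True, "symbols": True,
                  "uuid": "C8FBE733-0FE1-3C84-AC87-2085A51904EF"},
    }}
    print(compare_uuids(same))

    # Made-up second UUID, standing in for a DWARF file built separately.
    different = {"mac": {
        "macho": {"uuid": "C8FBE733-0FE1-3C84-AC87-2085A51904EF"},
        "dwarf": {"uuid": "00000000-0000-0000-0000-000000000000"},
    }}
    print(compare_uuids(different))
```

For linux the same check would apply with `os_key="linux"` and `binary_key="elf"`, again assuming the namespaced layout.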
The following is a modification of the original proposal. The mac/linux section has 2 lists: symbols and types. Each entry in the list has the following fields: kind, sha256, and name. Below is an example for mac:
"mac": {
"symbols": [
{
"kind": "dwarf",
"name": "somefile",
"sha256": "d80566ab70265665c4144485d4d896b8405fdd0d2c9675b4be427b0e4c07086b"
},
{
"kind": "symtab",
"name": "somefile",
"sha256": "d80566ab70265665c4144485d4d896b8405fdd0d2c9675b4be427b0e4c07086b"
}
],
"types": [
{
"kind": "dwarf",
"name": "somefile",
"sha256": "d80566ab70265665c4144485d4d896b8405fdd0d2c9675b4be427b0e4c07086b"
}
]
},
Here is an example for linux:
"linux": {
"symbols": [
{
"kind": "dwarf",
"name": "module.ko",
"sha256": "299d6f6f1821c15d109fad0a651e0e2a55cb2ce70340cf3c09e82f4f757b8449"
},
{
"kind": "symtab",
"name": "module.ko",
"sha256": "299d6f6f1821c15d109fad0a651e0e2a55cb2ce70340cf3c09e82f4f757b8449"
},
{
"kind": "system-map",
"name": "System.map-4.15.0-66-generic",
"sha256": "d1001d271b33b64afbab7fcb5993a9dcf3e4c19d0bc71ca8148035a24bb27f4e"
}
],
"types": [
{
"kind": "dwarf",
"name": "module.ko",
"sha256": "299d6f6f1821c15d109fad0a651e0e2a55cb2ce70340cf3c09e82f4f757b8449"
}
]
},
This code is available in the issue-11-anonymous-types branch.
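For readers who want to see how one such list entry could be produced, here is a minimal Python sketch. dwarf2json itself is written in Go, so this is not its code; the helper name `describe_source` is hypothetical, and the only thing taken from the proposal is the entry shape (kind, name, sha256).

```python
import hashlib
from pathlib import Path


def describe_source(path, kind):
    """Build one {"kind", "name", "sha256"} entry for an input file.

    The file is hashed in chunks so large kernel images need not fit in memory.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return {"kind": kind, "name": Path(path).name, "sha256": digest.hexdigest()}


if __name__ == "__main__":
    import json
    import sys

    # Usage: python describe.py <file> <kind>, e.g. module.ko dwarf
    entry = describe_source(sys.argv[1], sys.argv[2])
    print(json.dumps({"symbols": [entry], "types": [entry]}, indent=2))
```

Note that "name" records only the file's base name, matching the examples above, so the hash is what actually distinguishes two inputs with the same name.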
Hmmm, so I like the layout, but I'd suggest we come up with all the possible "kinds" we'll support (at the moment dwarf, symtab and system-map) and then I'd probably have each item being something like:
"thing" : [
{
"kind": "dwarf",
"name": "dwarfthing",
"hash_type": "sha256",
"hash_value": "23984320985320498532049853..."
}
]
That would allow us to upgrade supported hashes over the supported versions of the schema without forcing/mandating only one. There are other ways of expressing it, I just want to think through how we'd intend to move to a newer hash as the old one becomes insecure (I assume the reason for sha256 is the worry that someone could create a malicious copy of the file that would be mistaken for it, otherwise we could just use md5 if all we're trying to do is distinguish the files absent of an attacker). Anyway, lemme know what you think, whether I'm overcomplicating it, whatever. I've mocked up the changes in the branch in vol3...
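A short Python sketch of what the hash_type/hash_value split buys a consumer: the algorithm is looked up by name, so newer hashes can be accepted without changing the entry shape. The function name `verify_entry` and the `SUPPORTED_HASHES` set are assumptions made for this example, not part of any schema.

```python
import hashlib
import hmac

# Hypothetical allow-list; a schema version could widen or narrow this.
SUPPORTED_HASHES = {"sha1", "sha256", "sha512"}


def verify_entry(entry, data):
    """Return True if `data` hashes to entry["hash_value"] under entry["hash_type"]."""
    hash_type = entry["hash_type"]
    if hash_type not in SUPPORTED_HASHES:
        raise ValueError(f"unsupported hash type: {hash_type}")
    actual = hashlib.new(hash_type, data).hexdigest()
    # compare_digest avoids leaking where the two strings first differ.
    return hmac.compare_digest(actual, entry["hash_value"].lower())


if __name__ == "__main__":
    payload = b"example file contents"
    entry = {
        "kind": "dwarf",
        "name": "dwarfthing",
        "hash_type": "sha256",
        "hash_value": hashlib.sha256(payload).hexdigest(),
    }
    print(verify_entry(entry, payload))
    print(verify_entry(entry, payload + b"!"))
```

The trade-off is that the consumer has to reject unknown hash types explicitly, whereas a fixed "sha256" key makes the supported algorithm obvious from the schema alone.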
> I'd suggest we come up with all the possible "kinds" we'll support (at the moment dwarf, symtab and system-map)
Yes, dwarf, symtab and system-map are the known "kinds". I do not know if/when there will be additional ones.
As far as the hash_type and hash_value organization goes, I can make the suggested change, though I'm not sure it is necessary. I agree that we could use md5 or sha1 instead of sha256. I do not think we would need to switch to a different hash for a long time (or ever), since, as you've pointed out, all we're trying to do is distinguish the files absent of an attacker.
Ok, well, let's leave it for future compatibility just in case (you never know, some systems may stop supporting md5 or sha1 one day!), and then both this branch and the branch in vol3 should be in sync. What more needs to happen before we merge?
We need to make sure the output of dwarf2json is compatible with the schema changes in vol3. We can merge after that.
Fixed in #13.
Re: volatilityfoundation/volatility3#151