HiveFileAddressSpace clean-up

GoogleCodeExporter commented 9 years ago

So the HiveFileAddressSpace is not an actual address space (it's not even a 
new-style object).  Ideally this should be cleaned up, although it's low 
priority.

We've got two options:

1) Fix it up to be an actual AddressSpace.  This requires a config object which 
could be a dummy (but would then pull in volatility.conf) or it could be None, 
but then any attempts to write the AS would fail.

2) Get rid of it, because it's only called by dump_file_hashes, and those 
aren't called anywhere.  It makes it difficult to use the code to read raw Hive 
files, but is volatility likely to be used on raw files rather than on memory 
dumps?  If it is, then we can probably re-add the code then, and engineer it 
properly.

I'm leaning most towards option 2, but I'm up for opinions.  I've attached a 
patch to remove the HiveFileAddressSpace for what it's worth...
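
For anyone weighing up option 1, here is a minimal sketch of what a file-backed space could look like. It is not the Volatility base class: `FileBackedSpace`, its method set and the `write_enabled` attribute are all hypothetical names chosen for illustration, and the `config=None` case simply shows the "writes would fail" behaviour described above.

```python
import io


class FileBackedSpace(object):
    """Hypothetical stand-in for an address space backed by a raw file."""

    def __init__(self, fileobj, config=None):
        self.fileobj = fileobj
        # config=None mirrors the "no config object" variant of option 1.
        self.config = config
        fileobj.seek(0, io.SEEK_END)
        self.size = fileobj.tell()

    def is_valid_address(self, addr):
        return 0 <= addr < self.size

    def read(self, addr, length):
        # Return None rather than a short read when the range is out of bounds.
        if length < 0 or not self.is_valid_address(addr):
            return None
        if addr + length > self.size:
            return None
        self.fileobj.seek(addr)
        return self.fileobj.read(length)

    def write(self, addr, data):
        # "write_enabled" is an assumed attribute, not a real Volatility option.
        if self.config is None or not getattr(self.config, "write_enabled", False):
            raise RuntimeError("writing needs a config object that allows it")
        if not self.is_valid_address(addr) or addr + len(data) > self.size:
            return False
        self.fileobj.seek(addr)
        self.fileobj.write(data)
        return True


# Usage: wrap an in-memory buffer that starts with four signature bytes.
space = FileBackedSpace(io.BytesIO(b"regf" + b"\x00" * 12))
signature = space.read(0, 4)
```

Reads work without any configuration, while `write` raises until a config object is supplied, which is the trade-off option 1 would have to settle.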

Original issue reported on code.google.com by mike.auty@gmail.com on 24 Feb 2012 at 11:06

Merged into: #168

Attachments:

volatility-remove-hivefileaddressspace.patch

GoogleCodeExporter commented 9 years ago

Hey Mike: 

> is volatility likely to be used on raw files rather than on memory dumps?  

Yes, for sure. That's one of the big things I think attc and AW are looking 
forward to (for example you can use vol's object model to parse a pcap file, a 
PE file, a disk even).

Original comment by michael.hale@gmail.com on 28 Feb 2012 at 1:09

GoogleCodeExporter commented 9 years ago

Definitely is for me...

Also, I don't know if this is the best place to document it, but it would be 
great if parsers could be written somewhere outside a plugin/address space/etc 
and then usable by them. For example, an ELF parser could be written and then 
used for:

1) the virtual box address space
2) a 'readelf' plugin
3) plugins to parse the executable and shared libraries of a userland process
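
To make the idea concrete, here is a rough sketch of a parser that lives on its own and knows nothing about plugins or address spaces. It only decodes the first 20 bytes of an ELF header using the standard `struct` module; `parse_elf_header` and the returned dictionary keys are names made up for this example, not existing Volatility code.

```python
import struct

ELF_MAGIC = b"\x7fELF"


def parse_elf_header(data):
    """Decode the identification bytes plus e_type and e_machine."""
    if len(data) < 20 or data[:4] != ELF_MAGIC:
        raise ValueError("not an ELF image")
    # e_ident[4] is the class (1 = 32-bit, 2 = 64-bit),
    # e_ident[5] is the data encoding (1 = little endian, 2 = big endian).
    ei_class, ei_data = struct.unpack_from("BB", data, 4)
    endian = "<" if ei_data == 1 else ">"
    # e_type and e_machine follow the 16-byte e_ident array.
    e_type, e_machine = struct.unpack_from(endian + "HH", data, 16)
    return {
        "bits": 32 if ei_class == 1 else 64,
        "little_endian": ei_data == 1,
        "type": e_type,
        "machine": e_machine,
    }


# Usage: any caller that can produce bytes can share the same function,
# whether the bytes came from a file, an address space or a process image.
sample = ELF_MAGIC + b"\x02\x01\x01\x00" + b"\x00" * 8 + struct.pack("<HH", 2, 62)
header = parse_elf_header(sample)
```

Because the function takes plain bytes, each of the three consumers listed above would only need to supply the data and could reuse the same decoding logic.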

Original comment by atc...@gmail.com on 29 Feb 2012 at 12:12

GoogleCodeExporter commented 9 years ago

I'm CCing in moyix since these are his babies (and I haven't heard from him in 
a while and want to say hi).  Hi Moyix!  5:)

Hmmmm, so I'd be very worried about function creep.  Volatility and the vtype 
language are excellent for structures that fit a very fixed format (unions are 
about as complex as they get), but it's not very fast, and not very well suited 
for generic file-format parsing.  The vtype language itself has undergone many 
changes, but we're still maintaining backwards compatibility (which makes 
handling things like Pointers and Arrays special cases, instead of just normal 
object_classes), so the syntax isn't concise or clearly defined.

There are many other Python-based parsers out there that have very different 
qualities (for example construct, or hachoir) and can be better adapted to 
particular tasks, and there are libraries designed specifically for certain 
structures (such as pefile or scapy) that would do a far better job than 
volatility could.

Having worked on two file-format parsers in volatility (contrib/verinfo.py and 
an eventually-to-be-released pdbparser), it's really quite a hack to get it to do 
even simple things like conditional fields, and once it's done it's very slow 
at parsing that data.  I would far rather people use libraries designed for the 
job than try to reinvent the wheel in volatility and end up with something that 
won't roll.
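
As an illustration of why conditional fields are awkward for a fixed-offset description, consider a made-up record format (entirely hypothetical, not taken from verinfo.py or any real file format): a 1-byte flags field, a 4-byte timestamp that is present only when bit 0 of the flags is set, then a 2-byte length. A static offset table can describe only one of the two shapes, so the parser has to compute offsets as it goes.

```python
import struct

# Static description: correct only when the optional timestamp is absent.
FIXED_LAYOUT = {"flags": (0, "<B"), "length": (1, "<H")}


def parse_record(data):
    """Parse the hypothetical record, following the conditional field."""
    (flags,) = struct.unpack_from("<B", data, 0)
    offset = 1
    timestamp = None
    if flags & 1:
        # The timestamp exists only when bit 0 is set, which shifts
        # every later field by four bytes.
        (timestamp,) = struct.unpack_from("<I", data, offset)
        offset += 4
    (length,) = struct.unpack_from("<H", data, offset)
    return {"flags": flags, "timestamp": timestamp, "length": length}


# Usage: two records with the same length field but different shapes.
plain = struct.pack("<BH", 0, 7)
stamped = struct.pack("<BIH", 1, 1000, 7)
plain_fields = parse_record(plain)
stamped_fields = parse_record(stamped)
```

Reading `length` at the offset given in `FIXED_LAYOUT` would return timestamp bytes for the second record, which is the kind of case that needs special handling in a fixed-layout type language.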

I'm not completely against parsing arbitrary data files using volatility, but I 
strongly believe it would need deciding on a case-by-case basis.

In this particular instance, the functions that make use of the 
HiveFileAddressSpace require the input of two distinct hive files, and at the 
moment volatility only really works on a single data file at a time (because of 
a limitation in the way address spaces take their location from the 
configuration object).  Since they weren't used, I didn't think they'd be 
missed, but if people want them to stay then I'd recommend attempting to 
implement a plugin that uses them and see how easy/difficult it is to do, and 
whether it's worthwhile...

Original comment by mike.auty@gmail.com on 29 Feb 2012 at 9:28

GoogleCodeExporter commented 9 years ago

The volatility object parsing library is a superior parser because it has an 
autogenerated vtype system. This means you can write the functionality of each 
object separate from its memory layout - you can also import layout from pdb 
files etc. All the parsers out there require manually writing some kind of 
template for struct layout.  The volatility object parsing method is supposed 
to be very fast, mainly due to the fact that struct members are decoded on 
demand (rather than decoding the entire struct at once).
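
A small sketch of the decode-on-demand idea, for readers who have not looked at the object model. This is not the Volatility implementation: `LazyStruct`, the `LAYOUT` mapping and the member names are hypothetical, and the layout format is a simplification of what a vtype definition carries (member name mapped to offset and type).

```python
import struct

# Assumed, simplified layout: member name -> (relative offset, struct format).
LAYOUT = {
    "magic": (0, "<I"),
    "seq1": (4, "<I"),
    "seq2": (8, "<I"),
}


class LazyStruct(object):
    """Decode a member only when it is first asked for."""

    def __init__(self, data, offset, layout):
        self._data = data
        self._offset = offset
        self._layout = layout
        self._decode_log = []  # kept only to show which members were decoded

    def __getattr__(self, name):
        # __getattr__ runs only when normal attribute lookup fails, so the
        # underscore-prefixed attributes set in __init__ never reach here.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            rel, fmt = self._layout[name]
        except KeyError:
            raise AttributeError(name)
        (value,) = struct.unpack_from(fmt, self._data, self._offset + rel)
        self._decode_log.append(name)
        return value


# Usage: constructing the object touches no data; reading seq1 decodes
# exactly one member and leaves the others alone.
buf = struct.pack("<III", 0x66676572, 5, 5)
obj = LazyStruct(buf, 0, LAYOUT)
first = obj.seq1
```

The layout is plain data and the behaviour lives in the class, which is the separation between memory layout and functionality described above. Whether this is fast in practice depends on how often members are re-read, since this sketch does not cache decoded values.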

The intention I had in writing the object parser was to make it a generic 
parsing library which should equally work on memory or files. Somehow along the 
way this was confounded with the address space model which was a mistake - and 
it has been rectified in the latest code rewrite (in the scudette branch).

I think parsing registry files is a very good goal. If the object parsing 
framework is not fast enough, we can improve this. Either way this is needed to 
further improve the parsing framework.

Original comment by scude...@gmail.com on 5 Mar 2012 at 10:38

GoogleCodeExporter commented 9 years ago

This sounds like something scudette has either already done or that we should 
do for 3.0, so marking it as such.

Original comment by michael.hale@gmail.com on 1 Feb 2013 at 4:46

Added labels: Milestone-3.0.x

GoogleCodeExporter commented 9 years ago

Merging this with issue 168 - "registry code needs converting to the new object 
model"

Original comment by michael.hale@gmail.com on 9 Apr 2013 at 7:36

Changed state: Duplicate

HiveFileAddressSpace clean-up #220