making hexdump accessible / combining them into one

GoogleCodeExporter commented 9 years ago

Hey guys, 

We have a few versions of the hexdump formula in different places.

* one is in the volshell.db function. it uses print instead of sys.stdout.write 
* one is in printkey.py and also lsadump.py. they return a string 
* one is in my malware.py. it returns a list of lines, that you can join 
yourself with '\n'.join() or '<br>'.join() if doing HTML output. 

So two questions:
  - Should we try and reduce the hexdump formulas down to one?
  - If so, which one of the various hexdump formulas do you prefer?

Original issue reported on code.google.com by michael.hale@gmail.com on 15 Aug 2011 at 9:27

GoogleCodeExporter commented 9 years ago

For what it's worth: I've only used the hexdump function in the printkey plugin 
so I don't know which hexdump is better, but I think that it would be better to 
have one hexdump function as a "utility" function that other plugins could use 
more easily (without having to hunt for the function in another plugin).  

If there is something that is needed from each of the functions, we can add 
them as options.

Original comment by jamie.l...@gmail.com on 18 Aug 2011 at 4:44

GoogleCodeExporter commented 9 years ago

Hmmmm,

I'd probably have a function that outputs a full "\n" separated string.  If 
it's going into html, it should be wrapped in pre (since it should be 
monospace), or if the plugin author really wants to jigger it then they can 
split/reconstruct it as they want.  The plugin can then also decide whether to 
write to stdout, or print as it likes (I don't know why volshell prints).
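
A rough sketch of that idea, written for Python 3 rather than the Python 2 used in the tree at the time. The names hexdump_text and as_html are made up for illustration, and the 32..126 printable range is an assumption, not something taken from the existing plugins:

```python
import html


def hexdump_text(data: bytes, width: int = 16) -> str:
    """Return a newline-separated hex dump; nothing is printed here."""
    lines = []
    for offset in range(0, len(data), width):
        row = data[offset:offset + width]
        hex_part = " ".join("{0:02x}".format(b) for b in row)
        # Assumed printable range: ASCII 32..126, everything else becomes "."
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in row)
        # Pad the hex column so the text column lines up on a short last row.
        lines.append("{0:08x}  {1:<{2}}  {3}".format(
            offset, hex_part, width * 3 - 1, text_part))
    return "\n".join(lines)


def as_html(dump: str) -> str:
    """Wrap a dump in <pre> so it stays monospace, escaping markup characters."""
    return "<pre>" + html.escape(dump) + "</pre>"
```

The caller then picks the sink, e.g. sys.stdout.write(hexdump_text(buf) + "\n") in a shell or as_html(hexdump_text(buf)) in a report.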

It should probably live in volatility.utils, if we decide it's a long enough 
bit of code to bother avoiding duplication by offering a framework copy.

I'd also recommend using the volshell version only because it does 8 character 
offsets, rather than 4 (we may need to consider 16 when 64-bit support hits)...

Original comment by mike.auty@gmail.com on 18 Aug 2011 at 6:20

GoogleCodeExporter commented 9 years ago

volatility.utils sounds like a good place. It is also something that linux 
plugins may want to import, so giving it a neutral home would be cool. 

As for which version of the function to use, we can use these factors:

1) it should output a full "\n" separated string (and not print anything)
2) it should use at least 8 character offsets, optionally 16 to support 64-bit
3) it should allow a start address (per Gleeda and myself) 

Anything else?
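
For criteria 2 and 3, the offset column could be handled by something like the sketch below (Python 3). format_offset and its start and wide parameters are hypothetical names, only meant to show the idea:

```python
def format_offset(offset: int, start: int = 0, wide: bool = False) -> str:
    """Format a row offset relative to a start address.

    wide=False gives 8 hex digits, wide=True gives 16 for 64-bit addresses.
    """
    digits = 16 if wide else 8
    return "{0:0{1}x}".format(start + offset, digits)
```

So format_offset(0x10, start=0x7c00) gives "00007c10", and passing wide=True pads the same value to 16 digits.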

Original comment by michael.hale@gmail.com on 18 Aug 2011 at 7:11

GoogleCodeExporter commented 9 years ago

Actually that code made me raise an eyebrow. The usual way for translating is 
something like:

s = [x if x < 100 and x > 32 else "." for x in s]

Otherwise the utility function should probably return a line at a time as a 
generator. Maybe something like this:

def Hexdump(data, width=16):
    for offset in xrange(0, len(data), width):
        row_data = data[offset:offset+width]
        translated_data = [x if x < 100 and x > 32 else "." for x in row_data]
        hexdata = " ".join(["{0:02x}".format(ord(x)) for x in row_data])

        yield offset, hexdata, translated_data

Then the callers can just format the results as they see fit. If you are going 
to write HTML you probably also want to place the translated and hexed data in 
different divs or td elements.
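
To illustrate how callers might consume such a generator, here is a Python 3 sketch with one plain-text caller and one HTML caller. hexdump_rows, to_text and to_html_rows are hypothetical names. Note that the upper bound of 100 in the snippet above would also mask letters from 'd' onward, so this sketch assumes the printable ASCII range 32..126 instead:

```python
import html


def hexdump_rows(data: bytes, width: int = 16):
    """Yield (offset, hex string, list of display characters) per row."""
    for offset in range(0, len(data), width):
        row = data[offset:offset + width]
        hexdata = " ".join("{0:02x}".format(b) for b in row)
        # Assumption: bytes outside ASCII 32..126 are shown as "."
        chars = [chr(b) if 32 <= b < 127 else "." for b in row]
        yield offset, hexdata, chars


def to_text(data: bytes) -> str:
    """Plain-text caller: one line per row, joined with newlines."""
    return "\n".join(
        "{0:08x}  {1}  {2}".format(offset, hexdata, "".join(chars))
        for offset, hexdata, chars in hexdump_rows(data))


def to_html_rows(data: bytes) -> str:
    """HTML caller: offset, hex and text each go in their own td."""
    return "\n".join(
        "<tr><td>{0:08x}</td><td>{1}</td><td>{2}</td></tr>".format(
            offset, hexdata, html.escape("".join(chars)))
        for offset, hexdata, chars in hexdump_rows(data))
```

The generator itself never formats or prints anything, so both callers share it unchanged.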

Original comment by scude...@gmail.com on 18 Aug 2011 at 7:11

GoogleCodeExporter commented 9 years ago

Cool, thanks Scudette! That seems to satisfy the criteria. Anyone opposed to 
placing that in volatility.utils and referencing it from printkey, lsadump, 
volshell, malware plugins?

Original comment by michael.hale@gmail.com on 18 Aug 2011 at 7:41

GoogleCodeExporter commented 9 years ago

No objections from me.

Any idea who wrote the original FILTER code?  Might be worth finding out what 
they were thinking at the time?

Original comment by mike.auty@gmail.com on 18 Aug 2011 at 7:44

GoogleCodeExporter commented 9 years ago

no opposition here.  volatility.utils sounds like a good place for it ;-)

Original comment by jamie.l...@gmail.com on 18 Aug 2011 at 7:45

GoogleCodeExporter commented 9 years ago

The original FILTER code I think came from here:

http://code.activestate.com/recipes/142812/

Original comment by michael.hale@gmail.com on 18 Aug 2011 at 7:47

GoogleCodeExporter commented 9 years ago

@mike.auty : wasn't that moyix ?  I see it in the original printkey plugin...

Original comment by jamie.l...@gmail.com on 18 Aug 2011 at 7:48

GoogleCodeExporter commented 9 years ago

ahh nevermind.. MHL found it

Original comment by jamie.l...@gmail.com on 18 Aug 2011 at 7:49

GoogleCodeExporter commented 9 years ago

Hmmm, weird.  Just can't figure out what that (len(repr(chr(x)))==3) is for?  
Anyway, very happy with scudette's version, so let's go with that...  5:)

Original comment by mike.auty@gmail.com on 18 Aug 2011 at 7:55

GoogleCodeExporter commented 9 years ago

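Heh, yeah was scratching my head too. It looks like a printable-character 
test: repr(chr(x)) is 3 characters long only when it is a single character 
between quotes, so anything that gets escaped (like '\x00') fails it.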
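
A quick way to see what that expression selects is to run it over the ASCII range. This is a Python 3 sketch with a hypothetical helper name, passes_filter. It stops at 127 because Python 3's repr() treats some characters above that as printable, which differs from the Python 2 behaviour the recipe relied on:

```python
def passes_filter(x: int) -> bool:
    """True when repr(chr(x)) is quote + one character + quote."""
    return len(repr(chr(x))) == 3


# Collect every ASCII code point that the recipe's test would keep.
kept = [x for x in range(128) if passes_filter(x)]

# Control characters are escaped by repr() (e.g. '\x00', '\n'), so they fail.
# Backslash also fails, because repr() doubles it.
assert kept == [x for x in range(32, 127) if x != 92]
```

So the test depends on how repr() escapes the character, not on how many digits x has.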

Original comment by scude...@gmail.com on 18 Aug 2011 at 8:10

GoogleCodeExporter commented 9 years ago

I made these changes in r1059. Only difference in the Hexdump function was this:

- translated_data = [x if x < 100 and x > 32 else "." for x in row_data]
+ translated_data = [x if ord(x) < 100 and ord(x) > 32 else "." for x in row_data]

The original version printed "." for all characters, so ord() needed to be 
added. 
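
For anyone wondering why ord() matters: as far as I know, CPython 2 does not raise when a str is compared with an int, it just orders the int first, so x < 100 is always false for a one-character string and every character becomes ".". Python 3 refuses the comparison outright. A small Python 3 sketch of both points, using the bounds from the diff above (row_data here is made-up sample input):

```python
row_data = "Ab\x00"

# Python 3 raises TypeError when ordering a str against an int.
try:
    "A" < 100
    mixed_comparison_ok = True
except TypeError:
    mixed_comparison_ok = False

# With ord(), the comparison is int against int and behaves as intended.
translated_data = [x if ord(x) < 100 and ord(x) > 32 else "." for x in row_data]
```

Here "A" (65) and "b" (98) are inside the bounds and pass through, while the NUL byte is replaced by ".".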

If at least one person can verify the changes and we all agree on the new 
format, we can close this issue shortly thereafter.

Original comment by michael.hale@gmail.com on 20 Aug 2011 at 4:07

GoogleCodeExporter commented 9 years ago

Doh about the ord(x) ... thanks for fixing.

Original comment by scude...@gmail.com on 20 Aug 2011 at 6:01

Changed state: Verified

making hexdump accessible / combining them into one #134