meta_pe: fix rich header length check for hash calculation

lmco / laikaboss

Laika BOSS: Object Scanning System

Apache License 2.0


meta_pe: fix rich header length check for hash calculation #50

Open knowmalware opened 8 years ago

knowmalware commented 8 years ago

The original Rich Signature write-up (http://www.ntcore.com/files/richsign.htm) searches 400 bytes for the "Rich" string. The pefile module searches only 128 bytes for it. I have found that 128 is sometimes not enough, and 400 feels rather large, so I have chosen a round (hex) value in between.

I also took a lesson from the original write-up and search for NULL values, and I added a search for the PE header as well.
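A minimal sketch of the bounded marker search described above, not the code in this PR. The function name, the `window` parameter, and the 0x80 default start offset are assumptions for illustration; the window size actually chosen in the PR is not stated here, so it is left as a required argument.

```python
from typing import Optional

RICH_MARKER = b"Rich"


def find_rich_marker(data: bytes, window: int, start: int = 0x80) -> Optional[int]:
    """Return the file offset of the "Rich" marker, or None.

    The whole marker must lie within `window` bytes of `start`.
    Both parameters are illustrative, not taken from meta_pe or pefile.
    """
    pos = data.find(RICH_MARKER, start, start + window)
    return pos if pos != -1 else None
```

With a marker sitting 200 bytes past the start offset, a 128-byte window misses it while a larger one finds it, which is the failure mode the PR is trying to avoid.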

Note that because we rely on the pefile module for the Rich Header Values, that array will be incomplete, and will thus differ from the data used for hash calculation, until pefile itself is fixed.

marnao commented 7 years ago

@knowmalware thanks for the pull request. We definitely need to increase the search area for the end of the Rich header, although I'm not sure what the optimal value is. Your guess is as good as any.

I'm not sure I follow the other part of your modification, specifically around looking for the null values and PE header. When would this be useful? Do you have any samples you could point to?

knowmalware commented 7 years ago

The other part of my modification only matters when the Rich header has been tampered with or replaced. The search for the PE header is the first attempt to find the end, as the Rich header should be right before the PE header. The search for NULL values is the fall-back, as the Rich header should not contain any NULL dwords. In practice, I usually see a set of NULL bytes before the PE header, so it made sense to me.
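The two-stage end detection described in this comment could look roughly like the following. This is a sketch under assumptions, not the PR's implementation: `find_rich_end`, its `window` parameter, the 0x80 default, and the choice to test only dword-aligned offsets are all hypothetical.

```python
from typing import Optional

PE_SIGNATURE = b"PE\x00\x00"
NULL_DWORD = b"\x00\x00\x00\x00"


def find_rich_end(data: bytes, window: int, start: int = 0x80) -> Optional[int]:
    """Guess where the (possibly tampered) Rich region ends.

    Returns an end offset (exclusive), or None if neither rule matches
    within `window` bytes of `start`.
    """
    limit = min(len(data), start + window)

    # First attempt: the Rich header should sit right before the PE header.
    pe_pos = data.find(PE_SIGNATURE, start, limit)
    if pe_pos != -1:
        return pe_pos

    # Fall-back: stop at the first dword-aligned NULL dword, since the
    # Rich header itself should not contain one.
    for off in range(start, limit - 3, 4):
        if data[off:off + 4] == NULL_DWORD:
            return off
    return None
```

Returning `None` instead of raising lets a caller skip the hash for files that have no recognizable Rich region, such as output from non-Microsoft toolchains.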

If you're uncomfortable with this, I can change the PR to just not produce a hash if the Rich string doesn't exist, so that the code doesn't cause an exception when analyzing a PE file produced by a non-Microsoft compiler. But I'd prefer to leave it as-is for malware analysis purposes, as any changes to the Rich header could still be interesting from a similarity perspective.

marnao commented 7 years ago

@wxsBSD would you mind reviewing this change? You're probably more familiar with this stuff than I am given your work on yara.

wxsBSD commented 7 years ago

Shouldn't it be possible to go from the Rich header start to (at most) the NT header start? IE: It should be from 0x80 to uint32(0x3c). You can also be extra careful and ensure it ends with DanS.

wxsBSD commented 7 years ago

Also, starting at 0x80 works because nobody ever changes the size of the DOS stub. The right thing to do is calculate the starting offset and ensure it is Rich.
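One way to sketch the bounds discussed in the two comments above, using only the standard library. It assumes the usual Rich header layout (a "DanS" dword XOR-encoded with a 4-byte key at the start, and a plaintext "Rich" marker followed by that key at the end), reads the NT header offset from the uint32 at 0x3c, and calculates the start rather than assuming 0x80. The function name is hypothetical and this is not pefile's or yara's code.

```python
import struct
from typing import Optional


def extract_rich_header(data: bytes) -> Optional[bytes]:
    """Return the XOR-decoded Rich header bytes, or None if not found."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return None

    # e_lfanew: little-endian uint32 at 0x3c, the offset of the NT headers.
    (nt_offset,) = struct.unpack_from("<I", data, 0x3C)
    nt_offset = min(nt_offset, len(data))

    # The closing "Rich" marker and its key must lie before the NT headers.
    rich_pos = data.find(b"Rich", 0x40, nt_offset)
    if rich_pos == -1 or rich_pos + 8 > nt_offset:
        return None
    key = data[rich_pos + 4:rich_pos + 8]

    # Walk backwards one dword at a time until the decoded "DanS" appears.
    for off in range(rich_pos - 4, 0x40 - 1, -4):
        dword = bytes(b ^ k for b, k in zip(data[off:off + 4], key))
        if dword == b"DanS":
            encoded = data[off:rich_pos]
            return bytes(b ^ key[i % 4] for i, b in enumerate(encoded))
    return None
```

Searching backwards from the marker means a non-default DOS stub size does not break the lookup, and a file with no decodable "DanS" yields `None` rather than a partial result.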

wxsBSD commented 7 years ago

I haven't looked at Frank's code, but does https://github.com/erocarrera/pefile/commit/a3e5d096b6658aaa0267cd9b030c85aa4aa263d6 not look like it makes rich header parsing more robust, in the manner I'm describing? It seems to search for the ending up to the NT header.

knowmalware commented 7 years ago

I half agree with Wes. I'll change my PR to search from 0x80 to pe.NT_HEADERS.get_file_offset(), which is what pefile should be doing instead of searching through to the OPTIONAL_HEADER.

If there are non-null bytes between 0x80 and the start of the PE header, I'd still be interested in that result from a malware analysis perspective, but perhaps it should be called something other than the rich sig in that case. I'll update this PR accordingly.

knowmalware commented 7 years ago

Latest release of pefile makes this much easier. Updated the code to use the clear data exposed by pefile.
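For illustration, hashing the decoded Rich header bytes might look like the sketch below. The helper name and the choice of SHA-256 are assumptions; the digest meta_pe actually uses is not stated in this thread. The commented usage assumes that recent pefile releases expose the decoded bytes as `RICH_HEADER.clear_data`, which should be checked against the installed version.

```python
import hashlib


def rich_header_hash(clear_data: bytes) -> str:
    """Hex digest of the decoded Rich header bytes (SHA-256 is illustrative)."""
    return hashlib.sha256(clear_data).hexdigest()


# Hypothetical usage with pefile (attribute name is an assumption):
#   pe = pefile.PE("sample.exe")
#   if getattr(pe, "RICH_HEADER", None) is not None:
#       print(rich_header_hash(pe.RICH_HEADER.clear_data))
```

Hashing only when the header object is present avoids the exception mentioned earlier for PE files built without a Rich header.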

knowmalware commented 7 years ago

should also fix #55