mandiant / flare-floss

FLARE Obfuscated String Solver - Automatically extract obfuscated strings from malware.
Apache License 2.0

Unable to find stack strings in large functions alloca_probe / XMM registers #942

Open llebout opened 5 months ago

llebout commented 5 months ago

Hello!

I have this sample referenced in this other issue: https://github.com/mandiant/flare-ida/issues/127

I also ran it through your tool and very few strings were found, while ironstrings found many more. I don't know exactly why, but this program has lots of stack strings constructed with XMM registers. That may be the reason, although FLOSS did find some strings that were built with XMM registers.

By default, flare-floss is very slow on this sample. To speed it up, we can blacklist some functions from analysis, but those functions may also contain useful stack strings, so it would be nice if flare-floss were faster on them. I already ran flare-floss fully without blacklisting any functions; it takes about 3 hours, but it doesn't find any additional stack strings.
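To decide which functions are worth blacklisting, one option is to time each function's analysis separately and look at the outliers. A minimal sketch, assuming a hypothetical `analyze_function(fva)` callable that runs whatever per-function work is being measured (FLOSS does not expose a hook under that name); `fake_analyze` only simulates one slow function:

```python
import time


def slowest_functions(fvas, analyze_function, top=3):
    """Time analyze_function(fva) for each address and return the `top`
    slowest as (fva, seconds) pairs, slowest first."""
    timings = []
    for fva in fvas:
        start = time.perf_counter()
        analyze_function(fva)
        timings.append((fva, time.perf_counter() - start))
    timings.sort(key=lambda item: item[1], reverse=True)
    return timings[:top]


def fake_analyze(fva):
    """Hypothetical stand-in: pretend one function is much slower than the rest."""
    time.sleep(0.05 if fva == 0x1802adde0 else 0.0)


for fva, seconds in slowest_functions([0x401000, 0x1802adde0], fake_analyze, top=1):
    print(f"{fva:#x} took {seconds:.2f}s")
```

The addresses reported this way are candidates for a skip list, at the cost of never searching them for stack strings.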

Adding the following check after https://github.com/mandiant/flare-floss/blob/b2ca8adfc5edf278861dd6bff67d73da39683b46/floss/string_decoder.py#L151 speeds up execution considerably:

if fva in (0x1802adde0, 0x1802a74a0, 0x18029ea70):
    # skip these three very slow functions
    continue
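Rather than editing the source for each sample, the same filter could take its addresses from a string, for example one supplied on the command line. A minimal sketch with hypothetical helper names (`parse_skip_list`, `functions_to_analyze`); this is not an existing FLOSS option:

```python
def parse_skip_list(text):
    """Parse hex-encoded function addresses (space or comma separated) into a set."""
    # int(..., 16) accepts an optional "0x" prefix, e.g. int("0x1802adde0", 16)
    return {int(token, 16) for token in text.replace(",", " ").split()}


def functions_to_analyze(fvas, skip):
    """Return function addresses in their original order, omitting any in `skip`."""
    return [fva for fva in fvas if fva not in skip]


skip = parse_skip_list("0x1802adde0, 0x1802a74a0 0x18029ea70")
print([hex(fva) for fva in functions_to_analyze([0x1802adde0, 0x1802b4d40], skip)])
```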

If you have any idea why this doesn't work, I can help fix it, but I haven't found a solution so far. For example, the function at 0x1802b4d40 contains many stack strings, but none are found there. ironstrings didn't find any strings in this function either, because it errors out while trying.

Thanks a lot!

mr-tz commented 5 months ago

I suspect XMM / SSE instructions to be the issue and will take a look in the next couple of days. Thanks for reporting this here as well!

llebout commented 5 months ago

On further investigation, I can pinpoint some functions with many stack strings where no strings were found by either flare-floss or ironstrings:

- 0x1802b4d40
- 0x180138aa0 (a very large function, but it appears to contain mainly stack strings of about 10 characters)
- 0x1802ae880

mr-tz commented 5 months ago

The disassembly and emulation tool (vivisect) has multiple issues with this binary, most notably errors during disassembly and unsupported instructions (as suspected above, e.g. unpcklpd @ 0x1802b7de8).
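For reference, `unpcklpd` (SSE2) keeps the destination's low 64-bit element in place and copies the source's low 64-bit element into the destination's high half, discarding both high halves. The sketch below models only the register-to-register form on Python integers to show what an emulator would need to implement; it is not vivisect code, and the byte-joining example is illustrative rather than taken from this sample:

```python
MASK64 = (1 << 64) - 1


def unpcklpd(dest, src):
    """Model `unpcklpd xmm_dest, xmm_src` on 128-bit integers.

    Low 64 bits of the result:  low 64 bits of dest (unchanged)
    High 64 bits of the result: low 64 bits of src
    """
    return ((src & MASK64) << 64) | (dest & MASK64)


# Joining two 8-byte halves (little-endian) into one 16-byte register value.
lo = int.from_bytes(b"abcdefgh", "little")
hi = int.from_bytes(b"ijklmnop", "little")
print(unpcklpd(lo, hi).to_bytes(16, "little"))  # b'abcdefghijklmnop'
```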

So, I'm afraid we have to call this out of reach for now.

FYI, to analyze specific functions only, see the -H help output:

--functions FUNCTIONS [FUNCTIONS ...]
                        only analyze the specified functions, hex-encoded like 0x401000, space-separate multiple
                        functions

So, in this case, for example: floss.exe sample.txt --functions 0x1802b4d40
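When scripting several runs, the argument list can be built programmatically. A minimal sketch with a hypothetical `floss_argv` helper; it relies only on the `--functions` flag shown in the help output above, and the addresses are the ones listed earlier in this thread:

```python
def floss_argv(sample, function_addrs, exe="floss.exe"):
    """Build a FLOSS command line limited to the given function addresses.

    Addresses are hex-encoded and space-separated, per the --functions help text.
    """
    return [exe, sample, "--functions", *(hex(fva) for fva in function_addrs)]


argv = floss_argv("sample.txt", [0x1802b4d40, 0x180138aa0, 0x1802ae880])
print(" ".join(argv))
# floss.exe sample.txt --functions 0x1802b4d40 0x180138aa0 0x1802ae880
```

Passing `argv` to `subprocess.run(argv, check=True)` would then launch the analysis.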

llebout commented 5 months ago

@mr-tz Thanks for looking. Regarding the unsupported instructions, I think those come from the statically linked OpenSSL in this binary, but all the functions with stack strings only use XMM registers with mov instructions.

About the --functions flag, I was unsure whether it meant treating the function as a decoding function or only processing that function. Here, no strings are actually encoded; they are just laid out in memory a bit differently at first and then assembled in order with XMM movs.
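The pattern described above (string data stored out of order, then copied onto the stack in the right order with 16-byte XMM loads and stores) can be modeled roughly as follows. This is a toy sketch, not FLOSS or vivisect code: `assemble` is a hypothetical helper, the 16-byte chunk size assumes `movups`/`movdqu`-style moves, and the sample string is made up:

```python
CHUNK = 16  # bytes moved per XMM load/store pair


def assemble(pool, moves, frame_size):
    """Copy 16-byte chunks from a constant pool into a zeroed stack frame.

    pool:  bytes as laid out in the data section (chunks in any order)
    moves: (src_offset, dst_offset) pairs, one per XMM load/store pair
    """
    frame = bytearray(frame_size)
    for src, dst in moves:
        chunk = pool[src:src + CHUNK]  # e.g. movups xmm0, [pool+src]
        if len(chunk) != CHUNK or dst < 0 or dst + CHUNK > frame_size:
            raise ValueError("move out of bounds")
        frame[dst:dst + CHUNK] = chunk  # e.g. movups [rsp+dst], xmm0
    return bytes(frame)


# The two halves are stored swapped in the pool and reordered on the stack.
pool = b"ad" + b"\x00" * 14 + b"CreateRemoteThre"
frame = assemble(pool, [(16, 0), (0, 16)], 32)
print(frame.split(b"\x00")[0])  # b'CreateRemoteThread'
```

Nothing here is decoded; recovering the string only requires tracking the 16-byte stores, which is why these strings would normally count as stack strings rather than decoded strings.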

The really slow part is the program analysis phase, which seems to scan with FLIRT signatures. Is there any way to save the scan results and reuse them for faster debugging of this issue?
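Independent of FLOSS, a generic way to avoid repeating a slow analysis step while debugging is to cache its result on disk, keyed by the sample's SHA-256. A minimal sketch, assuming a hypothetical `analyze(path)` callable whose result can be pickled; whether FLOSS's own analysis state can be persisted this way is not established here, and pickle files should only be loaded from a cache directory you trust:

```python
import hashlib
import pickle
from pathlib import Path


def sha256_of(path):
    """Hash the file in 1 MiB blocks so large samples are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            digest.update(block)
    return digest.hexdigest()


def cached_analysis(sample_path, analyze, cache_dir=".analysis-cache"):
    """Run analyze(sample_path) once per sample and reuse the pickled result."""
    cache = Path(cache_dir)
    cache.mkdir(parents=True, exist_ok=True)
    entry = cache / (sha256_of(sample_path) + ".pickle")
    if entry.exists():
        with open(entry, "rb") as f:
            return pickle.load(f)
    result = analyze(sample_path)
    with open(entry, "wb") as f:
        pickle.dump(result, f)
    return result
```

Keying on the file hash rather than the file name means a renamed copy of the same sample reuses the cache, while any modified binary gets a fresh analysis.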