mandiant / flare-floss

FLARE Obfuscated String Solver - Automatically extract obfuscated strings from malware.
Apache License 2.0

extract strings from binaries compiled from Rust #791

Closed ghost closed 1 year ago

ghost commented 1 year ago

background research:

(please update this comment directly with more references, if you'd like)

Arker123 commented 1 year ago

Initially, Rust strings appear as follows:

[Screenshot (39)]

Here are some key points to note:

mr-tz commented 1 year ago

Please check this sample and others again, and refer to the linked blog post.

Many strings are stored contiguously (similar to Go), and those are the strings we want to focus on, because strings.exe fails on them.

Arker123 commented 1 year ago

During my examination of the binary, I came across 3 interesting types of strings:

1.

00000001400BD6E0  61 74 74 65 6D 70 74 20  74 6F 20 63 61 6C 63 75  attempt to calcu
00000001400BD6F0  6C 61 74 65 20 74 68 65  20 72 65 6D 61 69 6E 64  late the remaind
00000001400BD700  65 72 20 77 69 74 68 20  61 20 64 69 76 69 73 6F  er with a diviso
00000001400BD710  72 20 6F 66 20 7A 65 72  6F 2F 72 75 73 74 63 2F  r of zero/rustc/
00000001400BD720  38 34 63 38 39 38 64 36  35 61 64 66 32 66 33 39  84c898d65adf2f39
00000001400BD730  61 35 61 39 38 35 30 37  66 31 66 65 30 63 65 31  a5a98507f1fe0ce1
00000001400BD740  30 61 32 62 38 64 62 63  5C 6C 69 62 72 61 72 79  0a2b8dbc\library
00000001400BD750  5C 63 6F 72 65 5C 73 72  63 5C 73 74 72 5C 70 61  \core\src\str\pa
00000001400BD760  74 74 65 72 6E 2E 72 73  19 D7 0B 40 01 00 00 00  ttern.rs...@....
00000001400BD770  4F 00 00 00 00 00 00 00  D9 06 00 00 65 00 00 00  O...........e...
00000001400BD780  19 D7 0B 40 01 00 00 00  4F 00 00 00 00 00 00 00  ...@....O.......

In this type, the blob contains one or two strings. The first string is referenced from the .text section. The reference to the second string sits immediately after the string data, followed by the string's length (here the pointer 0x1400BD719 followed by the length 0x4F).
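The bytes that follow `pattern.rs` in the dump can be decoded to check this layout. The sketch below is illustrative only: it copies the dump into a buffer, reads a little-endian 64-bit pointer and a 64-bit length at 0x1400BD768, and slices both strings out of the blob. Reading the next two 32-bit values as a line and a column number is an assumption based on their position and size, not something stated in this thread. `BASE`, `RECORD`, and the variable names are made up for the example.

```python
import struct

BASE = 0x1400BD6E0  # address of the first byte in the dump above
DUMP = bytes.fromhex("""
    61 74 74 65 6D 70 74 20  74 6F 20 63 61 6C 63 75
    6C 61 74 65 20 74 68 65  20 72 65 6D 61 69 6E 64
    65 72 20 77 69 74 68 20  61 20 64 69 76 69 73 6F
    72 20 6F 66 20 7A 65 72  6F 2F 72 75 73 74 63 2F
    38 34 63 38 39 38 64 36  35 61 64 66 32 66 33 39
    61 35 61 39 38 35 30 37  66 31 66 65 30 63 65 31
    30 61 32 62 38 64 62 63  5C 6C 69 62 72 61 72 79
    5C 63 6F 72 65 5C 73 72  63 5C 73 74 72 5C 70 61
    74 74 65 72 6E 2E 72 73  19 D7 0B 40 01 00 00 00
    4F 00 00 00 00 00 00 00  D9 06 00 00 65 00 00 00
    19 D7 0B 40 01 00 00 00  4F 00 00 00 00 00 00 00
""")
RECORD = 0x1400BD768  # first byte after "pattern.rs"

# Little-endian 64-bit pointer followed by a 64-bit length.
ptr, length = struct.unpack_from("<QQ", DUMP, RECORD - BASE)
second = DUMP[ptr - BASE : ptr - BASE + length].decode("utf-8")

# Everything before the pointer target belongs to the first string.
first = DUMP[: ptr - BASE].decode("utf-8")

# Assumption: the two following 32-bit values are a line and a column.
line, column = struct.unpack_from("<II", DUMP, RECORD - BASE + 16)

print(hex(ptr), length)
print(first)
print(second)
print(line, column)
```

The pointer lands exactly where the `/rustc/...` path begins, which is what allows the blob to be split into two strings without any terminator byte.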

2.

00000001400C1950  55 6E 63 61 74 65 67 6F  72 69 7A 65 64 4F 74 68  UncategorizedOth
00000001400C1960  65 72 4F 75 74 4F 66 4D  65 6D 6F 72 79 55 6E 65  erOutOfMemoryUne
00000001400C1970  78 70 65 63 74 65 64 45  6F 66 49 6E 74 65 72 72  xpectedEofInterr
00000001400C1980  75 70 74 65 64 41 72 67  75 6D 65 6E 74 4C 69 73  uptedArgumentLis
00000001400C1990  74 54 6F 6F 4C 6F 6E 67  49 6E 76 61 6C 69 64 46  tTooLongInvalidF
00000001400C19A0  69 6C 65 6E 61 6D 65 54  6F 6F 4D 61 6E 79 4C 69  ilenameTooManyLi
00000001400C19B0  6E 6B 73 43 72 6F 73 73  65 73 44 65 76 69 63 65  nksCrossesDevice
00000001400C19C0  73 44 65 61 64 6C 6F 63  6B 45 78 65 63 75 74 61  sDeadlockExecuta
00000001400C19D0  62 6C 65 46 69 6C 65 42  75 73 79 52 65 73 6F 75  bleFileBusyResou
00000001400C19E0  72 63 65 42 75 73 79 46  69 6C 65 54 6F 6F 4C 61  rceBusyFileTooLa
00000001400C19F0  72 67 65 46 69 6C 65 73  79 73 74 65 6D 51 75 6F  rgeFilesystemQuo
00000001400C1A00  74 61 45 78 63 65 65 64  65 64 4E 6F 74 53 65 65  taExceededNotSee
00000001400C1A10  6B 61 62 6C 65 53 74 6F  72 61 67 65 46 75 6C 6C  kableStorageFull
00000001400C1A20  57 72 69 74 65 5A 65 72  6F 54 69 6D 65 64 4F 75  WriteZeroTimedOu
00000001400C1A30  74 49 6E 76 61 6C 69 64  44 61 74 61 49 6E 76 61  tInvalidDataInva
00000001400C1A40  6C 69 64 49 6E 70 75 74  53 74 61 6C 65 4E 65 74  lidInputStaleNet
00000001400C1A50  77 6F 72 6B 46 69 6C 65  48 61 6E 64 6C 65 46 69  workFileHandleFi

In the second type, there are no references to the string blob.
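With no references and no terminator bytes, a plain printable-run scan has nothing to split on. The sketch below approximates such a scan with a regular expression (an assumption for illustration; it is not how strings.exe is implemented) over the first four rows of the dump. It returns one long run rather than what look like separate names such as `Uncategorized`, `Other`, and `OutOfMemory`.

```python
import re

# First 64 bytes of the blob shown above, written as ASCII.
BLOB = (
    b"UncategorizedOth"
    b"erOutOfMemoryUne"
    b"xpectedEofInterr"
    b"uptedArgumentLis"
)

# Runs of at least four printable ASCII bytes, roughly what a
# strings-style tool reports (an approximation, not strings.exe itself).
runs = re.findall(rb"[\x20-\x7e]{4,}", BLOB)

print(len(runs))
print(runs[0])
```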

3.


.rdata:00000001400C1088 74 68 72 65 61 64 20 27 27 20+aThreadPanicked db 'thread ',27h,27h,' panicked at ',27h,27h,', ',0
.rdata:00000001400C1088 70 61 6E 69 63 6B 65 64 20 61+                                        ; DATA XREF: .rdata:off_1400C10A8↓o
.rdata:00000001400C1088 74 20 27 27 2C 20 00                                                  ; .rdata:00000001400C10B8↓o
.rdata:00000001400C10A3 00 00 00 00 00                                align 8
.rdata:00000001400C10A8 88 10 0C 40 01 00 00 00       off_1400C10A8   dq offset aThreadPanicked
.rdata:00000001400C10A8                                                                       ; DATA XREF: std::panicking::default_hook::_$u7b$$u7b$closure$u7d$$u7d$::hd0fb66704d0b9f0d+50↑o
.rdata:00000001400C10A8                                                                       ; "thread '' panicked at '', "
.rdata:00000001400C10B0 08                                            db    8
.rdata:00000001400C10B1 00                                            db    0
.rdata:00000001400C10B2 00                                            db    0
.rdata:00000001400C10B3 00                                            db    0
.rdata:00000001400C10B4 00                                            db    0
.rdata:00000001400C10B5 00                                            db    0
.rdata:00000001400C10B6 00                                            db    0
.rdata:00000001400C10B7 00                                            db    0
.rdata:00000001400C10B8 90 10 0C 40 01 00 00 00                       dq offset aThreadPanicked+8 ; "' panicked at '', "
.rdata:00000001400C10C0 0F                                            db  0Fh
.rdata:00000001400C10C1 00                                            db    0
.rdata:00000001400C10C2 00                                            db    0
.rdata:00000001400C10C3 00                                            db    0
.rdata:00000001400C10C4 00                                            db    0
.rdata:00000001400C10C5 00                                            db    0
.rdata:00000001400C10C6 00                                            db    0
.rdata:00000001400C10C7 00                                            db    0
.rdata:00000001400C10C8 9F 10 0C 40 01 00 00 00                       dq offset aThreadPanicked+17h ; "', "
.rdata:00000001400C10D0 03                                            db    3
.rdata:00000001400C10D1 00                                            db    0
.rdata:00000001400C10D2 00                                            db    0
.rdata:00000001400C10D3 00                                            db    0
.rdata:00000001400C10D4 00                                            db    0
.rdata:00000001400C10D5 00                                            db    0
.rdata:00000001400C10D6 00                                            db    0
.rdata:00000001400C10D7 00                                            db    0

In the third type, several references to the string, each followed by a length, sit just below it.
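The pointer and length pairs in the listing above can be walked to recover the individual pieces. This is a minimal sketch, assuming each entry is a little-endian 64-bit pointer followed by a 64-bit length, as the listing suggests. The bytes are copied from the listing; `BASE`, `TABLE`, and `pieces` are names invented for the example.

```python
import struct

BASE = 0x1400C1088   # address of aThreadPanicked
TABLE = 0x1400C10A8  # address of the first pointer/length pair
BLOB = bytes.fromhex("""
    74 68 72 65 61 64 20 27  27 20 70 61 6E 69 63 6B
    65 64 20 61 74 20 27 27  2C 20 00 00 00 00 00 00
    88 10 0C 40 01 00 00 00  08 00 00 00 00 00 00 00
    90 10 0C 40 01 00 00 00  0F 00 00 00 00 00 00 00
    9F 10 0C 40 01 00 00 00  03 00 00 00 00 00 00 00
""")

pieces = []
for i in range(3):
    # Each entry is 16 bytes: pointer, then length.
    ptr, length = struct.unpack_from("<QQ", BLOB, TABLE - BASE + 16 * i)
    pieces.append(BLOB[ptr - BASE : ptr - BASE + length].decode("utf-8"))

print(pieces)
```

The lengths, not the trailing zero byte, decide where each piece ends, so the disassembler's C-string view of one long `"thread '' panicked at '', "` hides three separate slices.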

Arker123 commented 1 year ago

Proposed algorithm:

  1. Create a UTF-8 string extractor to pull all the UTF-8 strings out of the binary, along with their start and end offsets. You can refer to this example: https://github.com/glmcdona/strings2/blob/master/strings/binary2strings.cpp
  2. Extract all the references from the entire binary that point to the .rdata section.
  3. If a reference points inside one of the UTF-8 strings extracted in step 1, split the string at that reference.
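The splitting step (step 3) can be sketched as follows, assuming steps 1 and 2 have already produced a list of `(start, end)` address ranges and a list of reference addresses. The function name `split_at_references` and the exclusive `end` convention are choices made for this example, not part of FLOSS.

```python
from bisect import bisect_left, bisect_right


def split_at_references(strings, references):
    """Split (start, end) address ranges at every reference inside them.

    `end` is exclusive. Returns the resulting (start, end) pieces in order.
    """
    refs = sorted(set(references))
    pieces = []
    for start, end in strings:
        # References strictly between start and end become cut points.
        inner = refs[bisect_right(refs, start):bisect_left(refs, end)]
        cuts = [start, *inner, end]
        pieces.extend(zip(cuts, cuts[1:]))
    return pieces


# Range and references taken from the first hex dump in this thread.
example = split_at_references(
    strings=[(0x1400BD6E0, 0x1400BD768)],
    references=[0x1400BD6E0, 0x1400BD719],
)
print([(hex(a), hex(b)) for a, b in example])
```

Sorting the references once and using binary search keeps the per-string cost low, which matters when step 1 yields many candidate strings.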

williballenthin commented 1 year ago

"Create a UTF-8 string extractor algorithm to extract all the UTF-8 strings from the binaries"

this still strikes me as challenging to do correctly and quickly (especially over megabytes worth of data).

as an alternative, what if we reordered the steps to:

  1. Extract all the references from the entire binary that point to the .rdata section.
  2. check if each reference points to valid UTF-8 data
  3. if so, extract from the reference until the next reference, or the end of the valid UTF-8 data.

this way, we only have to check the data that is referenced, which seems more tractable. are there any cases that this algorithm handles less well?
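A minimal sketch of this reordered approach is below, assuming the references have already been collected and the section data is available as a byte buffer. Treating the first non-printable character as the end of the string is an extra assumption made here so that trailing zero bytes are not included; it also cuts strings at newlines. The names `printable_utf8_prefix` and `extract_referenced_strings` are hypothetical.

```python
def printable_utf8_prefix(chunk):
    """Return the longest prefix of `chunk` that is valid, printable UTF-8."""
    try:
        text = chunk.decode("utf-8")
    except UnicodeDecodeError as err:
        # Everything before err.start decoded cleanly.
        text = chunk[: err.start].decode("utf-8")
    for i, ch in enumerate(text):
        if not ch.isprintable():
            return text[:i]
    return text


def extract_referenced_strings(data, base, references):
    """Return (address, text) for each reference into `data` that starts a string."""
    refs = sorted({r for r in references if base <= r < base + len(data)})
    found = []
    for i, ref in enumerate(refs):
        # Stop at the next reference, or at the end of the section data.
        stop = refs[i + 1] if i + 1 < len(refs) else base + len(data)
        text = printable_utf8_prefix(data[ref - base : stop - base])
        if text:
            found.append((ref, text))
    return found


# String bytes and reference addresses from the third example in this thread.
RDATA = b"thread '' panicked at '', \x00"
BASE = 0x1400C1088
REFS = [0x1400C1088, 0x1400C1090, 0x1400C109F]
result = extract_referenced_strings(RDATA, BASE, REFS)
print(result)
```

Because this version only visits referenced addresses, a blob with no references, such as the second type described earlier, would not be reported by it.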

williballenthin commented 1 year ago

@Arker123 did you look at the IDA Pro plugin released by Hex-Rays to see how they implemented Rust string detection?

Arker123 commented 1 year ago

Sure, I took a look at the IDA Pro plugin released by Hex-Rays. I used this link: https://hex-rays.com/blog/rust-analysis-plugin-tech-preview/ From examining the source code, they appear to use pointers and string lengths found in the .text segment to split the strings.

While this approach seems effective, associating each pointer with its corresponding length might pose challenges. I should mention that I don't have access to the IDA Pro version, so I'm unable to thoroughly test the robustness of their algorithm.

Arker123 commented 1 year ago

Adding some notes for completeness:

There are two types of strings in Rust: String and &str.

A String is stored as a vector of bytes (Vec<u8>), but is guaranteed to always be a valid UTF-8 sequence. A String is heap-allocated, growable, and not null-terminated.

&str is a slice (&[u8]) that always points to a valid UTF-8 sequence, and can be used to view into a String, just like &[T] is a view into Vec<T>.

mr-tz commented 1 year ago

Handled via #836

williballenthin commented 11 months ago

another plugin for rust strings by @cxiao

https://infosec.exchange/@cxiao/111215357786731310

cxiao commented 11 months ago

Thanks for the tag @williballenthin! I didn't know that there was work being done in Floss on Golang and Rust binary string extraction, that's awesome. Feel free to look at / use any of the code from my plugin repo, although it is pretty Binary Ninja specific: https://github.com/cxiao/rust_string_slicer

It looks like @Arker123 already covered this, but I have a short writeup here on how the IDA Pro Rust Analysis plugin works, in case that's still useful: https://infosec.exchange/@cxiao/110637216992474832