Initially, Rust strings appear as follows:
Here are some key points to note:
00 in order to fit within an 8-byte block (for amd64 architecture)
Please check this sample and others again, and refer to the linked blog post.
Many strings are stored contiguously, with no null terminator between them (similar to Go). Those are the strings we want to focus on, because strings.exe fails to split them.
During my examination of the binary, I came across 3 interesting types of strings:
1.
00000001400BD6E0 61 74 74 65 6D 70 74 20 74 6F 20 63 61 6C 63 75 attempt to calcu
00000001400BD6F0 6C 61 74 65 20 74 68 65 20 72 65 6D 61 69 6E 64 late the remaind
00000001400BD700 65 72 20 77 69 74 68 20 61 20 64 69 76 69 73 6F er with a diviso
00000001400BD710 72 20 6F 66 20 7A 65 72 6F 2F 72 75 73 74 63 2F r of zero/rustc/
00000001400BD720 38 34 63 38 39 38 64 36 35 61 64 66 32 66 33 39 84c898d65adf2f39
00000001400BD730 61 35 61 39 38 35 30 37 66 31 66 65 30 63 65 31 a5a98507f1fe0ce1
00000001400BD740 30 61 32 62 38 64 62 63 5C 6C 69 62 72 61 72 79 0a2b8dbc\library
00000001400BD750 5C 63 6F 72 65 5C 73 72 63 5C 73 74 72 5C 70 61 \core\src\str\pa
00000001400BD760 74 74 65 72 6E 2E 72 73 19 D7 0B 40 01 00 00 00 ttern.rs...@....
00000001400BD770 4F 00 00 00 00 00 00 00 D9 06 00 00 65 00 00 00 O...........e...
00000001400BD780 19 D7 0B 40 01 00 00 00 4F 00 00 00 00 00 00 00 ...@....O.......
In this type, there are two strings (or sometimes one) stored back to back. The first string is referenced from the .text section, while the pointer to the second string sits immediately below the string data, followed by that string's length (here 0x1400BD719 and 0x4F).
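To make the layout of this first type concrete, here is a minimal Python sketch that unpacks the 24 bytes following the strings in the dump (starting at 0x1400BD768) and uses the pointer and length to cut the second string out of the blob. Reading the last two 32-bit values as a line and column number is my assumption (the layout resembles a panic location record); it is not stated in this thread, and Rust does not guarantee this layout. The names `BASE`, `BLOB`, `RECORD`, and `parse_record` are made up for illustration.

```python
import struct

# The two back-to-back strings from the dump, which starts at 0x1400BD6E0.
BASE = 0x1400BD6E0
BLOB = (
    b"attempt to calculate the remainder with a divisor of zero"
    b"/rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc"
    b"\\library\\core\\src\\str\\pattern.rs"
)

# The 24 bytes that follow the strings, copied from 0x1400BD768 onward.
RECORD = bytes.fromhex(
    "19 D7 0B 40 01 00 00 00 "  # pointer into the blob
    "4F 00 00 00 00 00 00 00 "  # length of the string it points at
    "D9 06 00 00 "              # assumed: line number
    "65 00 00 00"               # assumed: column number
)


def parse_record(record: bytes):
    """Unpack pointer, length, and two 32-bit values (all little-endian)."""
    return struct.unpack_from("<QQII", record)


ptr, length, line, column = parse_record(RECORD)
# Everything before the pointer target is the first string.
first = BLOB[: ptr - BASE].decode("utf-8")
# The pointer and length delimit the second string.
second = BLOB[ptr - BASE : ptr - BASE + length].decode("utf-8")

print(hex(ptr), length, line, column)
print(first)
print(second)
```

Note that the first string's boundary is only recovered here because the second string's pointer marks where it ends; its own length still has to come from the reference in .text.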
2.
00000001400C1950 55 6E 63 61 74 65 67 6F 72 69 7A 65 64 4F 74 68 UncategorizedOth
00000001400C1960 65 72 4F 75 74 4F 66 4D 65 6D 6F 72 79 55 6E 65 erOutOfMemoryUne
00000001400C1970 78 70 65 63 74 65 64 45 6F 66 49 6E 74 65 72 72 xpectedEofInterr
00000001400C1980 75 70 74 65 64 41 72 67 75 6D 65 6E 74 4C 69 73 uptedArgumentLis
00000001400C1990 74 54 6F 6F 4C 6F 6E 67 49 6E 76 61 6C 69 64 46 tTooLongInvalidF
00000001400C19A0 69 6C 65 6E 61 6D 65 54 6F 6F 4D 61 6E 79 4C 69 ilenameTooManyLi
00000001400C19B0 6E 6B 73 43 72 6F 73 73 65 73 44 65 76 69 63 65 nksCrossesDevice
00000001400C19C0 73 44 65 61 64 6C 6F 63 6B 45 78 65 63 75 74 61 sDeadlockExecuta
00000001400C19D0 62 6C 65 46 69 6C 65 42 75 73 79 52 65 73 6F 75 bleFileBusyResou
00000001400C19E0 72 63 65 42 75 73 79 46 69 6C 65 54 6F 6F 4C 61 rceBusyFileTooLa
00000001400C19F0 72 67 65 46 69 6C 65 73 79 73 74 65 6D 51 75 6F rgeFilesystemQuo
00000001400C1A00 74 61 45 78 63 65 65 64 65 64 4E 6F 74 53 65 65 taExceededNotSee
00000001400C1A10 6B 61 62 6C 65 53 74 6F 72 61 67 65 46 75 6C 6C kableStorageFull
00000001400C1A20 57 72 69 74 65 5A 65 72 6F 54 69 6D 65 64 4F 75 WriteZeroTimedOu
00000001400C1A30 74 49 6E 76 61 6C 69 64 44 61 74 61 49 6E 76 61 tInvalidDataInva
00000001400C1A40 6C 69 64 49 6E 70 75 74 53 74 61 6C 65 4E 65 74 lidInputStaleNet
00000001400C1A50 77 6F 72 6B 46 69 6C 65 48 61 6E 64 6C 65 46 69 workFileHandleFi
In the second type, the string blob has no references at all.
3.
.rdata:00000001400C1088 74 68 72 65 61 64 20 27 27 20+aThreadPanicked db 'thread ',27h,27h,' panicked at ',27h,27h,', ',0
.rdata:00000001400C1088 70 61 6E 69 63 6B 65 64 20 61+ ; DATA XREF: .rdata:off_1400C10A8↓o
.rdata:00000001400C1088 74 20 27 27 2C 20 00 ; .rdata:00000001400C10B8↓o
.rdata:00000001400C10A3 00 00 00 00 00 align 8
.rdata:00000001400C10A8 88 10 0C 40 01 00 00 00 off_1400C10A8 dq offset aThreadPanicked
.rdata:00000001400C10A8 ; DATA XREF: std::panicking::default_hook::_$u7b$$u7b$closure$u7d$$u7d$::hd0fb66704d0b9f0d+50↑o
.rdata:00000001400C10A8 ; "thread '' panicked at '', "
.rdata:00000001400C10B0 08 db 8
.rdata:00000001400C10B1 00 db 0
.rdata:00000001400C10B2 00 db 0
.rdata:00000001400C10B3 00 db 0
.rdata:00000001400C10B4 00 db 0
.rdata:00000001400C10B5 00 db 0
.rdata:00000001400C10B6 00 db 0
.rdata:00000001400C10B7 00 db 0
.rdata:00000001400C10B8 90 10 0C 40 01 00 00 00 dq offset aThreadPanicked+8 ; "' panicked at '', "
.rdata:00000001400C10C0 0F db 0Fh
.rdata:00000001400C10C1 00 db 0
.rdata:00000001400C10C2 00 db 0
.rdata:00000001400C10C3 00 db 0
.rdata:00000001400C10C4 00 db 0
.rdata:00000001400C10C5 00 db 0
.rdata:00000001400C10C6 00 db 0
.rdata:00000001400C10C7 00 db 0
.rdata:00000001400C10C8 9F 10 0C 40 01 00 00 00 dq offset aThreadPanicked+17h ; "', "
.rdata:00000001400C10D0 03 db 3
.rdata:00000001400C10D1 00 db 0
.rdata:00000001400C10D2 00 db 0
.rdata:00000001400C10D3 00 db 0
.rdata:00000001400C10D4 00 db 0
.rdata:00000001400C10D5 00 db 0
.rdata:00000001400C10D6 00 db 0
.rdata:00000001400C10D7 00 db 0
In the third type of strings, there are several references just below the string. Each reference is a pointer into the string followed by a length (here 8, 0xF, and 3).
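The third type can be sketched in a few lines of Python: given the blob and the (pointer, length) pairs read off the listing, each pair selects one substring. This is only an illustration using the values from the listing; `BASE`, `BLOB`, `PAIRS`, and `slice_strings` are hypothetical names, not part of any tool mentioned in this thread.

```python
# Address and bytes of aThreadPanicked, as shown in the listing.
BASE = 0x1400C1088
BLOB = b"thread '' panicked at '', \x00"

# (pointer, length) pairs found at 0x1400C10A8, 0x1400C10B8, and 0x1400C10C8.
PAIRS = [
    (0x1400C1088, 0x08),
    (0x1400C1090, 0x0F),
    (0x1400C109F, 0x03),
]


def slice_strings(blob: bytes, base: int, pairs):
    """Cut one substring out of the blob for each (pointer, length) pair."""
    out = []
    for ptr, length in pairs:
        start = ptr - base
        # Skip pairs that do not fall entirely inside the blob.
        if start < 0 or start + length > len(blob):
            continue
        out.append(blob[start : start + length].decode("utf-8"))
    return out


for s in slice_strings(BLOB, BASE, PAIRS):
    print(repr(s))
```

The three slices are the literal pieces of the panic message around its two format arguments, which is why a null-terminated view of the blob shows them glued together.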
Proposed algorithm:
Create a UTF-8 string extractor that pulls all the UTF-8 strings out of the binary.
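As a rough idea of what such an extractor could look like, here is a small pure-Python sketch that walks a buffer and collects runs of valid, printable UTF-8 characters. It is not the implementation used by any tool in this thread; the function names and the `min_len` default are assumptions chosen for illustration, and no attempt is made at speed.

```python
def _decode_one(data: bytes, i: int):
    """Return (char, size) for the UTF-8 sequence at data[i], or (None, 1)."""
    lead = data[i]
    if lead < 0x80:
        size = 1
    elif 0xC2 <= lead <= 0xDF:
        size = 2
    elif 0xE0 <= lead <= 0xEF:
        size = 3
    elif 0xF0 <= lead <= 0xF4:
        size = 4
    else:
        return None, 1  # continuation byte or invalid lead byte
    try:
        # Strict decoding rejects truncated, overlong, and surrogate sequences.
        return data[i : i + size].decode("utf-8"), size
    except UnicodeDecodeError:
        return None, 1


def extract_utf8_strings(data: bytes, min_len: int = 4):
    """Return (offset, text) for each run of at least min_len printable chars."""
    results = []
    chars = []
    start = 0
    i = 0
    while i < len(data):
        ch, size = _decode_one(data, i)
        if ch is not None and ch.isprintable():
            if not chars:
                start = i
            chars.append(ch)
        else:
            # Run ended: keep it only if it is long enough.
            if len(chars) >= min_len:
                results.append((start, "".join(chars)))
            chars = []
        i += size
    if len(chars) >= min_len:
        results.append((start, "".join(chars)))
    return results


sample = b"\x00\x01hello\xff\xc3\xa9t\xc3\xa9 ok\x00ab\x00"
print(extract_utf8_strings(sample))
```

An extractor like this returns a contiguous blob such as `UncategorizedOtherOutOfMemory...` as one long string, so it still needs pointer and length information to split the result into individual strings.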
this still strikes me as challenging to do correctly and quickly (especially over megabytes worth of data).
as an alternative, what if we reordered the steps to:
this way, we only have to check the data that is referenced, which seems more tractable. are there any cases that this algorithm handles less well?
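The reordered steps themselves are not preserved in this thread, so the following Python sketch is only one possible reading of "only check the data that is referenced": scan a data section for 8-byte-aligned (pointer, length) pairs whose target lies inside the section, and validate just those targets as UTF-8. The function name, the `max_len` cap, and the test buffer (rebuilt from the third listing above) are assumptions for illustration.

```python
import struct


def find_referenced_strings(section: bytes, base: int, max_len: int = 4096):
    """Return (pointer, text) for each aligned pointer/length pair that
    targets valid UTF-8 inside the section."""
    found = []
    end = base + len(section)
    for off in range(0, len(section) - 15, 8):
        ptr, length = struct.unpack_from("<QQ", section, off)
        if not (base <= ptr < end):
            continue  # pointer does not land in this section
        if not (0 < length <= max_len) or ptr + length > end:
            continue  # implausible or out-of-bounds length
        raw = section[ptr - base : ptr - base + length]
        try:
            found.append((ptr, raw.decode("utf-8")))
        except UnicodeDecodeError:
            continue  # referenced bytes are not valid UTF-8
    return found


# Bytes 0x1400C1088..0x1400C10D7 rebuilt from the third listing.
BASE = 0x1400C1088
SECTION = (
    b"thread '' panicked at '', \x00" + b"\x00" * 5
    + struct.pack("<QQ", 0x1400C1088, 0x08)
    + struct.pack("<QQ", 0x1400C1090, 0x0F)
    + struct.pack("<QQ", 0x1400C109F, 0x03)
)

for ptr, text in find_referenced_strings(SECTION, BASE):
    print(hex(ptr), repr(text))
```

A scan like this covers the third type directly, but strings whose pointer or length live only in instructions in .text, and blobs with no references at all (the second type), would need additional handling.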
@Arker123 did you look at the IDA Pro plugin released by Hex-Rays to see how they implemented Rust string detection?
Sure, I took a look at the IDA Pro plugin released by Hex-Rays (https://hex-rays.com/blog/rust-analysis-plugin-tech-preview/). From examining the source code, it seems that they use pointers and string lengths from the .text segment to split the strings.
While this approach seems effective, associating each pointer with its corresponding length might pose challenges. I should mention that I don't have access to the IDA Pro version, so I'm unable to thoroughly test the robustness of their algorithm.
Adding some notes for completeness:
There are two types of strings in Rust: String and &str.
A String is stored as a vector of bytes (Vec<u8>), but guaranteed to always be a valid UTF-8 sequence. String is heap allocated, growable and not null terminated.
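&str is a slice (&[u8]) that always points to a valid UTF-8 sequence, and can be used to view into a String, just like &[T] is a view into Vec<T>. In a compiled binary, a &str is a pointer plus a length, which is why string references appear as pointer/length pairs.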
Handled via #836
another plugin for rust strings by @cxiao
Thanks for the tag @williballenthin! I didn't know that there was work being done in Floss on Golang and Rust binary string extraction, that's awesome. Feel free to look at / use any of the code from my plugin repo, although it is pretty Binary Ninja specific: https://github.com/cxiao/rust_string_slicer
It looks like @Arker123 already covered this, but I have a short writeup here on how the IDA Pro Rust Analysis plugin works, in case that's still useful: https://infosec.exchange/@cxiao/110637216992474832
background research:
(please update this comment directly with more references, if you'd like)