mandiant / flare-ida

IDA Pro utilities from FLARE team
Apache License 2.0

0 functions applied in IDA from .sig file #107

Open KulaGGin opened 3 years ago

KulaGGin commented 3 years ago

Pretty sure it's not me doing something wrong (creating all those issues, I mean). It works on a simple VC++ Hello World project as expected and as explained in the articles (One, Two).

On the other hand, in a big UE4 project, 0 functions get applied in IDA from the .sig file that sigmake generates from the idb2pat output.

I generate the .sig file using the command sigmake -lrsub_ "S05_TestingGrounds-Win64-Shipping - No Xdigit errors(deleted lines with errors).pat" "S05_TestingGrounds-Win64-Shipping - No Xdigit errors(deleted lines with errors).sig". The -lrsub_ parameter is there to exclude functions that have sub_ in their names.

After generating the .sig file and then trying to apply the .sig file, that's what I get: 0 functions applied.

Here's the link with the project, pat and sig file, so you can try to generate and apply this sig file onto executable yourself: https://www.dropbox.com/h?preview=TestingGrounds_DebugSymbols.zip

This is an Unreal Engine 4.26 C++ project created from the FPS template, which I packaged in UE4 with debug symbols.

The executable, .pat and .sig files are in the \WindowsNoEditor\S05_TestingGrounds\Binaries\Win64\ folder. The original .pat file with the xdigit problem is called S05_TestingGrounds-Win64-Shipping - Original.pat. The .pat file from which I deleted the lines that cause the xdigit problem is called S05_TestingGrounds-Win64-Shipping - No Xdigit errors(deleted lines with errors).pat.

The cause of the 0 functions applied is somewhere between lines 30000 and 35000 in the .pat file: if I delete all lines after line 30000, sigmake generates a valid .sig file and IDA applies it to the executable correctly.

If I delete lines 30000 - 35000 in the .pat file and then delete all the lines after 50000, sigmake also generates a valid .sig file and IDA applies it to the executable correctly.

As you can see in the screenshots, IDA produces no meaningful log output after applying a new FLIRT signature. It prints just Plan FLIRT signature: Unnamed sample library both when it succeeds and when it fails to apply any function signatures.
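The manual bisection described above (keeping only the first N lines of the .pat file and re-running sigmake) can be scripted. This is a minimal sketch, assuming the .pat file ends with the usual `---` terminator line; `first_n_patterns` is a hypothetical helper name, not part of idb2pat or FLAIR.

```python
def first_n_patterns(pat_text: str, n: int) -> str:
    """Keep the first n pattern lines of a .pat file and re-add the '---' terminator."""
    # Drop blank lines and the existing terminator so only pattern lines are counted.
    patterns = [
        line for line in pat_text.splitlines()
        if line.strip() and line.strip() != "---"
    ]
    return "\n".join(patterns[:n] + ["---"]) + "\n"
```

Writing the result to a new file, running sigmake on it and applying the .sig in IDA still has to be done by hand for each candidate N.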

KulaGGin commented 3 years ago

After digging into this more, I found out that the first line that causes the 0 functions applied problem is line #31605 (in the S05_TestingGrounds-Win64-Shipping - No Xdigit errors.pat file), this line:

4053578D42FE8BDA994C8BC92BC2D1F88BF80F88B400000048897424308D3445 FF BB85 01C0 :00000000 ??$HeapSortInternal@T?$TSparseArrayElementOrFreeListLink@U?$TAlignedBytes@$0BI@$07@@@@UFIdentityFunctor@@U?$TDereferenceWrapper@T?$TSparseArrayElementOrFreeListLink@U?$TAlignedBytes@$0BI@$07@@@@V?$FElementCompareClass@V?$FElementCompareClass@V?$FValueComparisonClass@U?$TGreater@H@@@?$TSortableMapBase@_KHVFDefaultSetAllocator@@U?$TDefaultMapHashableKeyFuncs@_KH$0A@@@@@@?$TSet@U?$TTuple@_KH@@U?$TDefaultMapHashableKeyFuncs@_KH$0A@@@VFDefaultSetAllocator@@@@@?$TSparseArray@V?$TSetElement@U?$TTuple@_KH@@@@V?$TSparseArrayAllocator@V?$TSizedDefaultAllocator@$0CA@@@VFDefaultBitArrayAllocator@@@@@@@@@AlgoImpl@@YAXPEAT?$TSparseArrayElementOrFreeListLink@U?$TAlignedBytes@$0BI@$07@@@@HUFIdentityFunctor@@U?$TDereferenceWrapper@T?$TSparseArrayElementOrFreeListLink@U?$TAlignedBytes@$0BI@$07@@@@V?$FElementCompareClass@V?$FElementCompareClass@V?$FValueComparisonClass@U?$TGreater@H@@@?$TSortableMapBase@_KHVFDefaultSetAllocator@@U?$TDefaultMapHashableKeyFuncs@_KH$0A@@@@@@?$TSet@U?$TTuple@_KH@@U?$TDefaultMapHashableKeyFuncs@_KH$0A@@@VFDefaultSetAllocator@@@@@?$TSparseArray@V?$TSetElement@U?$TTuple@_KH@@@@V?$TSparseArrayAllocator@V?$TSizedDefaultAllocator@$0CA@@@VFDefaultBitArrayAllocator@@@@@@@@@Z :00000018 loc_1409F6E08 :00000024 loc_1409F6E14 :00000030 loc_1409F6E20 :0000005F loc_1409F6E4F :000000A8 loc_1409F6E98 :000000BB loc_1409F6EAB :000000CC loc_1409F6EBC :000000E4 loc_1409F6ED4 :0000010D loc_1409F6EFD :00000120 loc_1409F6F10 :0000014C loc_1409F6F3C :0000019A loc_1409F6F8A :000001AD loc_1409F6F9D :000001BD loc_1409F6FAD 00438D1C128D43018D4801413BCB7D1D4863CB4C8D04494863C8488D1449438B4CC13841394CD1080F4CC3FFC04863C8488D14494963CA4D8D04D1488D1449418B4CD1084D8D14D1413948087D404D3BD07428410F1000410F101AF2410F105210410F1102F2410F104810F2410F114A10410F1118F2410F115010448BD08D044501000000413BC30F8C73FFFFFF41FFCB4883EF184585DB0F8F27FFFFFF5F5BC3

If I make a sig from the first 31605 lines, it applies 0 functions. If I make a sig from the first 31604 lines, it applies a few tens of thousands of functions. I disabled word wrap and compared lines 31604 and 31605 visually in Notepad++. Line 31604:

488974241855574156488D6C24B94881ECB0000000488B4210488BF233FF440F FF 11D2 0257 :00000000 ??$GetVertexPosition@UFNDITransformHandlerNoop@@@UNiagaraDataInterfaceStaticMesh@@QEAAXAEAUFVectorVMContext@@@Z :00000050 loc_1409F6BE0 :00000062 loc_1409F6BF2 :00000100 loc_1409F6C90 :00000148 loc_1409F6CD8 :0000014C loc_1409F6CDC :0000018C loc_1409F6D1C :00000190 loc_1409F6D20 :000001B0 loc_1409F6D40 :0000022A loc_1409F6DBA :00000243 loc

I noticed that the name on line 31604 ends much sooner than the name on line 31605.

After I shortened line 31605 to this:

4053578D42FE8BDA994C8BC92BC2D1F88BF80F88B400000048897424308D3445 FF BB85 01C0 :00000000 ??$HeapSortInternal@T?$TSparseArrayElementOrFreeListLink@U?$TAlignedBytes@$0BI@$07@@@@UFIdentityFunctor@@@Z :00000018 loc_1409F6E08 :00000024 loc_1409F6E14 :00000030 loc_1409F6E20 :0000005F loc_1409F6E4F :000000A8 loc_1409F6E98 :000000BB loc_1409F6EAB :000000CC loc_1409F6EBC :000000E4 loc_1409F6ED4 :0000010D loc_1409F6EFD :00000120 loc_1409F6F10 :0000014C loc_1409F6F3C :0000019A loc_1409F6F8A :000001AD loc_1409F6F9D :000001BD loc_1409F6FAD 00438D1C128D43018D4801413BCB7D1D4863CB4C8D04494863C8488D1449438B4CC13841394CD1080F4CC3FFC04863C8488D14494963CA4D8D04D1488D1449418B4CD1084D8D14D1413948087D404D3BD07428410F1000410F101AF2410F105210410F1102F2410F104810F2410F114A10410F1118F2410F115010448BD08D044501000000413BC30F8C73FFFFFF41FFCB4883EF184585DB0F8F27FFFFFF5F5BC3


It applies the signatures as expected! Yay!

Now I guess it's either the name length or some combination of characters in the name that causes sigmake to generate an invalid .sig file. Or the long/invalid name causes IDA not to apply this .sig file to the executable for some reason.

For now, I think I can search for overly long 6th fields (the function names) on each line with regular expressions in Notepad++ and shorten the names that are too long.

I guess I could try to find where idb2pat generates these names and just trim them up to like 125 characters or something and hope it's name length and not combination of characters.

I hope this will help you to fix this bug, wherever it is(in idb2pat, sigmake or IDA itself).

williballenthin commented 3 years ago

> I guess I could try to find where idb2pat generates these names and just trim them up to like 125 characters or something and hope it's name length and not combination of characters.
>
> I hope this will help you to fix this bug, wherever it is(in idb2pat, sigmake or IDA itself).

thank you!

yes, in the short term, i'll update idb2pat to ensure the symbol names are not too long. i'll also try to reproduce the sigmake issue and report a bug to hex-rays if appropriate. i'll update here with what i hear.

thanks for taking the time to dig into the bug and suggest a fix, it really helps.

KulaGGin commented 3 years ago

Looks like somewhere there's a cap of 1000 characters on the function name, or close to that. I did a search and replace on the file using this regular expression: \r\n.{64}\x20.{2}\x20.{4}\x20.{4}\x20:.{8}\x20[^ \r\n]{1010,}.*(?=\r\n)

It removes whole lines where the 6th field (the function name) is 1010 characters or longer. It does 41 replacements.

It applies all the functions that are left.

But if I do the replace with \r\n.{64}\x20.{2}\x20.{4}\x20.{4}\x20:.{8}\x20[^ \r\n]{1025,}.*(?=\r\n) (1025 characters instead of 1010), it does 38 replacements. That means there are 3 functions with names between 1010 and 1024 characters long. In this case it applies 0 functions.

It does look like a name length problem. If a particular character sequence were the cause, then out of 170k lines I would expect some shorter names, say 800 characters long, to contain it and trigger the problem too.
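The same filtering can be done outside Notepad++. This is a minimal Python sketch, assuming the fields on each .pat line are separated by spaces and the 6th field is the first function name, as in the lines quoted above. `split_by_name_length` is a hypothetical helper, and the default limit of 1010 only mirrors the regex that worked here. It is not a documented sigmake limit.

```python
def split_by_name_length(lines, limit=1010):
    """Split .pat lines into (kept, dropped) by the length of the 6th field.

    A line is dropped when its 6th whitespace-separated field (the first
    function name) is `limit` characters or longer. Lines with fewer than
    six fields, such as the '---' terminator, are always kept.
    """
    kept, dropped = [], []
    for line in lines:
        fields = line.split()
        name = fields[5] if len(fields) > 5 else ""
        (dropped if len(name) >= limit else kept).append(line)
    return kept, dropped
```

Returning the dropped lines as well makes it easy to check that the count matches what the regex reported (41 with a limit of 1010, 38 with 1025).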

williballenthin commented 3 years ago

i bet there's a hardcoded limitation in sigmake of the symbol being 0x400 (1024) characters long, or less.

i ran into a similar issue that sigmake would not process more than 0x2000 leaves, but this could be bypassed by patching sigmake ;-)

in this case, i think we should restrict the length of symbols generated by idb2pat.
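One way such a restriction could look is sketched below. This is not idb2pat's actual code: `shorten_name` and the default `max_len` of 1000 are assumptions for illustration, picked to stay under the limit observed in this thread. A short hash of the full name is appended so that two long names sharing a prefix do not collapse into the same symbol.

```python
import hashlib


def shorten_name(name: str, max_len: int = 1000) -> str:
    """Return name unchanged if it fits, else a truncated name with a hash suffix."""
    if len(name) <= max_len:
        return name
    # 8 hex digits of SHA-1 over the full name keep truncated names distinct.
    digest = hashlib.sha1(name.encode("utf-8")).hexdigest()[:8]
    return name[: max_len - len(digest) - 1] + "_" + digest
```

The shortened name no longer demangles, so this trades readable symbols for a .sig file that applies at all.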

An0nyMooUS commented 3 years ago

Yea! I'm having the same problem here, it works in home tests (like hello-world), but in large projects the same problem occurs. 0 applied functions.

HongThatCong commented 3 years ago

Hi @williballenthin, I am creating a CryptoPP v8.5 signature for IDA and I'm getting a strange sigmake error. Sigmake creates the sig file OK, but dumpsig cannot load or dump the resulting sig file. Can you give me your sigmake_patched.exe? Best regards, HTC (TQN)

mr-tz commented 3 years ago

Unfortunately, we cannot share the patched sigmake. To make the changes yourself you can patch the cmp just before the branch to the too many leaves (%d) error output. In the version I have that's around VA 0x14000A05C.

jfmherokiller commented 3 years ago

in 7.6 the too many leaves cmp is at 0x140009ECC

jfmherokiller commented 3 years ago

also one of the things which can cause bad xdigit is lines like

488954241048894C2408B848430000E8BCB3C403482BE0488B051A5554054833 FF E650 1A798 :00000000 ?RunTest@FMetadataTest@@MEAA_NAEBVFString@@@Z :00000075 loc_140C35CF5 8424B821

you can see the issue by comparing it to this line

B80B000000C3.................................................... 00 0000 0006 :00000000 ?GetTestSourceFileLine@FMetadataTest@@UEBAHXZ

basically 1A798 is 5 digits instead of 4 digits

jfmherokiller commented 3 years ago

A way to find such lines is using the regex of

[0-9A-F][0-9A-F] [0-9A-F][0-9A-F][0-9A-F][0-9A-F] [0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F]

it checks for sequences like FF E650 1A798
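The same check can be run over a whole .pat file with a short script. This is a minimal sketch that anchors the pattern to the start of the line, assuming the layout seen in the lines above (64 characters of hex digits or dots, then 2-digit, 4-digit and 4-digit hex fields). `find_bad_length_lines` is a hypothetical helper name.

```python
import re

# 64 pattern characters, 2-digit and 4-digit fields, then a length field
# that has five or more hex digits instead of the expected four.
BAD_LEN = re.compile(r"^[0-9A-F.]{64} [0-9A-F]{2} [0-9A-F]{4} [0-9A-F]{5,} ")


def find_bad_length_lines(lines):
    """Return the 1-based numbers of lines whose length field is too wide."""
    return [
        number for number, line in enumerate(lines, start=1)
        if BAD_LEN.match(line)
    ]
```

Anchoring at the line start avoids false hits on hex-looking runs inside the trailing bytes of a line.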