skylot / jadx

Dex to Java decompiler
Apache License 2.0

[feature] Better deobfuscation of APK file #965

Open ghost opened 3 years ago

ghost commented 3 years ago

Describe your idea: I am trying to analyze a malicious APK which had superuser permission, but it's heavily obfuscated. Deobfuscating it seems to make it quite a bit worse. So my idea is to improve deobfuscation of APK files: better renaming and code cleanup.

APK file: app.zip

alissonlauffer commented 3 years ago

What are your ideas for "better renaming and code cleanup"?

febianf commented 3 years ago

I think you could add an option to preserve class names and function names. For example, I am decompiling an APK service app which is called externally by another APK, and changing the names breaks the source. But if I don't deobfuscate, it can't compile, because there are variables and functions with the same name.

Another thing the obfuscated code seems to do is add useless classes just to randomize the source further and create more files and calls. You could add a function that checks how many times a class is referenced in the rest of the code; if it is referenced only once, remove the class and its file, and put that class's code where it was called.

Thanks guys
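The reference-counting idea above could be sketched roughly as follows. This is a minimal illustration and not jadx code: `SingleUseClassFinder` and its input map are hypothetical, and it counts how many other classes reference a class, not individual call sites. A class referenced from exactly one other class is reported as an inlining candidate.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of jadx.
final class SingleUseClassFinder {

    private SingleUseClassFinder() {
    }

    /**
     * @param references maps each class name to the set of class names it references
     * @return sorted names of classes referenced by exactly one other class
     */
    static List<String> findSingleUseClasses(Map<String, Set<String>> references) {
        // Count incoming references per class.
        Map<String, Integer> incoming = new HashMap<>();
        for (Map.Entry<String, Set<String>> entry : references.entrySet()) {
            for (String target : entry.getValue()) {
                if (!target.equals(entry.getKey())) { // ignore self-references
                    incoming.merge(target, 1, Integer::sum);
                }
            }
        }
        // Keep only classes with a single referencing class.
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : incoming.entrySet()) {
            if (e.getValue() == 1) {
                result.add(e.getKey());
            }
        }
        Collections.sort(result);
        return result;
    }
}
```

A real implementation would also have to skip classes referenced from AndroidManifest.xml or via reflection, since those references do not show up in such a map.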

febianf commented 3 years ago

I also think that obfuscated code generates many class files from one single file. It would be good to have an option to merge a file that was split into different classes back together. There are many classes with really short code, and I think they were split to make analysis even more difficult.

jpstotz commented 3 years ago

@febianf

> I think you could add an option to preserve class names and function names. For example, I am decompiling an APK service app which is called externally by another APK, and changing the names breaks the source. But if I don't deobfuscate, it can't compile, because there are variables and functions with the same name.

I assume you are talking about classes that are referenced by AndroidManifest.xml? If I remember correctly, enabling deobfuscation has no effect on the generated AndroidManifest.xml, so with deobfuscation enabled the class references from this file are invalid. APKs that are obfuscated this way are very rare. Usually classes used in AndroidManifest.xml are preserved and not obfuscated.

> Another thing the obfuscated code seems to do is add useless classes just to randomize the source further and create more files and calls. You could add a function that checks how many times a class is referenced in the rest of the code; if it is referenced only once, remove the class and its file, and put that class's code where it was called.

Could you please check if you have enabled the option "Inline anonymous classes" in preferences?

@alissonlauffer

> What are your ideas for "better renaming and code cleanup"?

Thinking about your question, I found one realistic improvement: at the moment all deobfuscated class names are generated from a static pattern. Maybe we should extend the deobfuscation to include more details about the class in its name.

One common example is classes that extend android.app.Activity. Deobfuscating such a class currently ends up with a generic generated class name like C0000a. As an enhancement, we could prepend "known details" about the class so that the deobfuscated class name is Activity_C0000a.

We could define certain classes or interfaces to look for in the class hierarchy; if one is present, its name is prepended to the generated deobfuscated class name. The question now is whether this can be done automatically or whether it is better to have a manually maintained list of classes/interfaces to include in the deobfuscated class name.

A second source for improving deobfuscated class names is class properties, such as whether the class is abstract or whether it is an interface. Such properties should be included in the deobfuscated class name.
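As a rough sketch of this naming idea, assuming the manually maintained list variant: `DescriptiveNameBuilder`, its `KNOWN_TYPES` entries and the `Interface_`/`Abstract_` prefixes are all hypothetical and do not exist in jadx. The super type list is assumed to be ordered from the nearest ancestor outward.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not part of jadx.
final class DescriptiveNameBuilder {

    // Manually maintained list: known super type -> short prefix (assumed entries).
    private static final Map<String, String> KNOWN_TYPES = Map.of(
            "android.app.Activity", "Activity",
            "android.app.Service", "Service",
            "android.content.BroadcastReceiver", "Receiver");

    private DescriptiveNameBuilder() {
    }

    /**
     * @param generatedName name produced by the static pattern, e.g. "C0000a"
     * @param superTypes    super classes and interfaces, nearest first
     */
    static String build(String generatedName, List<String> superTypes,
                        boolean isInterface, boolean isAbstract) {
        StringBuilder sb = new StringBuilder();
        // Interfaces are also flagged abstract, so check the interface flag first.
        if (isInterface) {
            sb.append("Interface_");
        } else if (isAbstract) {
            sb.append("Abstract_");
        }
        // Prepend the first known type found in the hierarchy.
        for (String superType : superTypes) {
            String prefix = KNOWN_TYPES.get(superType);
            if (prefix != null) {
                sb.append(prefix).append('_');
                break;
            }
        }
        return sb.append(generatedName).toString();
    }
}
```

With this sketch, a class whose hierarchy contains android.app.Activity and whose generated name is C0000a would be named Activity_C0000a, matching the example above.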

ghost commented 3 years ago

I mean renaming obfuscated names to Class0, Class1... Method0, Method1 and so on. Code cleanup means removing unused code and merging split methods into one where possible, to make the result more human-readable. jpstotz's idea is a good suggestion.

de4dot, which deobfuscates C# applications, does way better and its output is a lot more readable. Maybe one of you can try to replicate deobfuscation like de4dot's, using it as an example? https://github.com/0xd4d/de4dot

Sorry if I didn't explain it right.
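The sequential renaming described in this comment (Class0, Class1... Method0, Method1) could look roughly like the sketch below. `SequentialRenamer` is a hypothetical name; this is neither how jadx nor how de4dot implements renaming. It only shows that each obfuscated name gets a stable, numbered replacement.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not part of jadx or de4dot.
final class SequentialRenamer {

    private final Map<String, String> classNames = new LinkedHashMap<>();
    private final Map<String, String> methodNames = new LinkedHashMap<>();

    /** Returns the same replacement every time the same obfuscated class name is passed. */
    String renameClass(String obfuscatedClass) {
        String name = classNames.get(obfuscatedClass);
        if (name == null) {
            name = "Class" + classNames.size();
            classNames.put(obfuscatedClass, name);
        }
        return name;
    }

    /** Methods are keyed by owner class plus signature, so overloads get distinct names. */
    String renameMethod(String ownerClass, String signature) {
        String key = ownerClass + "->" + signature;
        String name = methodNames.get(key);
        if (name == null) {
            name = "Method" + methodNames.size();
            methodNames.put(key, name);
        }
        return name;
    }
}
```

Giving overloads distinct names addresses the earlier complaint that identically named variables and functions prevent the decompiled source from compiling.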

LGLTeam commented 3 years ago

I have discovered an APK that has an extra small dex file added, which is obfuscated and has a very large amount of code in its onCreate method.

It would be great if jadx had the ability to emulate and simplify that code as much as it can.

classes2.zip

skylot commented 3 years ago

@LGLTeam to deobfuscate code using symbolic execution, you can try the simplify project.

LGLTeam commented 3 years ago

Oh, I didn't know that. I will take a look.