amohar closed this issue 3 years ago
If I understand your question correctly, you're interested in the original source code line number, rather than the line in the Jimple IR. The latter would be trivial to obtain, since you have the Unit / Stmt object and can simply iterate over all units in the respective method, and count. Once a unit matches the object you have as part of the InfoflowResults (using Java's == identity comparison), that's your line and the counter value is your line number.
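The counting approach above can be sketched roughly as follows. This is a minimal, Soot-free illustration: the class name JimpleLineFinder and the method lineOf are made up for this example, plain String objects stand in for Jimple statements, and the 1-based counter is an assumption. The result is a position within the method body, which is not necessarily the line in a written .jimple file.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Soot or FlowDroid.
final class JimpleLineFinder {

    /**
     * Returns the 1-based position of target among units, compared by
     * reference identity (==) rather than equals(), or -1 if not found.
     */
    static int lineOf(Iterable<?> units, Object target) {
        int line = 0;
        for (Object unit : units) {
            line++;
            if (unit == target) {
                return line;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Stand-ins for Jimple statements: first and third have identical
        // text but are distinct objects, like two identical-looking statements.
        String first = new String("$r1 = $r0");
        String second = new String("return $r1");
        String third = new String("$r1 = $r0");
        List<Object> units = Arrays.<Object>asList(first, second, third);

        System.out.println(lineOf(units, third));      // prints 3 (identity match)
        System.out.println(units.indexOf(third) + 1);  // prints 1 (equals() picks the first look-alike)
    }
}
```

Because the helper only needs an Iterable, it should also accept the unit chain of a Soot method body (for example method.getActiveBody().getUnits()) together with the statement object taken from the results, though that wiring is not shown or tested here. Comparing by identity is what avoids the ambiguity between statements that print the same.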
If you're really looking for the Java source number, that's tricky. Let us consider the following scenarios:
1) Your analysis is part of a development lifecycle, i.e., you have the original code, it gets compiled, and then analyzed. In this case, I'd make sure that the debugging information is there (in the class files / APK) and doesn't get stripped by the compiler along the way. If it's in the binary, Soot puts a SourceLineTag or a LineNumberTag on each unit, which should be fairly precise. I say fairly, because Soot needs to split up complex Java instructions into Jimple's three-address-code, and we're doing our best to keep line numbers as sensible as possible.
2) You don't have the source code, only the binary. In this case, you rely on decompilation, and the structure of the Java code greatly depends on the formatting style of the decompiler. If the decompiler is based on Soot, you might have a chance to retain a mapping; otherwise you'd need to deep-dive into the decompiler and see if there is anything that you can use for building a link. Even if you have the original line numbers as part of the debug info, there is no guarantee that these line numbers match the decompiled code. However, if the decompiler retains these attributes, you might be able to link Jimple instructions and decompiled Java lines that carry the same source line number.
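The linking idea from scenario 2 could look roughly like the sketch below. It assumes you have already extracted, for every Jimple statement and for every decompiled Java line, the original source line number it carries (or -1 if none). How you get those numbers out of Soot or a particular decompiler is not shown. The class SourceLineLinker, its methods, and the NO_LINE marker are hypothetical names for this example, and all indices are 0-based.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Soot, Dava, or FlowDroid.
final class SourceLineLinker {
    static final int NO_LINE = -1; // marker for "no source line information"

    /** Groups item indices by the original source line they are tagged with. */
    static Map<Integer, List<Integer>> indexBySourceLine(int[] sourceLines) {
        Map<Integer, List<Integer>> byLine = new TreeMap<>();
        for (int i = 0; i < sourceLines.length; i++) {
            if (sourceLines[i] != NO_LINE) {
                byLine.computeIfAbsent(sourceLines[i], k -> new ArrayList<>()).add(i);
            }
        }
        return byLine;
    }

    /**
     * Maps each Jimple statement index to the decompiled line indices that
     * share its source line number. Statements without a match are omitted.
     */
    static Map<Integer, List<Integer>> link(int[] jimpleSourceLines, int[] decompiledSourceLines) {
        Map<Integer, List<Integer>> decompiledByLine = indexBySourceLine(decompiledSourceLines);
        Map<Integer, List<Integer>> links = new TreeMap<>();
        for (int i = 0; i < jimpleSourceLines.length; i++) {
            List<Integer> matches = decompiledByLine.get(jimpleSourceLines[i]);
            if (matches != null) {
                links.put(i, Collections.unmodifiableList(matches));
            }
        }
        return links;
    }

    public static void main(String[] args) {
        int[] jimple = {10, 10, 11, NO_LINE};     // source line per Jimple statement
        int[] decompiled = {NO_LINE, 10, 11, 11}; // source line per decompiled Java line
        System.out.println(link(jimple, decompiled)); // prints {0=[1], 1=[1], 2=[2, 3]}
    }
}
```

The mapping is many-to-many by nature: one source line can expand into several Jimple statements, and a decompiler may spread it over several output lines, so the result is a set of candidates rather than an exact position.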
Dava isn't the most widely used part of Soot and thus hasn't received much maintenance over the last few years. You're pretty much in for an adventure there, at least judging from my last experience with Dava...
Hello Steven, thanks for your prompt answer, and thanks for the steps on how to obtain the Jimple line number. You're correct, my ultimate goal is to obtain the line number in the Java code, if possible. I understand it is going to be quite an endeavor and that the chances are slim that it will be good enough. My scenario is the second one: I'm working with binary code, hence I can't influence the line number annotations. Thanks for your insights into Dava. I'll give it a shot, and if it fails, I'll fall back to just using Jimple.
I'm back with another, probably very noobish, question. I would like to know if I can easily link the found source and sink objects to the line numbers in the Jimple files. I'm aware that there is a command-line switch that allows me to print the Java code line, but that only works if the smali has the proper annotations. If the annotations are stripped out, this doesn't work (please correct me if I'm wrong about this, but my tests showed this to be true). I have working PoC Java code that uses FlowDroid as a library and gives me the InfoflowResults object, so I can add whatever code is necessary. Also, I know I get the source text of the Jimple line, but this is not unique enough to determine the Jimple line number (a function could have a few lines with exactly the same source). This could be a fallback solution, but it is very ugly.
The idea here is to link the findings with Jimple lines, then use Soot to decompile the Jimple to Java files using Dava (hopefully that's possible), and during that process record which Jimple lines end up in which produced Java lines, thus translating each Jimple line into a Java line (hopefully that's possible, too). But this is probably a question better suited for the Soot project. The end result would be knowing the exact taint path through a decompiled Java file. Thanks for any input.