pascal-lab / Tai-e

An easy-to-learn/use static analysis framework for Java
https://tai-e.pascal-lab.net/docs/index.html
GNU Lesser General Public License v3.0

Replacing / manipulating IR of methods in "world" #67

Closed: staslath closed this issue 11 months ago

staslath commented 11 months ago

Does tai-e support manipulating/replacing the IR of a method that has already been included in the World?

Motivating Use Case

Consider Java's ServiceLoader API. Suppose we have a service called SearchEngine with two implementations, YahooSearchEngine and BingSearchEngine.

Client code could say:

var results =
  ServiceLoader.load(SearchEngine.class).stream()
    .map(ServiceLoader.Provider::get)
    .map(searchEngine -> searchEngine.search("cat"))
    .toList();

and Java's ServiceLoader infrastructure would then go and search the META-INF/services/ directories in the classpath jars to find implementors.

For example, one jar could have a file META-INF/services/mypkg.SearchEngine with the following text line:

mypkg.YahooSearchEngine

and another:

mypkg.BingSearchEngine

Since implementing classes are read from files, tai-e won't know about them, which compromises call graph construction for example.
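To make the file-based lookup concrete, here is a minimal sketch of how provider class names could be extracted from the text of such a META-INF/services file. It assumes the documented provider-configuration format (one class name per line, `#` starts a comment, blank lines and duplicates are ignored). `ProviderConfigParser` and `parseProviders` are hypothetical names for illustration, not part of the JDK or Tai-e.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper (not a JDK or Tai-e class): parses the text of a
// META-INF/services/<service> file into provider class names.
class ProviderConfigParser {

    static List<String> parseProviders(String fileContent) {
        // LinkedHashSet keeps first-seen order and drops duplicate names.
        Set<String> providers = new LinkedHashSet<>();
        for (String line : fileContent.split("\\R")) {
            // Everything from '#' to the end of the line is a comment.
            int hash = line.indexOf('#');
            if (hash >= 0) {
                line = line.substring(0, hash);
            }
            line = line.trim();
            if (!line.isEmpty()) {
                providers.add(line);
            }
        }
        return new ArrayList<>(providers);
    }

    public static void main(String[] args) {
        String content = "# providers of mypkg.SearchEngine\n"
                + "mypkg.YahooSearchEngine\n"
                + "\n"
                + "mypkg.BingSearchEngine # trailing comment\n";
        System.out.println(parseProviders(content));
    }
}
```

A static analysis sees none of this unless the file contents are fed to it in some form, which is why the providers are missing from the call graph.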

I'd love to have your perspective on what would be the tai-e idiomatic way to address this issue.

A naive approach would seek to "rewrite":

ServiceLoader.load(SearchEngine.class).stream()

to something like the following (since we know mypkg.YahooSearchEngine/BingSearchEngine are the service providers):

Stream.of(
   mypkg.YahooSearchEngine.class.newInstance(), 
   mypkg.BingSearchEngine.class.newInstance()
)

While this approach is admittedly naive (e.g., it ignores the fact that ServiceLoader#stream returns Stream<Provider<S>> rather than Stream<S>, where S is the service type), hopefully it gets the idea across.

My initial experiments to have a plugin that does that have failed :)

  1. When I use solver.addStmts(...) to add a set of generated statements mimicking the above approach, they do not result in YahooSearchEngine#search or BingSearchEngine#search being added to the call graph.
  2. As a sanity check, if I write the equivalent Java code directly (as opposed to IR manipulations) and have it analysed by tai-e, it's capable of "following through the Stream.of", and the call graph includes both YahooSearchEngine#search and BingSearchEngine#search.
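For reference, the "equivalent Java code" in the sanity check could look roughly like the sketch below. The nested `SearchEngine`, `YahooSearchEngine` and `BingSearchEngine` types and the `instantiate` and `searchAll` helpers are hypothetical stand-ins for the `mypkg` classes, and `getDeclaredConstructor().newInstance()` is used because `Class#newInstance()` has been deprecated since Java 9. This only illustrates the shape of the rewritten code; it makes no claim about what any particular analysis resolves.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical stand-in for the client code with the providers spelled out.
class ExplicitProviders {

    interface SearchEngine {
        String search(String query);
    }

    static class YahooSearchEngine implements SearchEngine {
        @Override
        public String search(String query) {
            return "yahoo:" + query;
        }
    }

    static class BingSearchEngine implements SearchEngine {
        @Override
        public String search(String query) {
            return "bing:" + query;
        }
    }

    // Reflective instantiation from a class constant, so the target type
    // is visible in the code instead of being read from a file.
    static SearchEngine instantiate(Class<? extends SearchEngine> type) {
        try {
            return type.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot instantiate " + type.getName(), e);
        }
    }

    // Plays the role of ServiceLoader.load(SearchEngine.class).stream()...
    static List<String> searchAll(String query) {
        return Stream.of(
                        instantiate(YahooSearchEngine.class),
                        instantiate(BingSearchEngine.class))
                .map(searchEngine -> searchEngine.search(query))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(searchAll("cat"));
    }
}
```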

Looking forward to your reply!

silverbullettt commented 11 months ago

Firstly, Tai-e can indeed simulate the behavior of certain APIs by synthesizing IR in pointer analysis. An example is ArrayModel, which we use to model System.arraycopy().

However, for your particular case, a more idiomatic and straightforward approach should be to handle reflection properly. I'm not familiar with ServiceLoader, but it seems to create instances (such as your various *SearchEngine instances) through reflection. So, the simplest way would be to make Tai-e capable of analyzing the reflection calls within ServiceLoader.

There are two ways to address this issue:

  1. Use Tai-e's Solar reflection analysis (option: pta=...;reflection-inference:solar;...), which is more powerful than the default reflection analysis and may directly resolve the instances created by ServiceLoader.
  2. Explicitly inform Tai-e through a reflection log. To do this, you need to locate the actual reflection calls in the JDK for ServiceLoader (likely newInstance() calls), and then use a log to inform Tai-e (option: pta=...;reflection-log:refl.log;...). The reflection log format is the same as TamiFlex's. For reference, you can take a look at how refl.log is written in java-benchmarks.

zhangt2333 commented 11 months ago

  1. When I use solver.addStmts(...) to add a set of generated statements mimicking the above approach, they do not result in YahooSearchEngine#search or BingSearchEngine#search being added to the call graph.

How exactly did you do that (i.e., use solver.addStmts(...) to add a set of generated statements mimicking the above approach)?

Compared with the generic solution mentioned by @silverbullettt, this approach requires very careful modeling, so it is also worth checking whether your modeling is wrong.

staslath commented 11 months ago

@silverbullettt, @zhangt2333, thanks for taking the time to respond!

After further debugging, I learned that using solver.addIgnoredMethod(...) together with solver.addStmts(method, ...) was likely the root cause of things not working as expected. In particular, I believed that addIgnoredMethod() would ignore the original IR of that method, and addStmts() would replace that IR with my own. However, the use of addIgnoredMethod(method) resulted in other plugins (e.g., lambda analysis) not properly handling the method at hand.

While debugging and trying out different things, I was wondering if there was a good way to access an enclosing class' members inside an InvokeHandler of its non-static inner class?

zhangt2333 commented 11 months ago

if there was a good way to access an enclosing class' members inside an InvokeHandler of its non-static inner class?

I'm afraid I don't quite understand. Could you provide an example?

staslath commented 11 months ago

@zhangt2333, consider the following example:

public class Main {

  private String fileName; // populated from args

  private Stream<String> readFile() { ... } // uses the fileName member of the enclosing class

  private class AnimalFinder {
    public List<Animal> findAnimals() {
      return readFile().map(Class::forName).toList();
    }
  }

  public Main(String[] args) { ... }

  public static void main(String[] args) {
    Main main = new Main(args); // populates fileName, among other things
    List<Animal> animals = main.new AnimalFinder().findAnimals();
  }
}

I'd like to replace the IR of readFile(), such that I provide the file content myself, based on the paths I observe in fileName's points-to set.

However, if I create a model which derives from AbstractModel and has an InvokeHandler for readFile(), I won't have the points-to set for fileName because it's not part of readFile()'s signature, and not even part of the class containing readFile(), only its enclosing class, Main.

Is there a good way to access fileName's points-to set in such a case?

zhangt2333 commented 11 months ago

In the method readFile, you can obtain the InstanceField (a Pointer) for fileName from the thisObj (recvObj) of readFile and the JField fileName. You can then retrieve the points-to set (PTS) of fileName.
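The two-step lookup described here (receiver objects first, then the field pointer of each receiver object) can be illustrated with a small self-contained model. This is only a sketch of the idea and deliberately does not use Tai-e's API: `FieldPointsToSketch`, its maps, and the string-encoded abstract objects are all hypothetical stand-ins for the solver's points-to data.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model of points-to data (NOT Tai-e's API). Abstract objects are
// plain strings such as "o1:Main" to keep the sketch short.
class FieldPointsToSketch {

    // variable name -> abstract objects it may point to
    private final Map<String, Set<String>> varPts = new HashMap<>();
    // base object -> field name -> abstract objects the field may point to
    private final Map<String, Map<String, Set<String>>> fieldPts = new HashMap<>();

    void addVarPointsTo(String var, String obj) {
        varPts.computeIfAbsent(var, k -> new HashSet<>()).add(obj);
    }

    void addFieldPointsTo(String baseObj, String field, String obj) {
        fieldPts.computeIfAbsent(baseObj, k -> new HashMap<>())
                .computeIfAbsent(field, k -> new HashSet<>())
                .add(obj);
    }

    // Step 1: take every object the receiver variable may point to.
    // Step 2: for each such object, read the points-to set of (object, field).
    Set<String> fieldPointsTo(String receiverVar, String field) {
        Set<String> result = new HashSet<>();
        for (String recvObj : varPts.getOrDefault(receiverVar, Collections.emptySet())) {
            Map<String, Set<String>> fields =
                    fieldPts.getOrDefault(recvObj, Collections.emptyMap());
            result.addAll(fields.getOrDefault(field, Collections.emptySet()));
        }
        return result;
    }
}
```

In an actual plugin the receiver objects and the field pointer would come from the solver rather than from hand-filled maps, but the lookup has the same shape: the field's points-to set is reached through the receiver object, not through the method's parameters.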