openrewrite / rewrite-analysis

OpenRewrite recipes for data flow analysis.
Apache License 2.0
8 stars 8 forks source link

Begin work on Global Data Flow API #16

Closed JLLeitschuh closed 1 year ago

JLLeitschuh commented 1 year ago

Proposal for how we will implement Global Data Flow

  1. Build an exhaustive flow graph for each method that includes all methods, from all arguments, to all arguments, and to all return values. Don't discriminate at all. This is a superset graph of actual data flow. It will include all possible paths. We do this because we are only allowed one pass at each file by the recipe scheduler, so we can't go back and re-visit files again. a. This step can use control flow to limit the possible set of expressions that are reachable, which can slightly reduce the search space.
  2. Connect this graph together, between each method, after finding all of the valid source nodes. This becomes the "exhaustive flow graph".
  3. Walk the graph from all sources, pruning all paths were {@link DataFlowSpec#isAdditionalFlowStep} returns false, and where {@link DataFlowSpec#isSink(DataFlowNode)} returns false.

This will, unfortunately, require that the entire repositories AST remain in memory during the computation of global data flow, as we need to first build the entire possible graph, then walk it, in memory, pruning paths.

Up side, is that this method will allow us to build this "exhaustive flow graph" in, from the recipe scheduler's perspective, a single pass of the entire AST.

It may be possible to, eventually, drop most of AST we are holding in memory after the scanning phase. But at this point, I haven't figured out a way to compute global data flow without holding the entire repositories AST in memory.