uber / piranha

A tool for refactoring code related to feature flag APIs
Apache License 2.0
2.27k stars 190 forks source link

Seeking feedback on feature flag cleanup approach with the Piranha framework #668

Open aleksandr-zakshevskii-n26 opened 3 months ago

aleksandr-zakshevskii-n26 commented 3 months ago

Hello team đź‘‹,

I wanted to get your help to validate my approach for cleaning up obsolete feature flags using your amazing Piranha framework.

Background

We work with various Kotlin codebases where the handling of feature flags may vary by team. My goal was to develop a set of universal Piranha rules to cover all possible feature flag invocations. We are using Piranha Polyglot version.

Current Challenge

Here is one of the examples of how feature flags could be used:

enum class FeatureFlags(val flagName: String) {
    STALE_FLAG_FEATURE(“STALE_FLAG”),
    //...
}

fun isFlagEnabled() =
    featureService.isEnabled(FeatureFlags.STALE_FLAG_FEATURE.name)

fun someFunction() {
    if (isFlagEnabled()) {
        ....
    }
}

In the example above, a feature flag is a parameter to an enumeration item. However, feature flags can also be stored as an element of a sealed class or as a simple string. I found that the more patterns I want to capture, the more challenging it becomes to maintain the rules and edges.

Problems Faced

1. Complexity of Tree-Sitter Queries

Here is an example of tree-sitter query to capture an invocation call of a feature flag:

 featureService.isEnabled(FeatureFlags.STALE_FEATURE_FLAG)

The corresponding tree-sitter query to match the feature flag:

(call_expression
    (simple_identifier) @function_name
    (call_suffix (value_arguments
        (value_argument 
            (simple_identifier) @flag_name
        )
    )
) @call_expression

In case if a feature flag is wrapped with a constructor call, as follows:

 featureService.isEnabled(FeatureFlagWrapper(FeatureFlags.STALE_FEATURE_FLAG))

The tree-sitter query becomes significantly more complex for visual perception.

2. Redundancy in Queries:

Let's consider two examples:

enum class FeatureFlag(val name: String) {
    STALE_FLAG(“stale_flag_name”)
}

the corresponding tree-sitter query will be:

(enum_entry
    (simple_identifier) @enum_item_name
    (value arguments [
        (value_argument
            (string_literal) @flag_name
         )
    ])
)

and

enum class FeatureFlag(val feature: FeatureFlagDetails) {
    STALE_FLAG(feature)
}

the corresponding tree-sitter query will be:

(enum_entry
    (simple_identifier) @enum_item_name
    (value arguments [
        (value_argument
            (simple_identifier) @flag_name
         )
    ])
)

As can be seen from the tree-sitter queries above - both queries differ only by a identifier part, but the enumeration part is exactly the same.

3. Repetition in flows between rules

The more rules it becomes - I noticed the same repetition patterns between them. For example if a flag is defined in a sealed class the possible flow of the rules could be the following

Screenshot 2024-05-27 at 22 13 26

However flag could be also defined using an enum, and the flow will be exactly the same.

Idea

My idea is to create a script that uses rule templates and graph-encoded relationships to generate rules.toml and edges.toml files. This approach aims to:

Example

Screenshot 2024-05-27 at 19 07 09

Here is an example of a feature flag invocation:

As the result - it would be expected that:

Representing the edges as directed graph

The graph represents rule templates and their relationships, where paths from starting nodes to leaves represent rule flows

Screenshot 2024-05-27 at 19 12 20

The green nodes represent the starting points used for the initial feature flag extraction. Each path in the graph indicates a sequence of rule applications. These paths are constructed recursively from the starting node to the leaves. For example, for the path A -> B -> C, the following edges will be generated:

A -> B -> C
A -> B
A

To demonstrate this approach I've implemented a simple python script and configurations. Python script

The script is a proof of concept rather than a finished solution. It takes 2 configuration files as parameters: rule_templates.toml - contains tree-sitter rule templates edges_templates.toml - contains the directed graph, which stores the rules as nodes and relations between them, it is used to generate file edges.toml

This script produces the following files as the result: rules.toml - contains auto-generated rules which then applied to the tested files edges.toml - contains auto-generated edges

You can find all relevant files along with the tests in this PR


I would greatly appreciate your feedback and thoughts on this approach. Thank you for your time and consideration.

danieltrt commented 3 months ago

Hi @aleksandr-zakshevskii-n26,

I haven't had the chance to read through your PR thoroughly yet, but it looks quite interesting. I wanted to let you know that we are working on a different matching language to make syntax matching easier for most use cases. As you mentioned, tree-sitter queries can be quite challenging, even for straightforward tasks.

With concrete syntax matching, what you see is what you match. For example, the code snippet:

featureService.isEnabled(FeatureFlagWrapper(FeatureFlags.STALE_FEATURE_FLAG))

can be matched straightforwardly with concrete syntax like this:

cs  :[var].isEnabled(:[wrapper](:[your_stale_flag]))

If you're interested in learning more, please check out this PR. Additionally, we have some documentation on concrete syntax, although it is somewhat outdated and not comprehensive. You can find it here.

Note however this matching mode is experimental.

mrhooray commented 1 month ago

@danieltrt thanks for the pointers.

Any recommendations on the following scenarios?

danieltrt commented 1 month ago

Hi @mrhooray !

For the first case, if your feature flag reference and the call to isEnabled are within the same scope, you can still get cleanups. For example:

from polyglot_piranha import execute_piranha, PiranhaArguments, Rule, RuleGraph, OutgoingEdges

# Original code snippet
code = """
void some_method() {
    var ff = FeatureFlagWrapper(FeatureFlags.STALE_FEATURE_FLAG);
    if (featureService.isEnabled(ff) || x > 0) {
        // do something
    } else {
        // do something else!
    }
}
"""

# Define the rule to replace the method call
r1 = Rule(
    name="find_flag_reference",
    query="cs var :[reference] = FeatureFlagWrapper(FeatureFlags.STALE_FEATURE_FLAG);",
    replace_node="*",
    replace="",
)

r2 = Rule(
    name="propagate_changes",
    query="cs :[service].isEnabled(:[reference])", # Getting the reference from r1
    replace_node="*",
    replace="true",
    holes ={"reference"},
    is_seed_rule=False
)

# Define the edges for the rule graph.
# In this case, boolean_literal_cleanup is already defined for java [see src/cleanup_rules]
edge1 = OutgoingEdges("find_flag_reference", to=["propagate_changes"], scope="Method")
edge2 = OutgoingEdges("propagate_changes", to=["boolean_literal_cleanup"], scope="parent")
# Create Piranha arguments
piranha_arguments = PiranhaArguments(
    code_snippet=code,
    language="java",
    rule_graph=RuleGraph(rules=[r1, r2], edges=[edge1, edge2])
)

# Execute Piranha and print the transformed code
piranha_summary = execute_piranha(piranha_arguments)
print(piranha_summary[0].content)

For the second case, I'm not sure. If doSomething is a method that can take any FeatureFlagWrapper, cleaning up might be tricky since we can't be sure what flag is passed. Nor if it is safe to do it:

val ff = FeatureFlagWrapper(FeatureFlags.STALE_FEATURE_FLAG)
doSomething(ff)
val ff2 = FeatureFlagWrapper(FeatureFlags.NOT_A_STALE_FEATURE_FLAG)
doSomething(ff2)
mrhooray commented 1 month ago

Thanks for the response and the CST example, @danieltrt

danieltrt commented 1 month ago

During an execution session, Piranha maintains a symbol table with all the captured nodes, allowing you to reuse their values in subsequent rule applications. To do this, you need to declare them as "holes" in the rule (holes = {"reference"})


If you're sure the captured value is only used once in the codebase, you can create a rule for inlining. You just need to find the method declaration first:

find_flag_reference = Rule(
    name="find_flag_reference",
    query="cs var :[reference] = FeatureFlagWrapper(FeatureFlags.STALE_FEATURE_FLAG);",
    replace_node="*",
    replace=""
)

find_usage = Rule( 
    name="find_usage",
    # This finds doSomething(ff)
    query="cs :[method_name](:[reference])", 
    holes={"reference"},
    replace_node="*",
    replace=""
)

find_method_decl_with_usage = Rule(
    name="find_method_decl_with_usage",
    # This finds doSomething's declaration. This is a match only rule, no replacement
    query="cs fun :[method_name](val :[wrapper]: FeatureFlagWrapper) { :[body+] }",  
    holes={"method_name"},
)

cleanup_actual_usages = Rule(
    name="cleanup_actual_usages",
    # Regular cleanup
    query="cs :[_].isEnabled(:[wrapper])", 
    replace_node="*",
    replace="true",
    holes={"wrapper"},
    is_seed_rule=False
)

edge1 = OutgoingEdges("find_flag_reference", to=["find_usage"], scope="Method")
# Specify you want to search in your entire code base, to find the method declaration
edge2 = OutgoingEdges("find_usage", to=["find_method_decl_with_usage"], scope="Global") 
# Find the actual usages of is_enabled within the method declaration from above
edge3 = OutgoingEdges("find_method_decl_with_usage", to=["cleanup_actual_usages"], scope="Method") 
edge4 = OutgoingEdges("cleanup_actual_usages", to=["boolean_literal_cleanup"], scope="Parent") 
aleksandr-zakshevskii-n26 commented 1 week ago

Sorry for the late response. Thank you very much for your responses, I will finalize my approach and will get back to you. Best, Alex