Seeking feedback on feature flag cleanup approach with the Piranha framework

aleksandr-zakshevskii-n26 commented 6 months ago

Hello team 👋,

I wanted to get your help to validate my approach for cleaning up obsolete feature flags using your amazing Piranha framework.

Background

We work with various Kotlin codebases where the handling of feature flags may vary by team. My goal was to develop a set of universal Piranha rules to cover all possible feature flag invocations. We are using Piranha Polyglot version.

Current Challenge

Here is one of the examples of how feature flags could be used:

enum class FeatureFlags(val flagName: String) {
    STALE_FLAG_FEATURE(“STALE_FLAG”),
    //...
}

fun isFlagEnabled() =
    featureService.isEnabled(FeatureFlags.STALE_FLAG_FEATURE.name)

fun someFunction() {
    if (isFlagEnabled()) {
        ....
    }
}

In the example above, a feature flag is a parameter to an enumeration item. However, feature flags can also be stored as an element of a sealed class or as a simple string. I found that the more patterns I want to capture, the more challenging it becomes to maintain the rules and edges.

Problems Faced

1. Complexity of Tree-Sitter Queries

Here is an example of tree-sitter query to capture an invocation call of a feature flag:

 featureService.isEnabled(FeatureFlags.STALE_FEATURE_FLAG)

The corresponding tree-sitter query to match the feature flag:

(call_expression
    (simple_identifier) @function_name
    (call_suffix (value_arguments
        (value_argument 
            (simple_identifier) @flag_name
        )
    )
) @call_expression

In case if a feature flag is wrapped with a constructor call, as follows:

 featureService.isEnabled(FeatureFlagWrapper(FeatureFlags.STALE_FEATURE_FLAG))

The tree-sitter query becomes significantly more complex for visual perception.

2. Redundancy in Queries:

Let's consider two examples:

enum class FeatureFlag(val name: String) {
    STALE_FLAG(“stale_flag_name”)
}

the corresponding tree-sitter query will be:

(enum_entry
    (simple_identifier) @enum_item_name
    (value arguments [
        (value_argument
            (string_literal) @flag_name
         )
    ])
)

and

enum class FeatureFlag(val feature: FeatureFlagDetails) {
    STALE_FLAG(feature)
}

the corresponding tree-sitter query will be:

(enum_entry
    (simple_identifier) @enum_item_name
    (value arguments [
        (value_argument
            (simple_identifier) @flag_name
         )
    ])
)

As can be seen from the tree-sitter queries above - both queries differ only by a identifier part, but the enumeration part is exactly the same.

3. Repetition in flows between rules

The more rules it becomes - I noticed the same repetition patterns between them. For example if a flag is defined in a sealed class the possible flow of the rules could be the following

Screenshot 2024-05-27 at 22 13 26

However flag could be also defined using an enum, and the flow will be exactly the same.

Idea

My idea is to create a script that uses rule templates and graph-encoded relationships to generate rules.toml and edges.toml files. This approach aims to:

Simplify the process of creating and managing rules.
Reduce redundancy by reusing existing queries.
Provide a more intuitive and scalable way to handle feature flags.

Example

Here is an example of a feature flag invocation:

feature flag is defined as string literal
this flag referenced using an enumeration
enumeration item representing a feature flag is invoked inside is_flag_enabled boolean function
the function result is used in a boolean expression

As the result - it would be expected that:

enumeration item is removed
function declaration referencing enumeration is removed
invocation of the function is replaced with the corresponding boolean value

Representing the edges as directed graph

The graph represents rule templates and their relationships, where paths from starting nodes to leaves represent rule flows

The green nodes represent the starting points used for the initial feature flag extraction. Each path in the graph indicates a sequence of rule applications. These paths are constructed recursively from the starting node to the leaves. For example, for the path A -> B -> C, the following edges will be generated:

A -> B -> C
A -> B
A

To demonstrate this approach I've implemented a simple python script and configurations. Python script

The script is a proof of concept rather than a finished solution. It takes 2 configuration files as parameters: rule_templates.toml - contains tree-sitter rule templates edges_templates.toml - contains the directed graph, which stores the rules as nodes and relations between them, it is used to generate file edges.toml

This script produces the following files as the result: rules.toml - contains auto-generated rules which then applied to the tested files edges.toml - contains auto-generated edges

You can find all relevant files along with the tests in this PR

I would greatly appreciate your feedback and thoughts on this approach. Thank you for your time and consideration.

danieltrt commented 5 months ago

Hi @aleksandr-zakshevskii-n26,

I haven't had the chance to read through your PR thoroughly yet, but it looks quite interesting. I wanted to let you know that we are working on a different matching language to make syntax matching easier for most use cases. As you mentioned, tree-sitter queries can be quite challenging, even for straightforward tasks.

With concrete syntax matching, what you see is what you match. For example, the code snippet:

featureService.isEnabled(FeatureFlagWrapper(FeatureFlags.STALE_FEATURE_FLAG))

can be matched straightforwardly with concrete syntax like this:

cs  :[var].isEnabled(:[wrapper](:[your_stale_flag]))

If you're interested in learning more, please check out this PR. Additionally, we have some documentation on concrete syntax, although it is somewhat outdated and not comprehensive. You can find it here.

Note however this matching mode is experimental.

mrhooray commented 3 months ago

@danieltrt thanks for the pointers.

Any recommendations on the following scenarios?

feature flag references that are not inlined, e.g.

val ff = FeatureFlagWrapper(FeatureFlags.STALE_FEATURE_FLAG)
featureService.isEnabled(ff)

feature flag references across function invocations and files, e.g.

// file 1
fun doSomething(val args: Args, val ff: FeatureFlagWrapper) {
if (featureService.isEnabled(ff)) {
// do something
} else {
// do something else
}
}
// file 2
val ff = FeatureFlagWrapper(FeatureFlags.STALE_FEATURE_FLAG)
doSomething(ff)

danieltrt commented 3 months ago

Hi @mrhooray !

For the first case, if your feature flag reference and the call to isEnabled are within the same scope, you can still get cleanups. For example:

from polyglot_piranha import execute_piranha, PiranhaArguments, Rule, RuleGraph, OutgoingEdges

# Original code snippet
code = """
void some_method() {
    var ff = FeatureFlagWrapper(FeatureFlags.STALE_FEATURE_FLAG);
    if (featureService.isEnabled(ff) || x > 0) {
        // do something
    } else {
        // do something else!
    }
}
"""

# Define the rule to replace the method call
r1 = Rule(
    name="find_flag_reference",
    query="cs var :[reference] = FeatureFlagWrapper(FeatureFlags.STALE_FEATURE_FLAG);",
    replace_node="*",
    replace="",
)

r2 = Rule(
    name="propagate_changes",
    query="cs :[service].isEnabled(:[reference])", # Getting the reference from r1
    replace_node="*",
    replace="true",
    holes ={"reference"},
    is_seed_rule=False
)

# Define the edges for the rule graph.
# In this case, boolean_literal_cleanup is already defined for java [see src/cleanup_rules]
edge1 = OutgoingEdges("find_flag_reference", to=["propagate_changes"], scope="Method")
edge2 = OutgoingEdges("propagate_changes", to=["boolean_literal_cleanup"], scope="parent")
# Create Piranha arguments
piranha_arguments = PiranhaArguments(
    code_snippet=code,
    language="java",
    rule_graph=RuleGraph(rules=[r1, r2], edges=[edge1, edge2])
)

# Execute Piranha and print the transformed code
piranha_summary = execute_piranha(piranha_arguments)
print(piranha_summary[0].content)

For the second case, I'm not sure. If doSomething is a method that can take any FeatureFlagWrapper, cleaning up might be tricky since we can't be sure what flag is passed. Nor if it is safe to do it:

val ff = FeatureFlagWrapper(FeatureFlags.STALE_FEATURE_FLAG)
doSomething(ff)
val ff2 = FeatureFlagWrapper(FeatureFlags.NOT_A_STALE_FEATURE_FLAG)
doSomething(ff2)

mrhooray commented 3 months ago

Thanks for the response and the CST example, @danieltrt

For the 1st case, looks like captured nodes/values can be propagated to next set of rules - somehow I didn't get this from documentation but it is indeed very useful.
- What's the scope of propagation, within the same file or the same execution session?
For the 2nd case, you're right there could be scenarios with additional call sites. For the simpler case where there's only 1 call site with a stale flag, a generic solution seems to require some level of code inlining or cross file/scope capabilities for proper cleanup.

danieltrt commented 3 months ago

During an execution session, Piranha maintains a symbol table with all the captured nodes, allowing you to reuse their values in subsequent rule applications. To do this, you need to declare them as "holes" in the rule (holes = {"reference"})

If you're sure the captured value is only used once in the codebase, you can create a rule for inlining. You just need to find the method declaration first:

find_flag_reference = Rule(
    name="find_flag_reference",
    query="cs var :[reference] = FeatureFlagWrapper(FeatureFlags.STALE_FEATURE_FLAG);",
    replace_node="*",
    replace=""
)

find_usage = Rule( 
    name="find_usage",
    # This finds doSomething(ff)
    query="cs :[method_name](:[reference])", 
    holes={"reference"},
    replace_node="*",
    replace=""
)

find_method_decl_with_usage = Rule(
    name="find_method_decl_with_usage",
    # This finds doSomething's declaration. This is a match only rule, no replacement
    query="cs fun :[method_name](val :[wrapper]: FeatureFlagWrapper) { :[body+] }",  
    holes={"method_name"},
)

cleanup_actual_usages = Rule(
    name="cleanup_actual_usages",
    # Regular cleanup
    query="cs :[_].isEnabled(:[wrapper])", 
    replace_node="*",
    replace="true",
    holes={"wrapper"},
    is_seed_rule=False
)

edge1 = OutgoingEdges("find_flag_reference", to=["find_usage"], scope="Method")
# Specify you want to search in your entire code base, to find the method declaration
edge2 = OutgoingEdges("find_usage", to=["find_method_decl_with_usage"], scope="Global") 
# Find the actual usages of is_enabled within the method declaration from above
edge3 = OutgoingEdges("find_method_decl_with_usage", to=["cleanup_actual_usages"], scope="Method") 
edge4 = OutgoingEdges("cleanup_actual_usages", to=["boolean_literal_cleanup"], scope="Parent")

aleksandr-zakshevskii-n26 commented 2 months ago

Sorry for the late response. Thank you very much for your responses, I will finalize my approach and will get back to you. Best, Alex

uber / piranha