wseaton / sqloxide

Python bindings for sqlparser-rs
MIT License
164 stars 18 forks source link

AST edit for container type #268

Open psukys opened 1 week ago

psukys commented 1 week ago

In example, as I understood, edits are applied only on textual/string data, can this be enhanced to other built-in types, such as lists? For example:

import sqloxide

query = """
with my_cte2 as (
    select 1
),
my_cte1 as (
    select 2
)
select *
from my_cte1
join my_cte2
"""

def mutator(node):
    if "cte_tables" in node.keys():
        node["cte_tables"] = list(sorted(node["cte_tables"], key=lambda x: x["alias"]["name"]))
    return node

ast = sqloxide.parse_sql(query, dialect="postgres")
sqloxide.mutate_expressions(ast, mutator)
wseaton commented 1 week ago

edits are applied only on textual/string data, can this be enhanced to other built-in types, such as lists

Edits can be done on any pythonized type, they should work fine. For certain edits you may need to edit the raw AST instead of using a mutator, the mutator is based on visitor functions that trigger only on certain nodes.

So I think the issue in this case is that cte_tables is basically in the outer scope since it's attached to root of Query, and is not visible when using a visitor to walk the expressions in an AST (which can only see expressions).

We would need to implement a mutate_query function that uses the associated visitor: https://docs.rs/sqlparser/latest/sqlparser/ast/trait.VisitorMut.html#method.post_visit_query, then you'd have full access to cte_tables, but this type of edit is not that different than just modifying the raw AST.

psukys commented 1 week ago

@wseaton true, modifying AST directly works well, my use case was mostly around finding the node, which would get extensive (but can also be wrapped):

...
    new_ast = copy.deepcopy(ast)
    buffer = [new_ast]
    while buffer != []:
        new_buffer = []
        for item in buffer:
            if isinstance(item, list):
                new_buffer.extend(item)
            elif isinstance(item, dict):
                for key in item:
                    if key == "cte_tables":
                        item["cte_tables"] = list(sorted(item["cte_tables"], key=lambda x: x["alias"]["name"]["value"]))
                    elif isinstance(item[key], list):
                        new_buffer.extend(item[key])
                    elif isinstance(item[key], dict):
                        new_buffer.append(item[key])
                    # else don't care for primitive types
        buffer = new_buffer
    return new_ast

maybe I could look into adding mutate_query?

wseaton commented 1 week ago

Would exposing a binding to https://docs.rs/sqlparser/latest/sqlparser/ast/fn.visit_statements_mut.html work? You'd still need to match on statement type equals SELECT, but that should let you mutate the order of the CTEs.

wseaton commented 1 week ago

@psukys here's the equivalent rust code as a PoC:

use std::ops::ControlFlow;

use sqlparser::{
    ast::{visit_statements_mut, Statement},
    dialect::GenericDialect,
    parser::Parser,
};

fn main() {
    let sql =
        "with my_cte2 as ( select 1 ), my_cte1 as ( select 2 ) select * from my_cte1 join my_cte2 ";
    let mut statements = Parser::parse_sql(&GenericDialect {}, sql).unwrap();

    visit_statements_mut(&mut statements, |stmt| {
        if let Statement::Query(query) = stmt {
            if let Some(with) = &mut query.with {
                let _ = &with.cte_tables.sort_by(|a, b| a.alias.cmp(&b.alias));
            }
        }
        ControlFlow::<()>::Continue(())
    });

    assert_eq!(
        &statements[0].to_string(),
        "WITH my_cte1 AS (SELECT 2), my_cte2 AS (SELECT 1) SELECT * FROM my_cte1 JOIN my_cte2"
    );
}