Consider adding `eugene lint` command

kaaveland commented 6 months ago

With #36 we're introducing a dependency to pg_query.rs, which gives us the full postgres SQL parser. It would be interesting to see if we can write some lint rules that do not depend on running the postgres server for some of the hints we currently provide. Here's a list of things we may be able to warn about using the parse tree from pg_query:

Adding constraints that aren't NOT VALID:
Altering a column to NOT NULL
Creating indexes without CONCURRENTLY
Type changes
Introducing a JSON column

It may also be that we can do some manual inspection of the parse tree to figure out that some statements take dangerous locks, then warn about that. However, many of these will give us false positives that we can't statically verify now, so we probably want to support some sort of comment syntax to let eugene ignore the statement, eg.

-- We already have a check (col is not null)
-- eugene: ignore alter_to_not_null
alter table documents set col not null;

kaaveland commented 6 months ago

I think we can probably do all of the hints we have right now, if we accept some false positives and we make some of the hints optional (eg. we can look for SET [LOCAL] lock_timout, we can figure out that ALTER TABLE takes AccessExclusiveLock, that CREATE INDEX takes ShareLock, and that some statements ran after and we can let the user ignore f. ex the lock timeout thing via parameters or configuration. I think it's worth taking a stab at this. It would make it really easy to run the hints in a CI/CD pipeline, or a commit-hook, which may make eugene much easier to start using.

kaaveland commented 6 months ago

Been taking a stab at working with the parse trees from pg_query, and I think it's very clunky. The grammar is huge and because of the structure, there's so much if let going on. I think the best way forward here is to translate the parse tree to a smaller grammar that contains only the things we care about, then make rules for that.

kaaveland commented 6 months ago

Here's a small example for accessing the type of an alter table ... alter column .. set type kind of statement, which is incomplete:

fn alters_table_lints(node: &pg_query::NodeRef) -> Vec<String> {
    if let pg_query::NodeRef::AlterTableStmt(stmt) = node {
        let next = &stmt.cmds;
        next.iter()
            .filter_map(|child| {
                let node_ref = child.node.as_ref()?;
                is_alters_type_cmd(&node_ref.to_ref())
            })
            .collect()
    } else {
        Vec::new()
    }
}

fn is_alters_type_cmd(node: &pg_query::NodeRef) -> Option<String> {
    if let pg_query::NodeRef::AlterTableCmd(cmd) = node {
        let def_node = cmd.def.as_ref()?.node.as_ref()?;
        if let pg_query::NodeRef::ColumnDef(def) = def_node.to_ref() {
            if def.type_name.is_some() {
                Some(cmd.name.to_string())
            } else { None }
        } else {
            None
        }
    } else {
        None
    }
}

kaaveland commented 6 months ago

This is huge, and perhaps not the right place to start: https://www.postgresql.org/docs/current/sql-altertable.html

I have a 130 line match now, and only barely touching the surface of alter table ... with column focus.

kaaveland commented 6 months ago

Like, what goes on the end of set default or add constraint check can be anything and it's really a lot of work to do anything like what eugene trace can do here.

kaaveland commented 6 months ago

The mvp here is to:

Discover from context which objects in the script that are visible to other transactions
Emit a lint warning if we lock those without lock timeout
The nonconcurrent create index lint
Setting a column to not null
Changing the type of a column
Adding a json column

kaaveland commented 6 months ago

Currently looks feasible to add useful stuff here, although it will be more prone to false positives and there's stuff that we're bound to miss that eugene trace can pick up. Because of the difference in information that we receive here, I'm not sure we should be reusing the Hint struct and the associated information, although I think it is probably a good idea to reuse the identifiers and names/headlines of the lints/hints as much as we can.

kaaveland / eugene

Consider adding `eugene lint` command #37