sbdchd / squawk

🐘 linter for PostgreSQL, focused on migrations
https://squawkhq.com
GNU General Public License v3.0
579 stars · 38 forks

Add option to ignore "create table" #364

Closed · wosc closed this 1 month ago

wosc commented 2 months ago

Our alembic-based setup includes "starter" migrations that create the tables. So if I alembic upgrade head --sql | squawk, this dumps out all migrations, but I get no analysis whatsoever, since squawk seems to decide "hey all these tables are new, so there can be no conflicts". So, could we have a squawk option to... not do that? Or am I holding things wrong, somehow?

Example:

```
$ squawk <<EOF
-- Running upgrade  -> 553e4856ccbb

CREATE TABLE example (
    id UUID NOT NULL,
    PRIMARY KEY (id)
);

COMMIT;

BEGIN;

-- Running upgrade 553e4856ccbb -> 89a2a1ae92fb

ALTER TABLE example ADD COLUMN name VARCHAR;

CREATE INDEX ix_example_name ON example (name);

COMMIT;

EOF
```

says `Found 0 issues in 1 file` instead of reporting `require-concurrent-index-creation`.
sbdchd commented 1 month ago

I think the usual approach is that you use squawk against each incremental migration

Curious your use case, why run it over everything instead of the most recent migration?

wosc commented 1 month ago

Maybe I'm just really missing something (very sorry if so): how do you tell alembic to output the SQL of individual migrations? (The only way I can think of right now is to parse the not very machine-friendly output of alembic history to get the separate revision ids, and then run alembic upgrade rev1:rev2 --sql individually.)

sbdchd commented 1 month ago

I’m not that familiar with alembic but for Django I use the following script to dump the sql for the migrations that have changed in a PR:

https://github.com/chdsbd/kodiak/blob/master/web_api/s/squawk.py

Could you git diff the migrations directory and then dump the sql for any files that have changed and run squawk against that sql?
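For alembic, that git-diff approach might look roughly like the sketch below. This is an illustration, not the kodiak script: the branch name, the `alembic/versions/` directory, and the helper names are all assumptions, and `extract_revision` relies on alembic's default `<revision>_<slug>.py` file template.

```python
import re
import subprocess


def changed_migrations(base_branch: str = "origin/main") -> list[str]:
    # Hypothetical helper: list migration files added relative to the base
    # branch. Assumes migrations live under alembic/versions/.
    out = subprocess.run(
        ["git", "diff", "--name-only", "--diff-filter=A",
         base_branch, "--", "alembic/versions/"],
        check=True, capture_output=True, text=True,
    ).stdout
    return [line for line in out.splitlines() if line.endswith(".py")]


def extract_revision(path: str) -> str:
    # Alembic's default file template is "<revision>_<slug>.py",
    # where <revision> is a hex id.
    name = path.rsplit("/", 1)[-1]
    match = re.match(r"([0-9a-f]+)_", name)
    if match is None:
        raise ValueError(f"cannot parse revision id from {name!r}")
    return match.group(1)


def dump_sql(revision: str) -> bytes:
    # "<rev>-1:<rev>" asks alembic for just this one migration's SQL
    # (relative revision identifiers; supported in alembic >= 0.7).
    return subprocess.run(
        ["alembic", "upgrade", f"{revision}-1:{revision}", "--sql"],
        check=True, capture_output=True,
    ).stdout
```

The resulting SQL for each changed revision can then be piped into `squawk` one migration at a time.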

wosc commented 1 month ago

Ah I see, in your example you parse the migration filenames to get the individual revision ids. That would certainly be doable; or maybe I'll go the route of postprocessing the collected SQL output and stripping any CREATE TABLE statements before passing to squawk. Thanks for explaining!

wosc commented 1 month ago

If anyone is interested, this is the code I ended up with:

import alembic.config
import alembic.script
from alembic.runtime.environment import EnvironmentContext

def alembic_upgrade_offline(buffer):
    config = alembic.config.Config('/path/to/alembic.ini')
    script = alembic.script.ScriptDirectory.from_config(config)
    with EnvironmentContext(config=config, script=script, as_sql=True) as context:
        context.configure(
            url='postgresql://unused',
            literal_binds=True,
            dialect_opts={'paramstyle': 'named'},
            fn=lambda rev, context: script._upgrade_revs('head', rev),
            transaction_per_migration=True,
            output_buffer=buffer,
        )
        context.run_migrations()

import io
import subprocess
from subprocess import PIPE

def test_lint_migrations_with_squawk():
    buffer = io.StringIO()
    alembic_upgrade_offline(buffer)
    sql = [x.strip() for x in buffer.getvalue().split(';')]
    # squawk ignores any statements that refer to a newly created table
    sql = [x for x in sql if not x.startswith('CREATE TABLE')]
    sql = ';\n'.join(sql)

    proc = subprocess.Popen(['squawk'], stdout=PIPE, stderr=PIPE, stdin=PIPE)
    stdout, stderr = proc.communicate(sql.encode('utf-8'))
    if proc.returncode:
        output = stdout.decode('utf-8') + stderr.decode('utf-8')
        assert False, 'squawk returned errors:\n' + output