tconbeer / sqlfmt

sqlfmt formats your dbt SQL files so you don't have to
https://sqlfmt.com
Apache License 2.0
405 stars 17 forks source link
dbt formatter python sql

sqlfmt

PyPI Downloads Test

PyPI - Python Version Runs on Linux | MacOS | Windows

sqlfmt formats your dbt SQL files so you don't have to. It is similar in nature to black, gofmt, and rustfmt (but for SQL).

  1. sqlfmt promotes collaboration. An auto-formatter makes it easier to collaborate with your team and solicit contributions from new people. You will never have to mention (or argue about) code style in code reviews again.
  2. sqlfmt is fast. Forget about formatting your code, and spend your time on business logic instead. sqlfmt processes hundreds of files per second and only operates on files that have changed since the last run.
  3. sqlfmt works with Jinja. It formats the code that users look at, and therefore doesn't need to know anything about what happens after the templates are rendered.
  4. sqlfmt integrates with your workflow. As a CLI written in Python, it's easy to install locally on any OS and run in CI. Plays well with dbt, pre-commit, SQLFluff, VSCode, and GitHub Actions. sqlfmt powers the dbt Cloud IDE's Format button.

sqlfmt is not configurable, except for line length. It enforces a single style. sqlfmt maintains comments and some extra newlines, but largely ignores all indentation and line breaks in the input file.

sqlfmt is not a linter. It does not parse your code into an AST; it just lexes it and tracks a small subset of tokens that impact formatting. This lets us "do one thing and do it well:" sqlfmt is very fast, and easier to maintain and extend than linters that need a full SQL grammar.

For now, sqlfmt only works on select, delete, grant, revoke, and create function statements (which is all you need if you use sqlfmt with a dbt project). It is being extended to additional DDL and DML. Visit this tracking issue for more information.

Documentation

Please visit docs.sqlfmt.com for more information on Getting Started, Integrations, the sqlfmt Style, and an API Reference. Or keep reading for an excerpt from the full docs.

Installation

Try it first

Want to test out sqlfmt on a query before you install it? Go to sqlfmt.com to use the interactive, web-based version.

Install Using pipx (recommended)

sqlfmt is a pip-installable Python package listed on PyPI under the name shandy-sqlfmt. You should install it into a virtual environment, which pipx does automatically:

pipx install shandy-sqlfmt

To install with the jinjafmt extra (which will also install the Python code formatter, black):

pipx install shandy-sqlfmt[jinjafmt]

For more installation options, read the docs.

Getting Started

Other prerequisites

sqlfmt will not always produce the formatted output you might want. It might even break your SQL syntax. It is highly recommended to only run sqlfmt on files in a version control system (like git), so that it is easy for you to revert any changes made by sqlfmt. On your first run, be sure to make a commit before running sqlfmt.

There are certain situations where sqlfmt can be considered to be in Beta, or even more mature than that. Those are:

  1. Using sqlfmt to format select statements for one of the major dialects (PostgresSQL, MySQL, Snowflake, BQ, Redshift).

  2. Using sqlfmt to format a dbt project (which may also include jinja and some minimal DDL/DML, like grants, create function, etc.) for one of the major dialects.

However, there are other use cases where sqlfmt is very much alpha:

  1. Formatting some dialects that deviate from ANSI or Postgres, like T-SQL (SQLServer).

  2. Formatting other DDL (create table, insert, etc.) (sqlfmt attempts to be no-op on these statements as much as possible).

In these domains sqlfmt is nowhere near "feature complete" and caution is highly advised.

Using sqlfmt

To list commands and options:

sqlfmt --help

If you want to format all .sql and .sql.jinja files in your current working directory (and all nested directories), simply type:

$ sqlfmt .

If you don't want to format the files you have on disk, you can run sqlfmt with the --check option. sqlfmt will exit with code 1 if the files on disk are not properly formatted:

$ sqlfmt --check .

If you want to print a diff of changes that sqlfmt would make to format a file (but not update the file on disk), you can use the --diff option. --diff also exits with 1 on changes:

$ sqlfmt --diff .

For more commands, see the docs.

Configuring sqlfmt using pyproject.toml

Any command-line option for sqlfmt can also be set in a pyproject.toml file, under a [tool.sqlfmt] section header. Options passed at the command line will override the settings in the config file. See the docs for more information.

The jinjafmt extra

sqlfmt loves properly-formatted jinja, too.

See the docs for more information about using the jinjafmt extra or disabling jinja formatting.

Using sqlfmt with different SQL dialects

sqlfmt's rules are simple, which means it does not have to parse every single token in your query. This allows nearly all SQL dialects to be formatted using sqlfmt's default "polyglot" dialect, which requires no configuration.

The exception to this is ClickHouse, which is case-sensitive where other dialects are not. To prevent the lowercasing of function names, database identifiers, and aliases, use the --dialect clickhouse option when running sqlfmt. For example,

$ sqlfmt . --dialect clickhouse

This can also be configured using the pyproject.toml file:

[tool.sqlfmt]
dialect = "clickhouse"

Note that with this option, sqlfmt will not lowercase most non-reserved keywords, even common ones like sum or count. See (and please join) this discussion for more on this topic.

Integrations

sqlfmt plays nicely with other analytics engineering tools. For more information, see the docs.

dbt

sqlfmt was built for dbt, so only minimal configuration is required. We recommend excluding your target and dbt_packages directories from formatting. You can do this with the command-line --exclude option, or by setting exclude in your pyproject.toml file:

[tool.sqlfmt]
exclude=["target/**/*", "dbt_packages/**/*"]

Other Integrations

Config for other integrations is detailed in the docs linked below:

The sqlfmt style

The only thing you can configure with sqlfmt is the desired line length of the formatted file. You can do this with the --line-length or -l options. The default is 88.

sqlfmt borrows elements from well-accepted styles from other programming languages. It places opening brackets on the same line as preceding function names (like black for python and 1TBS for C). It indents closing brackets to the same depth as the opening bracket (this is extended to statements that must be closed, like case and end).

The sqlfmt style is as simple as possible, with little-to-no special-casing of formatting concerns. While at first blush, this may not create a format that is as "nice" or "expressive" as hand-crafted indentation, over time, as you grow accustomed to the style, formatting becomes transparent and the consistency will allow you to jump between files, projects, and even companies much faster.

Read More

Why lowercase?

Because SQL is code! But there are other good reasons too.

Why trailing commas?

Using trailing commas follows the convention of every other written language and programming language. But wait, there's more.

Contributing

Code style: black Checked with mypy Maintainability Test Coverage

Providing Feedback

We'd love to hear from you! Open an Issue to request new features, report bad formatting, or say hello.

Setting up Your Dev Environment and Running Tests

  1. Install Poetry v1.2 or higher if you don't have it already. You may also need or want pyenv, make, and gcc. A complete setup from a fresh install of Ubuntu can be found here.
  2. Clone this repo into a directory (let's call it sqlfmt), then cd sqlfmt.
  3. Use poetry install --all-extras --sync to install the project (editable) and its dependencies (including the jinjafmt and sqlfmt_primer extras) into a new virtual env.
  4. Use poetry shell to spawn a subshell.
  5. Type make to run all tests and linters, or run pytest, black, flake8, isort, and mypy individually.

Updating primer repos to reflect formatting changes

  1. Make sure all changes are committed to sqlfmt.
  2. Check out main in the repo and make sure you pull changes locally.
  3. Check out the unformatted tag in the repo with git checkout -b chore/apply-abc123 unformatted where abc123 is the hash of the most recent sqlfmt commit (from 1).
  4. Run sqlfmt against the working tree, then git add . and git commit -m "chore: apply sqlfmt abc123".
  5. We will have conflicts with main that we want to ignore, so merge main into this branch, ignoring anything on main: git merge -s ours main.
  6. Push and open a PR; squash and merge. Grab the commit SHA.
  7. Paste the commit SHA as a ref into primer.py.
  8. Run sqlfmt_primer -k to clear the cache, then update the stats in primer.py to match the results.