sql-formatter-org / sql-formatter

A whitespace formatter for different query languages
https://sql-formatter-org.github.io/sql-formatter/
MIT License
2.31k stars 399 forks source link

Feature Request: SQL in SQL prettification #680

Open innermatrix opened 9 months ago

innermatrix commented 9 months ago

Describe the Feature

I want formatting of SQL inside function bodies written in quoted stings

Why do you want this feature?

If you are writing SQL functions, it's pretty common to have SQL embedded inside SQL; for example (straight out of Postgres documentation):

CREATE FUNCTION dup(int) RETURNS dup_result
    AS $$ SELECT $1, CAST($1 AS text) || ' is text' $$
    LANGUAGE SQL;

In this case, $$ is a string quoting delimiter, and SELECT $1, CAST($1 AS text) || ' is text' is a quoted string.

The issue is that most SQL formatters do not recurse into these quoted string function bodies, so the function declaration is prettified but the function implementation remains ugly. Here's what I get if I run the above through prettier with prettier-plugin-sql:

create function dup (int) returns dup_result as $$ SELECT $1, CAST($1 AS text) || ' is text' $$ language sql;

What I wish I got would be more like

create function dup (int) returns dup_result as $$ 
    select
        $1,
        cast($1 as text)||' is text'
$$ language sql;

(Which is what I get if I manually run the function body through prettier-plugin-sql and then paste it inside $$ delimiters.)

nene commented 9 months ago

It's a feature that would be definitely nice to have. However, there are several obstacles to implementing it.

See readme for information about another formatting library, which does support that, though not yet for Postgres. It implements a similar feature for BigQuery, and makes use of Prettier API to e.g. format Python code inside strings.

innermatrix commented 9 months ago

When it comes to the issue of "what is in this quoted string", I think that can be addressed in a well-defined way, since a function definition is going to have a language foobar either before or after the quoted string. (And when language declaration is missing from the function definition, then the default is also well-defined, either because it's specified the database engine — for example, Postgres specifies the default language to be sql — or because it can be provided as a config option). This extends to other custom languages, such as Python or JS, which are also identified by specific language foobar declarations.

In my particular case, I would be happy to change every $$ to $sql$ or $plpgsql$ if that would get me consistent function body formatting, but I honestly don't think that would even be necessary.

You are right that if you implemented this, my next request would be formatting plpgsql. I am happy to hear you are working on another another library that supports this better (though Postgres is my main use case)! However, it sounds like that corner of the ecosystem has not yet matured to the point where it is useful to me — I am using prettier v3 (which is incompatible with prettier-plugin-sql-cst) and prettier-plugin-embed which (currently) only knows how to call through to prettier-plugin-sql.

So I think I will continue ignoring this problem for now, because I think that what needs to happen is for prettier-plugin-sql-cst to gain prettier v3 compat, then for prettier-plugin-embed to gain support for prettier-plugin-sql-cst, and on top of that prettier-plugin-sql-cst would need to gain support for plpgsql, none of which are imminent 🙂

Thanks for explaining the lay of the land!

nene commented 9 months ago

The default language case actually complicates this problem even more, as then we'd need to detect whether the string occurs inside a CREATE FUNCTION statement, which is not a trivial thing, because the formatter doesn't really understand the context.

The good news is that Prettier 3 support in prettier-plugin-sql-cst is just around the corner, and when that happens, integration with the embed plugin should similarly be a in short order. But yeah... Postgres support, especially full pl/pgsql is quite a bit further.

innermatrix commented 9 months ago

I guess I assumed that the formatter has access to enough of the AST to be able to tell when it's inside a create function

But even so, I'd be happy to change my $$ tags to indicate the language; that's already what I am used to from JS tagged templates anyway. 🙂

nene commented 9 months ago

Asking users to modify their SQL to suit the formatter should really be the last resort. I think there are better approaches to make it work for most cases. Like detecting whether a string is followed by LANGUAGE SQL is not that hard. Not sure though how large portion of use-cases that will cover.

FelixZY commented 2 months ago

While this issue mainly focuses on sql-in-sql, I want to note that dollar quoted strings can also be used e.g. for comments:

comment on schema "public" is $$
  Here is my comment.
$$;

There are a lot of tools that rely on special comment syntax (e.g. Supabase and Postgraphile) that may break if the comment formatting changes!


That said, I really do want formatting of dollar quoted sql - that's why I'm here to begin with. While not perfect, perhaps an acceptable MVP to get the feature started would be a magical opt-in comment (similarly to other tools such as eslint), e.g.

create or replace function "foo" () returns text as $$
  -- sql-formatter-optin dollar-quoted-sql
  select 'bar'::text;
$$ language sql stable;
nene commented 1 month ago

FYI, this feature is unlikely to get implemented as I'm no more doing any real feature development on this library and instead concentrating on prettier-plugin-sql-cst.

If you really want this feature in SQL Formatter, you should consider writing a pull request. But I admit that it's no small feat to pull this off - SQL Formatter is pretty limited in it's architecture. It would be simpler to instead contribute that feature to prettier-plugin-sql-cst, which already implements very similar feature for BigQuery. Or just raise a feature request there and I'm more likely to implement it.