sql-formatter-org / sql-formatter

A whitespace formatter for different query languages
https://sql-formatter-org.github.io/sql-formatter/
MIT License
2.35k stars 403 forks source link

[FORMATTING] Space between type and options removed in nested BigQuery fields #619

Open Ewan-Keith opened 1 year ago

Ewan-Keith commented 1 year ago

When formatting BigQuery DDL statement with OPTIONS on a Struct field, any space between the type definition and the Options definition is removed, resulting in an invalid statement.

Input data

CREATE TABLE `project.dataset.format_test`
(
 str STRING OPTIONS(description="A string field."),
 int INT64  OPTIONS(description="An int64 field."),
 record STRUCT<nested INT64 OPTIONS(description="a struct field")> OPTIONS(description="a struct")
)

Expected Output

CREATE TABLE
  `project.dataset.format_test` (
    str STRING OPTIONS(description = "A string field."),
    int INT64 OPTIONS(description = "An int64 field."),
    record STRUCT<nested INT64 OPTIONS(description ="a struct field")> OPTIONS(description = "a struct")
  )

Actual Output

CREATE TABLE
  `project.dataset.format_test` (
    str STRING OPTIONS(description = "A string field."),
    int INT64 OPTIONS(description = "An int64 field."),
    record STRUCT<nested INT64OPTIONS(description ="a struct field")> OPTIONS(description = "a struct")
  )

Usage

To run this I've stored the input in a file called example-sql.sql and am running:

> sql-formatter --v
12.2.3

> sql-formatter -l bigquery example-sql.sql
CREATE TABLE
  `project.dataset.format_test` (
    str STRING OPTIONS(description = "A string field."),
    int INT64 OPTIONS(description = "An int64 field."),
    record STRUCT<nested INT64OPTIONS(description ="a struct field")> OPTIONS(description = "a struct")
  )
nene commented 1 year ago

Thanks for reporting. Definitely a bug.

I remember that the part that deals with this type-parameter formatting was pretty ugly and error-prone. Perhaps there's a simple fix I can make, perhaps not. More likely not. Will look into this.

In the mean time, perhaps you'd like to try out a new SQL formatting tool that I've written, which specifically includes full BigQuery support: https://github.com/nene/prettier-plugin-sql-cst

It formats the SQL somewhat differently, but at least it wont output invalid code. For your SQL it produces the following:

CREATE TABLE `project.dataset.format_test` (
  str STRING OPTIONS (description = "A string field."),
  int INT64 OPTIONS (description = "An int64 field."),
  record STRUCT<nested INT64 OPTIONS (description = "a struct field")> OPTIONS (
    description = "a struct"
  )
);

I'm not really sure whether it would be better to format options as OPTIONS(...) or OPTIONS (...). Or which style is more common in BigQuery community. Open to suggestions.