sql-formatter-org / sql-formatter

A whitespace formatter for different query languages
https://sql-formatter-org.github.io/sql-formatter/
MIT License
2.23k stars 392 forks source link

Error thrown parsing fairly long (but valid) postgresql statement #725

Closed samdeanpinknews closed 3 months ago

samdeanpinknews commented 3 months ago

Describe the bug

When running this code:

const formatted = format(query, {
        language: 'postgresql',
        tabWidth: 2,
        keywordCase: 'upper',
        linesBetweenQueries: 2,
        params: parameters,
      });

Where the parameters are [] and the query is

WITH "vars" AS (SELECT
        TIMESTAMP '2024-03-07' AS range_end,
        TIMESTAMP '2024-03-03' AS range_start,
        TIMESTAMP '2024-03-07' > now() AS end_is_future), "ranked" AS (SELECT
        id,
        ROW_NUMBER() OVER (PARTITION BY "publisherId" ORDER BY first_live_date DESC NULLS LAST) as rank
      FROM story
      WHERE first_live_date IS NOT NULL AND first_live_date > 0), "story_old_date_raw" AS (SELECT
        id,
        LEAD(first_live_date, 2) OVER (PARTITION BY "publisherId" ORDER BY first_live_date ASC) AS stop_date
      FROM story
      WHERE first_live_date IS NOT NULL AND first_live_date > 0), "story_dates" AS (SELECT
        "story"."id" AS id,
        DATE_TRUNC('hour', TO_TIMESTAMP("story"."first_live_date" / 1000)) AS start,
        DATE_TRUNC('hour', TO_TIMESTAMP(stop_date / 1000) + INTERVAL '30 MINUTE') AS old
      FROM story
      INNER JOIN story_old_date_raw ON "story"."id" = story_old_date_raw.id), "values" AS (SELECT
        "story"."id" AS id,
    -- Get the start values, defaulting to 0. Everything starts at 0.
        COALESCE(s.unique_viewers, 0) AS start_unique_viewers,
    -- The end values are whichever is soonest, the end point, or now()
        CASE 
          WHEN end_is_future THEN l.unique_viewers
          ELSE e.unique_viewers
        END AS end_unique_viewers,
    -- The old values are hardest, they can lie either before, during or after the from/to date range
    -- or, they just might not exist yet (a story might not yet be old)
        CASE
          WHEN story_dates.old < range_start THEN COALESCE(s.unique_viewers, 0)
          WHEN (story_dates.old > range_end OR story_dates.old IS NULL) AND end_is_future = true THEN l.unique_viewers
          WHEN (story_dates.old > range_end OR story_dates.old IS NULL) AND end_is_future = false THEN e.unique_viewers
          ELSE COALESCE(o.unique_viewers, e.unique_viewers)
        END AS old_unique_viewers,
    -- Just report the latest values as they come
        l.unique_viewers AS latest_unique_viewers
      FROM story
      INNER JOIN vars ON true
      INNER JOIN story_dates ON story_dates.id = story.id
      LEFT JOIN "story_snapshots_hourly" "s" ON s."storyId" = "story"."id" AND "s"."timestamp" = range_start
      LEFT JOIN "story_snapshots_hourly" "e" ON e."storyId" = "story"."id" AND "e"."timestamp" = range_end
      LEFT JOIN "story_snapshots_hourly" "o" ON o."storyId" = "story"."id" AND "o"."timestamp" = story_dates.old
      INNER JOIN "story_snapshot_latest" "l" ON "l"."storyId" = "story"."id"
      WHERE "story"."first_live_date" > 0 AND TO_TIMESTAMP("story"."first_live_date" / 1000) < range_end) SELECT COUNT(DISTINCT("story"."id")) AS "cnt" FROM "story" "story" LEFT JOIN "ranked" "ranked" ON ranked.id = "story"."id"  INNER JOIN "publisher" "publisher" ON "publisher"."id"="story"."publisherId"  INNER JOIN "values" "values" ON values.id = "story"."id" WHERE values.end_unique_viewers > values.start_unique_viewers

Expected behavior

The SQL to be formatted, and no error to be thrown

Actual behavior

An error was thrown:

Error: Parse error at token: . at line 43 column 347
Unexpected PROPERTY_ACCESS_OPERATOR token: {"type":"PROPERTY_ACCESS_OPERATOR","raw":".","text":".","start":2893}. Instead, I was expecting to see one of the following:
...

FYI: The character at 43:347 is a ., but it's no different to all the other .s on that line, and the query runs absolutely fine.

Usage

nene commented 3 months ago

Thanks for reporting. The problem is with the keyword VALUES. The same problem happens in this shorter SQL query:

SELECT values.id FROM my_table AS "values"

The formatter expects values to be the beginning of a VALUES (...) clause, but here it's used as a table name.

Currently the workaround is to simply quote it.

It's likely something that isn't easy to fix, because it stems from the architectural problems in the formatter.

As you're using PostgreSQL, I can suggest you instead try prettier-plugin-sql-cst, which formats this query just fine. It doesn't yet have full PostgreSQL support, but if you're mainly formatting queries, it should work better than SQL Formatter.

samdeanpinknews commented 3 months ago

Ah, that makes sense.

It's probably best practice not to use keywords as table names so I think I'll just change the name of that cte :)

nene commented 3 months ago

I'll close this issue as it's too unlikely that it'll get fixed.