pganalyze / libpg_query

C library for accessing the PostgreSQL parser outside of the server environment
BSD 3-Clause "New" or "Revised" License
1.21k stars 182 forks source link

Parsing queries from plpgsql #215

Closed lcheruka closed 11 months ago

lcheruka commented 1 year ago

Trying to parse queries found in PLpgSQL_expr after calling pg_query_parse_plpgsql. In version 14 and later postgres changed how it returns the query in the struct. Using the examples below, before it would return SELECT EXISTS(SELECT 1 FROM sample WHERE name = n) but now it returns the original query EXISTS(SELECT 1 FROM sample WHERE name = n) with a parseMode field. Any issues/concerns with exposing parseMode and being able to pass it to parse functions?

CREATE OR REPLACE FUNCTION public.function_a(n TEXT) 
RETURNS boolean LANGUAGE plpgsql 
AS 
$$ 
BEGIN 
    IF EXISTS(SELECT 1 FROM sample WHERE name = n) THEN
        RETURN true;
    END IF;
    RETURN false;
END;
$$;

libpg_query on 13-latest branch

[
    {
        "PLpgSQL_function": {
            "datums": [
                {
                    "PLpgSQL_var": {
                        "refname": "n",
                        "datatype": {
                            "PLpgSQL_type": {
                                "typname": "UNKNOWN"
                            }
                        }
                    }
                },
                {
                    "PLpgSQL_var": {
                        "refname": "found",
                        "datatype": {
                            "PLpgSQL_type": {
                                "typname": "UNKNOWN"
                            }
                        }
                    }
                }
            ],
            "action": {
                "PLpgSQL_stmt_block": {
                    "lineno": 1,
                    "body": [
                        {
                            "PLpgSQL_stmt_if": {
                                "lineno": 1,
                                "cond": {
                                    "PLpgSQL_expr": {
                                        "query": "SELECT EXISTS(SELECT 1 FROM sample WHERE name = n)"
                                    }
                                },
                                "then_body": [
                                    {
                                        "PLpgSQL_stmt_return": {
                                            "lineno": 1,
                                            "expr": {
                                                "PLpgSQL_expr": {
                                                    "query": "SELECT true"
                                                }
                                            }
                                        }
                                    }
                                ]
                            }
                        },
                        {
                            "PLpgSQL_stmt_return": {
                                "lineno": 1,
                                "expr": {
                                    "PLpgSQL_expr": {
                                        "query": "SELECT false"
                                    }
                                }
                            }
                        }
                    ]
                }
            }
        }
    }
]

libpg_query on 15-latest branch

[
    {
        "PLpgSQL_function": {
            "datums": [
                {
                    "PLpgSQL_var": {
                        "refname": "n",
                        "datatype": {
                            "PLpgSQL_type": {
                                "typname": "UNKNOWN"
                            }
                        }
                    }
                },
                {
                    "PLpgSQL_var": {
                        "refname": "found",
                        "datatype": {
                            "PLpgSQL_type": {
                                "typname": "UNKNOWN"
                            }
                        }
                    }
                }
            ],
            "action": {
                "PLpgSQL_stmt_block": {
                    "lineno": 1,
                    "body": [
                        {
                            "PLpgSQL_stmt_if": {
                                "lineno": 1,
                                "cond": {
                                    "PLpgSQL_expr": {
                                        "query": "EXISTS(SELECT 1 FROM sample WHERE name = n)"
                                    }
                                },
                                "then_body": [
                                    {
                                        "PLpgSQL_stmt_return": {
                                            "lineno": 1,
                                            "expr": {
                                                "PLpgSQL_expr": {
                                                    "query": "true"
                                                }
                                            }
                                        }
                                    }
                                ]
                            }
                        },
                        {
                            "PLpgSQL_stmt_return": {
                                "lineno": 1,
                                "expr": {
                                    "PLpgSQL_expr": {
                                        "query": "false"
                                    }
                                }
                            }
                        }
                    ]
                }
            }
        }
    }
]
lfittl commented 1 year ago

Trying to parse queries found in PLpgSQL_expr after calling pg_query_parse_plpgsql. In version 14 and later postgres changed how it returns the query in the struct. Using the examples below, before it would return SELECT EXISTS(SELECT 1 FROM sample WHERE name = n) but now it returns the original query EXISTS(SELECT 1 FROM sample WHERE name = n) with a parseMode field. Any issues/concerns with exposing parseMode and being able to pass it to parse functions?

I think that should work - for reference, that's the change in https://github.com/postgres/postgres/commit/c9d5298485b78a37923a23f9af9aa0ade06762db that caused this to work differently.

Rough outline of what we'd need to do:

  1. Modify this here to also output the parseMode: https://github.com/pganalyze/libpg_query/blob/009f19d7f867a699e1beab5fd545fbc07ad35647/src/pg_query_json_plpgsql.c#L628
  2. Modify pg_query_raw_parse to accept the parseMode argument, and call raw_parser accordingly: https://github.com/pganalyze/libpg_query/blob/009f19d7f867a699e1beab5fd545fbc07ad35647/src/pg_query_parse.c#L12
  3. Introduce a new top-level method that allows passing this (I'm thinking we make this a separate method to avoid breaking existing callers), e.g. add a pg_query_parse_protobuf_opts function that takes the parse mode as the second argument: https://github.com/pganalyze/libpg_query/blob/009f19d7f867a699e1beab5fd545fbc07ad35647/src/pg_query_parse.c#L110
  4. Add tests :)

I might have missed something, but that's where I'd start. Do you want to try making a PR for this?

lcheruka commented 1 year ago

I can work on getting a pr for this. Just wanted to check if this is something you wanted to support.

seanlinsley commented 1 year ago

Since the Postgres logs for log_min_duration_statement and auto_explain give us the query text with the first SELECT missing, I wonder if libpg_query should automatically try both parse modes before returning an error. It's potentially asking too much from library users to know ahead of time if the query is sourced from a function.

lcheruka commented 1 year ago

I wonder if libpg_query should automatically try both parse modes before returning an error.

There are 5 different parse modes. Do you want the library to try all 5? If there was an actual syntax error then how would the library know which error to return to user?

seanlinsley commented 1 year ago

For library users who want to parse queries, the only modes that matter are PG_QUERY_PARSE_DEFAULT and PG_QUERY_PARSE_PLPGSQL_EXPR. It would be nice if we could offer a combined API so they don't need to try both.

This could possibly be handled by the Ruby/Rust/Go libraries instead of the C library.

Thoughts @lfittl?

lcheruka commented 1 year ago

How would you handle if PLpgSQL_expr return this query?

v := (SELECT value FROM sample WHERE name = n LIMIT 1);

Would that be on the user to find the query after the assignment? That seems more annoying than using the parseMode that is returned with the query on PLpgSQL_expr.

seanlinsley commented 1 year ago

Ah okay, I stand corrected. That generates:

"PLpgSQL_expr": {
  "parseMode": 3,
  "query": "v := (SELECT value FROM sample WHERE name = n LIMIT 1)"
}

This is unfortunate. Both from an API ergonomics and a runtime performance perspective, it's not ideal that users will have to try parsing a query up to 5 times in order to generate a parse tree or fingerprint.

There is at least a hint in the auto_explain CONTEXT line, but since it's not part of the JSON it may not be good to rely on.


CONTEXT:  PL/pgSQL assignment "v := (SELECT 1 FROM (SELECT pg_sleep(1)) _)"
    PL/pgSQL function foo() line 5 at assignment

CONTEXT:  SQL expression "EXISTS(SELECT 1 FROM (SELECT pg_sleep(1)) _)"
    PL/pgSQL function foo() line 6 at IF