pganalyze / libpg_query

C library for accessing the PostgreSQL parser outside of the server environment
BSD 3-Clause "New" or "Revised" License
1.21k stars 182 forks source link

Add support for reading in JSON-based nodes format (to run pg_query_deparse) #178

Open jgoux opened 1 year ago

jgoux commented 1 year ago

Hello 👋,

Would it be possible to expose a pg_query_deparse which would directly work on the result of pg_query_parse (so skipping the protobuf layer)?

My goal would be to have the deparsing capability of libpg_query exposed to the JavaScript world through libpg-query-node.

I'm not a C programmer, so I'm not sure about the possibility/difficulty of the task. 😅

lfittl commented 1 year ago

@jgoux Can you share more about the use case?

Generally the idea is that you'd parse, modify the parse tree, and then deparse - and protobuf is the way the parsetree gets passed back and forth.

jgoux commented 1 year ago

@lfittl libpg-query-node seems to rely on pg_query_parse.

Reading the tests of libpg_query it doesn't seem possible to pass the result of pg_query_parse to pg_query_deparse_protobuf right?

So I'm wondering why is there a pg_query_parse + pg_query_parse_protobuf but not a pg_query_deparse + pg_query_deparse_protobuf?

Does it mean that libpg-query-node should rely on pg_query_parse_protobuf instead?

My use case is really just to be able to pass what I obtain from query.parseQuery(sql)/parseQuerySync from libpg-query-node (after modifying the parsetree) to a deparser function. 😄

jgoux commented 1 year ago

I guess what I want is the equivalent of pg_query_parse for pg_query_deparse, working with json instead of protobuf to pass the parsetree: https://github.com/pganalyze/libpg_query/blob/1ada550d901ed4edbfb6bce2163d21f4b948ab2d/src/pg_query_parse.c#L101

lfittl commented 1 year ago

I guess what I want is the equivalent of pg_query_parse for pg_query_deparse, working with json instead of protobuf to pass the parsetree:

https://github.com/pganalyze/libpg_query/blob/1ada550d901ed4edbfb6bce2163d21f4b948ab2d/src/pg_query_parse.c#L101

Ah, thanks for clarifying!

There is currently no support for reading in the JSON format (to turn it into C structs again, which are used by the deparser) - that was part of the motivation for the Protobuf work, since its a bit more straightforward to read than JSON.

Its not impossible to add this, but its a lot more work than just exposing a function, as what would be needed is essentially pg_query_json_to_nodes, similar to the pg_query_protobuf_to_nodes function: https://github.com/pganalyze/libpg_query/blob/15-latest/src/pg_query_readfuncs_protobuf.c#L154

Note that matters are complicated a bit by the fact that we currently don't have a JSON parser bundled in the C library.

This may be something we can add in a future version, as Postgres 16 had major rework of how the out/read/copy funcs code works, and it may get easier to add extra formats with that work in place.

But for now this is not a priority on our end, though a PR would of course be welcome.

jgoux commented 1 year ago

Just so I'm sure I'm understanding how the protobuf work impacts the consumption of libpg_query.

Is it an implementation detail?

For example, if I would like to expose pg_query_parse_protobuf and pg_query_deparse_protobuf to the JavaScript/TypeScript world, would I have to deal with protobuf in my own code, or would it be possible to use it like:

import { parse, deparse } from "libpg-query";

const tree = parse("SELECT 1;");

// tree here would be a normal JS object
console.log(tree.stmts[0].stmt.SelectStmt);

// we pass the normal JS object
const query = deparse(tree);

// output "select 1"
console.log(query)

Or is the JSON-based nodes format necessary to have such an API from the JS/TS world?

lfittl commented 1 year ago

You would have to have a step that turns the Protobuf into Javascript objects (e.g. using a library like protobufjs), and then does the same in reverse before passing things back to libpg_query.

For reference, the protobuf definition can be found here: https://github.com/pganalyze/libpg_query/blob/15-latest/protobuf/pg_query.proto#L17

And you can e.g. look at the Ruby library for how it does that: parse, deparse (note how the Ruby object "tree" gets passed to the auto-generated Protobuf message PgQuery::ParseResult which can decode/encode the Ruby tree into a Protobuf message)

jgoux commented 1 year ago

Excellent, exactly what I was looking for, thanks for the pointer @lfittl! 😄