pganalyze / libpg_query

C library for accessing the PostgreSQL parser outside of the server environment
BSD 3-Clause "New" or "Revised" License
1.17k stars 173 forks source link

Haskell bindings + Nix build #117

Open dmjio opened 3 years ago

dmjio commented 3 years ago

Hello 👋🏼 thanks for making this library, its great work.

In the course of making Haskell bindings (pg-query), I've also created a nix build.

{ pkgs ? import <nixpkgs> {} }:
with pkgs.haskell.lib;
let
  pg_query = pkgs.stdenv.mkDerivation {
    name = "pg_query";
    src = pkgs.fetchFromGitHub {
      owner = "pganalyze";
      repo = "libpg_query";
      rev = "a2482061d8ad8731c485b3cb7e8c9109b7b3529b";
      sha256 = "1zdq3lq0ym329gjak8xj2a9i0psp2qvr7imlm1cy4in3rwx9vdf6";
    };
    buildInputs = with pkgs; [ clang which protobufc ] ;
    patchPhase = with pkgs; ''
      ${gnused}/bin/sed -i.bak 's/prefix = .*/prefix = $(out)/g' Makefile
    '' + pkgs.lib.optionalString pkgs.stdenv.isDarwin ''
      ${gnused}/bin/sed -i.bak 's/-Wl,-soname,$(SONAME)/-Wl/g' Makefile
    '';
  };
in
  pg_query

If you'd like I can submit a PR of this derivation to nixpkgs, this repo, or both (or neither).

Also, is there a formal specification of the JSON structure of the SQL AST? I'd like to begin making a Haskell AST now that its accessible via C, but unsure if there's a spec somewhere (i.e. like a JSON schema file, etc.)

lfittl commented 3 years ago

Hello 👋🏼 thanks for making this library, its great work.

Thanks! Happy to see you created Haskell bindings!

... If you'd like I can submit a PR of this derivation to nixpkgs, this repo, or both (or neither).

Hmm, good question - since the library is usually not used as a package (but rather embedded in the various language-specific wrappers), we don't have a place in this repo for packaging information. If there is no downside to it, I think for now I'd prefer not to have packaging-related files in this repo (but open to it, if it makes a difference for maintainability).

Also, is there a formal specification of the JSON structure of the SQL AST? I'd like to begin making a Haskell AST now that its accessible via C, but unsure if there's a spec somewhere (i.e. like a JSON schema file, etc.)

There isn't really - the source of truth is the Postgres source code, and the structure gets generated from that. One could potentially envision creating a JSON schema file for the AST if you auto-generated it, similar to how the Protobuf definitions get auto-generated (see https://github.com/pganalyze/libpg_query/blob/13-latest/scripts/generate_protobuf_and_funcs.rb - not the easiest to read file, but that does all the work based on the files located in the srcdata directory).

dmjio commented 3 years ago

Hmm, my approach was to copy the AST from the PostgreSQL gram.y file, and attempt to match up the JSON with the AST, sound reasonable?

lfittl commented 3 years ago

Hmm, my approach was to copy the AST from the PostgreSQL gram.y file, and attempt to match up the JSON with the AST, sound reasonable?

Not sure how you mean - the gram.y file / the raw_parser method, ends up creating C structs (thats how it works in Postgres itself), and what pg_query does is it outputs them as JSON (see https://github.com/pganalyze/libpg_query/blob/13-latest/src/pg_query_outfuncs_json.c).

This JSON output logic is based on the automatic analysis of the Postgres source code, with the information saved to the srcdata/ directory. e.g. here you can find the description of a SelectStmt AST node: https://github.com/pganalyze/libpg_query/blob/13-latest/srcdata/struct_defs.json#L2280 (describing this C struct in the original source: https://github.com/postgres/postgres/blob/master/src/include/nodes/parsenodes.h#L1639)

If you were to translate them back to a language-specific structure, my recommendation would be to either:

1) Auto-generate the language-specific structs based on the srcdata/ information (or your own reading of the Postgres source code), and then parse the JSON into these structs

2) Use a Protobuf library to parse the Protobuf output (instead of the JSON output), which automatically gives you language-specific structs (this is e.g. how the Go library works)

Hope that helps!