zombodb / postgres-parser

Postgres' query parser, as a Rust crate!
PostgreSQL License

parsing plpgsql #43

Open tomuxmon opened 3 years ago

tomuxmon commented 3 years ago

Any way to parse plpgsql?

It parses CreateFunctionStmt correctly, but since the function body itself is a string literal, it is "correctly" parsed as a plain string. It would be great if it could detect the plpgsql language option and use the appropriate parser for the body.
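To make the request concrete, here is a minimal sketch of the detection step. In Postgres the options of a CREATE FUNCTION statement are a list of named entries, with "language" and "as" among them. The `FunctionOption` type and both function names below are hypothetical stand-ins and are not part of the postgres-parser API.

```rust
/// Simplified stand-in for one option of a CREATE FUNCTION statement.
/// Hypothetical type: the real parse tree stores these as DefElem nodes.
#[derive(Debug, Clone, PartialEq)]
pub struct FunctionOption {
    pub name: String,
    pub value: String,
}

/// Looks up an option value by name, ignoring ASCII case.
fn option_value<'a>(options: &'a [FunctionOption], key: &str) -> Option<&'a str> {
    options
        .iter()
        .find(|o| o.name.eq_ignore_ascii_case(key))
        .map(|o| o.value.as_str())
}

/// Returns the function body only when the options declare `language plpgsql`.
/// Any other language (or a missing language option) yields `None`.
pub fn plpgsql_body(options: &[FunctionOption]) -> Option<&str> {
    let language = option_value(options, "language")?;
    if language.eq_ignore_ascii_case("plpgsql") {
        option_value(options, "as")
    } else {
        None
    }
}
```

The returned string is what a second, plpgsql-aware parser would have to receive. Nothing here parses it.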

eeeebbbbrrrr commented 3 years ago

No support for that right now. Maybe in the future. It's a thing I've thought about, but I haven't investigated how to also pull that bit of code out of the LLVM IR postgres-parser currently generates from the postgres sources.

tomuxmon commented 3 years ago

There are 2 interesting places in pl_comp.c

Since we can already parse the function definition (function name, parameters, return type...), I wonder whether it would be too crazy to just try parsing the function body with plpgsql_compile_inline and stitch the result together with the current results?

eeeebbbbrrrr commented 3 years ago

hmm. If it's that simple to just specify one function (_compile_inline), then all the llvm extraction stuff should do the rest for us.

Are there additional headers we'll have to process? I assume there are additional structs to represent the plpgsql parse tree...

tomuxmon commented 3 years ago

As far as I can see, plpgsql.h has all the structs needed. PLpgSQL_execstate is most probably not interesting in our case, since it should only hold runtime data. To clarify regarding compile inline: it actually accepts an anonymous code block like this:

do
$$
declare
    _a int = 1;
begin
    raise notice 'a = %;', _a;
end;
$$;

So we would need to tweak the string literal that holds the function body by adding do in front of it. I'm also not sure how it would treat a missing variable if the function had a parameter. But I guess trying it out is the only way to find out :) What should change in build.rs (and other places) to try that out?
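The string tweak described above could be sketched as follows. This is only an illustration of the wrapping step. Whether plpgsql_compile_inline really needs the do prefix or only the body text would have to be checked against the Postgres sources. The function names and the `$bodyN$` tag scheme are assumptions made for this example.

```rust
/// Picks a dollar-quote tag that does not occur anywhere in `body`,
/// so that the wrapped text cannot be terminated early.
fn free_dollar_tag(body: &str) -> String {
    let mut n = 0usize;
    loop {
        // Try the plain `$$` first, then `$body1$`, `$body2$`, ...
        let tag = if n == 0 {
            "$$".to_string()
        } else {
            format!("$body{}$", n)
        };
        if !body.contains(tag.as_str()) {
            return tag;
        }
        n += 1;
    }
}

/// Wraps a function body in an anonymous `do` block.
/// Function parameters are NOT handled: any reference to them
/// stays in the body as an undeclared variable.
pub fn wrap_in_do_block(body: &str) -> String {
    let tag = free_dollar_tag(body);
    format!("do\n{tag}\n{body}\n{tag};", tag = tag, body = body.trim())
}
```

The open question about parameters is untouched by this sketch. One option to experiment with would be to declare them as variables in a generated declare section, but that is untested.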

eeeebbbbrrrr commented 3 years ago

I was actually poking at this a bit earlier today.

The first thing is that the patch file postgres-parser applies against some of the Postgres Makefiles (in order to compile as llvm ir) purposely excludes the pl/ directory tree.

Additionally, the plpgsql code is not compiled into postgres proper, but as a postgres extension. So we're gonna have to do a lot of additional work in build.sh to be able to extract the plpgsql symbols we need from the plpgsql object file that we don't even generate yet.

Then we'll be able to work on build.rs. There are a number of structs in plpgsql.h that we'll need: mostly the ones that have a PLpgSQL_stmt_type as their first member, along with PLpgSQL_function and whatever types it uses.
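The reason the "type tag as first member" structs matter is that a consumer can read the tag through a generic pointer and then cast to the concrete struct. A rough Rust sketch of that pattern is below. The struct names, fields, and tag values are simplified placeholders and are not copied from plpgsql.h.

```rust
// Placeholder tag values, standing in for PLpgSQL_stmt_type members.
pub const STMT_BLOCK: u32 = 0;
pub const STMT_ASSIGN: u32 = 1;

/// Common leading fields shared by every statement struct.
#[repr(C)]
#[derive(Debug, PartialEq)]
pub struct StmtHeader {
    pub cmd_type: u32,
    pub lineno: i32,
}

#[repr(C)]
#[derive(Debug, PartialEq)]
pub struct StmtBlock {
    pub cmd_type: u32,
    pub lineno: i32,
    pub n_initvars: i32,
}

#[repr(C)]
#[derive(Debug, PartialEq)]
pub struct StmtAssign {
    pub cmd_type: u32,
    pub lineno: i32,
    pub varno: i32,
}

/// Safe view over a statement once its tag has been inspected.
#[derive(Debug, PartialEq)]
pub enum Stmt<'a> {
    Block(&'a StmtBlock),
    Assign(&'a StmtAssign),
    Unknown(u32),
}

/// Reads the tag through the header and casts to the matching struct.
///
/// # Safety
/// `ptr` must point to a live `#[repr(C)]` struct that starts with the
/// `StmtHeader` fields and whose `cmd_type` matches its real type.
pub unsafe fn downcast<'a>(ptr: *const StmtHeader) -> Stmt<'a> {
    unsafe {
        match (*ptr).cmd_type {
            STMT_BLOCK => Stmt::Block(&*(ptr as *const StmtBlock)),
            STMT_ASSIGN => Stmt::Assign(&*(ptr as *const StmtAssign)),
            other => Stmt::Unknown(other),
        }
    }
}
```

Generated bindings would need one such struct per statement kind, which is why the list of required types grows quickly.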

This is looking like a lot of work. It's probably good work to do, but it's not an afternoon job.