xataio / pgzx

Create PostgreSQL extensions using Zig.
https://xata.io
Apache License 2.0
446 stars 12 forks source link
extens extensions hacktoberfest pgrx postgres postgresql zig ziglang
pgzx logo

License - Apache 2.0  CI Build   Discord   X (formerly Twitter) Follow

pgzx - Create Postgres Extensions with Zig!

pgzx is a library for developing PostgreSQL extensions written in Zig. It provides a set of utilities (e.g. error handling, memory allocators, wrappers) as well as a development environment that simplifies integrating with the Postgres code base.

Why Zig?

Zig is a small and simple language that aims to be a "modern C" and make system-level code bases easier to maintain. It provides safe memory management, compilation time code execution (comptime), and a rich standard library.

Zig can interact with C code quite naturally: it supports the C ABI, can work with C pointers and types directly, it can import header files and even translate C code to Zig code. Thanks to this interoperability, a Postgres extension written in Zig can, theoretically, accomplish anything that a C extension can. This means you get full power AND a modern language and standard library to write your extension.

While in theory you can write any extension in Zig that you could in C, in practice you will need to make sense of a lot of Postgres internals in order to know how to correctly use them from Zig. Also, Postgres makes extensive use of macros, and not all of them can be translated automatically. This is where pgzx comes in: it provides a set of Zig modules that make the development of Postgres Extensions in Zig much simpler.

Examples

The following sample extensions (ordered from simple to complex) show how to use pgzx:

Extension Description
char_count_zig Adds a function that counts how many times a particular character shows up in a string. Shows how to register a function and how to interpret the parameters.
pghostname_zig Adds a function that returns the database server's host name.
pg_audit_zig Inspired by the pgaudit C extension, this one registers callbacks to multiple hooks and uses more advanced error handling and memory allocation patterns.

Docs

The reference documentation is available at here.

We recommend checking the examples in the section above to understand how to use pgzx. The next sections contain a high-level walkthrough of the most important utilities and how they relate to the Postgres internals.

Getting Started

This project uses Nix flakes to manage build dependencies and provide a development shell. We provide a template for you to initialize a new Zig based Postgres extension project which allows you to reuse some of the utilities we're using.

Before getting started we would recommend you familiarize yourself with the projects setup first. To do so, please start with the Contributing section.

We will create a new project folder for our new extension and initialize the folder using the projects template:

$ mkdir my_extension
$ cd my_extension
$ nix flake init -t github:xataio/pgzx

This step will create a working extension named 'my_extension'. The extension exports a hello world function named hello().

The templates README.md file already contains instructions on how to enter the development shell, build, and test the extension. You can follow the instructions and verify that your setup is functioning. Do not forget to use pgstop before quitting the development shell.

The development shell declares a few environment variables used by the project (see devshell.nix):

Next we want to rename the project to match our extension name. To do so, update the file names in the extension folder, and replace my_extension with your project name in the README.md, build.zig, build.zig.zon, and extensions SQL file.

Logging and error handling

Postgres error reporting functions are used to report errors and log messages. They have typical logging functionality like log levels and formatting, but also Postgres specific functionality, like error reports that can be thrown and caught like exceptions. pgzx provides a wrapper around these functions that makes it easier to use from Zig.

Simple logging can be done with functions like Log, Info, Notice, Warning, for example:

    elog.Info(@src(), "input_text: {s}\n", .{input_text});

Note the @src() built-in which provides the file location. This will be stored in the error report.

To report errors during execution, use the Error or ErrorThrow functions. The latter will throw an error report, which can be caught by the Postgres error handling system (explained below). Example with Error:

    if (target_char.len > 1) {
        return elog.Error(@src(), "Target char is more than one byte", .{});
    }

The elog module also exports functions that resemble the C API including functions like ereport, errcode, or errmsg.

If you browse through the Postgres source code, you'll see the PG_TRY / PG_CATCH / PG_FINALLY macros used as a form of "exception handling" in C, catching errors raised by the ereport family of functions. These macros make use of long jumps (i.e. jumps across function boundaries) to the "catch/finally" destination. This means we need to be careful when calling Postgres functions from Zig. For example, if the called C function raises an ereport error, the long jump might skip the Zig code that would have cleaned up resources (e.g. errdefer).

pgzx offers an alternative Zig implementation for the PG_TRY family of macros. This typically looks in code something like this:

    var errctx = pgzx.err.Context.init();
    defer errctx.deinit();
    if (errctx.pg_try()) {
        // zig code that calls several Postgres C functions.
    } else {
        return errctx.errorValue();
    }

The above code pattern makes sure that we catch any errors raised by Postgres functions and return them as Zig errors. This way, we make sure that all the defer and errdefer code in the caller(s) are executed as expected. For more details, see the documentation for the pgzx.err.Context struct.

The above code pattern is implemented in a wrap convenience function which takes a function and its arguments, and executes it in a block like the above. For example:

    try pgzx.err.wrap(myFunction, .{arg1, arg2});

Memory context allocators

Postgres uses a memory context system to manage memory. Memory allocated in a context can be freed all at once (for example, when a query execution is finished), which simplifies memory management significantly, because you only need to track contexts, not individual allocations. Contexts are also hierarchical, so you can create a context that is a child of another context, and when the parent context is freed, all children are freed as well.

pgzx offers custom wrapper Zig allocators that use Postgres' memory context system. The pgzx.mem.createAllocSetContext function creates an pgzx.mem.MemoryContextAllocator that you can use as a Zig allocator. For example:

    var memctx = try pgzx.mem.createAllocSetContext("zig_context", .{ .parent = pg.CurrentMemoryContext });
    const allocator = memctx.allocator();

In the above, note the use of pg.CurrentMemoryContext as the parent context. This is the context of the current query execution, and it will be freed when the query is finished. This means that the memory allocated with allocator will be freed at the same time.

It's also possible to register a callback for when the memory context is destroyed or reset. This is useful to free or close resources that are tied to the context (e.g. sockets). pgzx provides an utility to register a callback:

    try memctx.registerAllocResetCallback(
        queryDesc.*.estate.*.es_query_cxt,
        pgaudit_zig_MemoryContextCallback,
    );

Function manager

pgzx has utilities for registering functions, written in Zig, that are then available to call over SQL. This is done, for example, via the PG_FUNCTION_V1 function:

comptime {
    pgzx.PG_FUNCTION_V1("my_function", myFunction);
}

The parameters are received from Postgres serialized, but pgzx automatically deserializes them into Zig types.

Testing your extension

pgzx provides two types of automatic tests: pg_regress tests and unit tests. The pg_regress tests work similar with the way they work for C extensions. You provide inputs in a sql folder and expected outputs in the expected folder, and then you can run them like this:

zig build pg_regress

Under the hood, this calls the pg_regress tool from the Postgres build.

For unit tests, we would like to run tests in a Postgres instance, so that the unit tests compile in the same environment as the tested code, and so that the tests can call Postgres APIs. In order to do this, pgzx registers a custom run_tests function via the Function manager. This function can be called from SQL (SELECT run_tests();) and it will run the unit tests.

A test suite is a Zig struct for which each function whose name starts with test is a unit test. To register a test suite, you would typically do something like this:

comptime {
    pgzx.testing.registerTests(@import("build_options").testfn, .{Tests});
}

The build_options.testfn options should be defined via build.zig. For an example on how to do that, check out the char_count_zig or the pgaudit_zig sample extensions.

Note that you can only call the pgzx.testing.registerTests function once per extension. If your extension has multiple modules/files, you should call it like this:

 comptime {
    pgzx.testing.registerTests(@import("build_options").testfn, .{
         @import("module1.zig").Tests,
         @import("module2.zig").Tests,
         @import("module2.zig").Tests,
    });
}

To run the unit tests, provided that you are using our sample build.zig, you can run:

zig build unit -p $PG_HOME

Behind the scenes, this builds the extension with the testfn build option set to true, deploys it in the Postgres instance, and then calls SELECT run_tests(); to run the tests.

Status/Roadmap

pgzx is currently under heavy development by the Xata team. If you want to try Zig for writing PostgreSQL extensions, it is easier with pgzx than without, but expect breaking changes and potential instability. If you need help, join us on the Xata discord.

Contributing

Develpment shell and local installation

We use Nix to provide a local development shell. This ensures that we have a stable environment with all dependencies available in the expected versions. This is especially important with Zig, which is still in active development.

For this purpose it is possible to use this project as input in downstream flake files as well.

The tools we use also require some environment variables set, which are already pre-configured in the develpment shell.

We would recommend the nix-installer from DeterminateSystems. The installer enables Nix Flakes (used by this project) out of the box and also provides an uninstaller.

If you want to try out the project without having to install Nix on your system, you can do so using Docker. You can build the docker image by running the dev/docker/build.sh script. The docker image is named pgzx:latest.

To enter the develpment shell run:

$ nix develop

If you want to use docker instead, run:

$ ./dev/docker/run.sh

NOTE: We also provide a .envrc file to automatically enter the development shell when entering the projects folder. If you use direnv you can enable the environment via direnv allow.

The nix configuration already installs PostgreSQL, but for testing we want to have a local postgres installation where we can install our test extensions in.

We use pglocal to relocate the existing installation into our development environment:

$ pglocal
...

$ ls out
16  default

The out/default folder is a symlink to the postgres installation currently in use.

Having a local installation we want to create a local database and user:

$ pginit
...

This creates a local database named postgres. The script allows us to configure an alternative name for the cluster, database, or user. This allows us to create multiple clusters within our current installation.

We can start and stop the database using pgstart and pgstop. Let's test our current setup:

$ pgstart
$ psql  -U postgres -c 'select version()'
                                         version
-----------------------------------------------------------------------------------------
 PostgreSQL 16.1 on aarch64-apple-darwin22.6.0, compiled by clang version 16.0.6, 64-bit
(1 row)

This project has a few example extensions. We will install and test the char_count_zig extension next:

$ cd examples/char_count_zig
$ zig build -freference-trace -p $PG_HOME
$ psql  -U postgres -c 'CREATE EXTENSION char_count_zig;'
CREATE EXTENSION
$ psql  -U postgres -c "SELECT char_count_zig('aaabc', 'a');"
INFO:  input_text: aaabc

INFO:  target_char: a

INFO:  Target char len: 1

 char_count_zig
----------------
              3
(1 row)

The sample extension also supports pg_regress bases testing:

$ zig build pg_regress --verbose
# using postmaster on Unix socket, port 5432
ok 1         - char_count_test                            10 ms
1..1
# All 1 tests passed.

Debugging the unit tests

Because Postgres manages the actual processes and starts a new process for each client we must attach our debugger to an existing session.

To debug an extension that exposes a function, like the unit tests, first start a SQL session:

$ psql -U postgres

In order to attach our debugger to the session we need the PID for your current process:

postgres=# select pg_backend_pid();
 pg_backend_pid
----------------
          14985
(1 row)

If we want to set breakpoints we must also ensure that our extensions library has been loaded. You might have to drop and re-create an the test function in case you can't set a breakpoint (this will force Postgres to load the library):

postgres=# DROP FUNCTION run_tests;
CREATE FUNCTION run_tests() RETURNS INTEGER AS '$libdir/pgzx_unit' LANGUAGE C IMMUTABLE;

Now we can attach our debugger and set a breakpoint (See your debuggers documentation on how to attach):

$ lldb -p 14985
(lldb) b hsearch.zig:117
...
(lldb) c

You can set breakpoints in your Zig based extension in C sources given you have all debug symbols available.

With our breakpoints set we want to start the testsuite:

postgres=# SELECT run_tests();

Postgres debug build

The default development shell and scripts relocated the Nix postgres installation into the out folder only. When debugging an extension in isolation this is normally all you need. But in case you need to debug and step into the Postgres sources as well it is helpful to have a debug build available.

This project provides a second development shell type that provides tooling to fetch and build Postgres in debug mode.

Select the debug shell to start a shell with required development tools:

nix develop '.#debug'

The pgbuild script will fetch the Postgres sources and build a local debug build that you can use for development, testing, and debugging:

$ pgbuild

This will checkout Postgres into the out/postgresql_src directory. The build files will be stored in out/postgresql_src/build. We use meson and Ninja to build Postgres. Ninja speeds up the compilation by parallelizing compilation tasks as much as possible. Compilation via Ninja also emits a compile_commands.json file that you can use with your editors LSP to improve navigating the Postgres source code if you wish to do so.

Optionally symlink the compile_commands.json file:

$ ln -s ./out/postgresql_src/build/compile_commands.json ./out/postgresql_src/compile_commands.json

The local build will be installed in out/local. To switch to the local Postgres build and ensure that your extension builds against it use:

$ pguse local

Note: Delete the zig-cache folder when switching to another Postgres installation to ensure that you extension is rebuilt properly against the new version.

Debugging Zig standard library and build script support

To debug Zig build scripts or the standard library all you need is the original sources. No additional build step is required. Anyways, it is recommended to use the same library version as the zig compiler ships with. You can query the current version or Git commit of a nightly build using the zig tool:

$ zig version
0.13.0-dev.28+3c5e84073

The version shown here for example indicates that we use a nightly build. The commit ID of that build is 0b744da84.

You can clone and checkout the repository by yourself. We also have a small script ziglocal to checkout and even build the compiler. You can use the script to just checkout the correct version into your development environment:

$ ziglocal clone --commit 0b744da84

This command clones the master branch only into the ./out/zig directory.

Now when building the test extensions you can use the --zig-lib-dir CLI flag to tell the compiler to use an alternative library:

$ zig build unit -p $PG_HOME --zig-lib-dir $PRJ_ROOT/out/zig/lib

The zig compiler will now use the local checkout to build the build.zig file and your project.

Debugging Zig compiler/linker

As Zig is still in development, you might have the need to build the Zig toolchain yourself, maybe in debug mode.

The debug shell installs the additional dependencies that you need to build Postgres or the Zig compiler yourself.

nix develop '.#debug'

Optionally we might want to debug the actual version that we normally use:

$ zig version
0.12.0-dev.3154+0b744da84

Next we checkout and compile the toolchain (Note: the --commit option is optional):

$ ziglocal --commit 0b744da84

This step will take a while. You will find the compiler and library of your local debug build in the out/zig/build/stage3 directory.

Q&A

Which Zig version do you support?

The Zig toolchain, including the compiler, build system, and standard library, is still in development and breaking changes do happen every now and then. For this reason this project follows the Zig master branch.

The Nix based development shell uses zig-overlay in conjunction with the flake.lock file to pin the zig toolchain version to a recent commit ID.

The dependency is updated by us every so often and we try to test and fix breaking changes when updating the toolchain version. We highly recommend to use the projects develoment shell when testing the example extensions provided, otherwise you might have problems compiling the extensions at all.

We understand not everyone is keen to install Nix locally. For getting to know the environment you can build and run a development shell in a local docker container. Use ./dev/docker/build.sh to build the container and ./dev/docker/run.sh to start the dockerized development shell.

The current stable release is verion 0.12. As the build system APIs and package management system are undergoing heavy development recently we chose to stick with the master branch for now.

Where is my extension installed?

By default the zig build system installs all artifacts into the local zig-out folder.

For example we can see that behavior when building the char_count_zig extension:

$ cd examples/char_count_zig
$ zig build
$ find zig-out
zig-out
zig-out/lib
zig-out/lib/char_count_zig.dylib
zig-out/share
zig-out/share/postgresql
zig-out/share/postgresql/extension
zig-out/share/postgresql/extension/char_count_zig.control
zig-out/share/postgresql/extension/char_count_zig--0.1.sql

If you are not sure whether the build system puts all files into the correct location or in case you generate code it can be helpful to debug your build scripts to install into zig-out.

To install the extension with your local Postgres instance you need to pass the path prefix where postgres was installed to using the -p flag:

$ zig build -p <path/to/postgres>

You can run dirname $(pg_config --bindir) from your shell to find the installation prefix. We set PG_HOME to the expected path in the development shell, assuming you use the pglocal or pgbuild scripts to prepare a local postgres installation for development.

See also

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

Support

If you have any questions, encounter issues, or need assistance, open an issue in this repository or join our Discord, and our community will be happy to help.


Made with :heart: by Xata 🦋