Simplify type autoloading with pgxpool

nicois commented 5 months ago

Provide some backwards-compatible configuration options for pgxpool which streamlines the use of the bulk loading of custom types:

AutoLoadTypes: a list of type (or class) names to automatically load for each connection, automatically also loading any other types these depend on.
ReuseTypeMaps: if enabled, pgxpool will cache the typemap information, avoiding the need to perform any further queries as new connections are created.

ReuseTypeMaps is disabled by default as in some situations, a connection string might resolve to a pool of servers which do not share the same type name -> OID mapping.

nicois commented 5 months ago

A recommended way of using this would be something like:

        config.AutoLoadTypes = []string{"my_type", "another_type"}
        config.ReuseTypeMaps = true
        config.MinConns = 10

Which, compared to previously, would:

make a single SQL call as soon as the pgxpool was created (as MinConns > 0, the pool will immediately begin filling)
the type map information will be loaded, without needing to worry about explicitly naming other custom types; they will be automatically detected
subsequent connections will not need to perform any SQL queries to load custom types while being initialised

Together this should result in a significant improvement both to startup times and whenever new connections are being added to the connection pool, whenever custom types are being registered.

jackc commented 5 months ago

I think I'd like to wait and see what happens with #2046 before looking too much into this. But there are a few things that come to mind.

How does AutoLoadTypes work with non-derived types. e.g. I have to manually register hstore. How do I register _hstore? A []string of types to load and register is really convenient, but I'm not sure it would work in every case.
That might affect how the types are shared in pgxpool.
If these can be resolved perhaps some of this logic could / should be pushed down into something usable from

nicois commented 5 months ago

Agreed, let's get the other PR sorted before finessing this PR.

Related array types should be handled by the new loader. I haven't experimented with this in depth, as I haven't run into any problems with the code, but if there is an example/test where array types aren't supported, I can remedy that, probably in the other PR.

jackc commented 5 months ago

Related array types should be handled by the new loader.

Right, I expect it would work - once the underlying type is loaded. But I wasn't sure if the AutoLoadTypes would be handled first, leaving it unusable in those cases.

nicois commented 5 months ago

Related array types should be handled by the new loader.

Right, I expect it would work - once the underlying type is loaded. But I wasn't sure if the AutoLoadTypes would be handled first, leaving it unusable in those cases.

If it works correctly, this line should collect any dependent array types. I can look at adding a test in the other PR to explicitly ensure this works.

jackc commented 5 months ago

What I mean is, what happens when it tries to load _hstore when hstore is not registered yet?

nicois commented 5 months ago

I see what you mean. If hstore can only be manually registered, we need to provide the means to do so before the autoloader. The autoloader is set to execute after the afterConnect hook has finished. This can be used to perform any manual registration which is required beforehand. Of course, whatever happens in afterConnect will not be cached; if the user want to avoid unnecessary database calls, they would have to implement something equivalent to reuseTypeMap.

nicois commented 5 months ago

Thinking a bit more about manual registration of certain types, and how to reduce the number of queries there: what do you think of adding something like a map[string] func(c conn.Conn, oid uint32) error field to the pgxpool config. Then if they want to register say hstore on each connection, they can assign their custom registration function to this map against the "hstore" key, and the pgxpool will retrieve the OIDs for each of these types, then call the functions with that information? Not only would this be cleaner, but if the reuse... configuration boolean was set, pgxpool will be able to internally cache the type name -> OID mapping, and also only need to make a database call when the first connection is created. Perhaps the pgxpool config struct could be given a method, something like

func (c *Config) WithManualTypeRegistration(typeName string, f func(c conn.Conn, oid uint32) error) *Config {
   ...
}

(or perhaps passing in the typemap object instead of the connection, if you prefer) If this makes some sense to you, I can create a separate PR to discuss it further, once this PR is merged.

nicois commented 5 months ago

I have rebased this on top of the other PR, and added a comment relating to the autoload config settings to clarify that custom type registration done in AfterConnect will be respected. Discussion about further optimising the loading of the custom registration can be talked about if/when this PR is merged.

nicois commented 4 months ago

Thinking about https://pkg.go.dev/sync#OnceValue : I can see how this can replace the use of a mutex. Were you thinking that the pgxpool's struct would contain a function used to register the types? And that if the reuse flag was set, that function would be replaced by the OnceValue version of it?

The primary question is whether the user provides the autoload information via the pgxpool's config, or via some other mechanism.

jackc commented 4 months ago

Were you thinking that the pgxpool's struct would contain a function used to register the types?

Yes, something like this. Though since pgxpool already has a AfterConnect hook it might just be using AfterConnect in a specific way.

And that if the reuse flag was set, that function would be replaced by the OnceValue version of it?

The reuse flag wouldn't have to exist. The function would use OnceValue internally. From pgxpool's point of view, it has a function it calls to get (or get and register) the type information. This technique could be documented, or maybe even a new standalone function that encapsulates this concept and returns a type loading function that internally uses OnceValue.

Potentially, this lets the user control autoload and type reuse without pgxpool needing any explicit support.

nicois commented 4 months ago

Without reuse being optional, how would heterogeneous servers be supported, where oids could differ?

Whether it's via a config setting or environment variable, if this is a possible situation, we should support it, shouldn't we?

If not, I certainly agree that it would be nice to eliminate the need to support such a workflow. This was based on what you suggested earlier.

jackc commented 4 months ago

Reuse would still be optional. It would be up to the caller to implement.

Here is what I'm thinking:

    makeLoadTypesOnce := func() func(ctx context.Context, conn *pgx.Conn) error {
        var mux sync.Mutex
        var loaded bool
        var types []*pgtype.Type

        return func(ctx context.Context, conn *pgx.Conn) error {
            mux.Lock()
            defer mux.Unlock()

            if loaded {
                return types, nil
            }

            var err error
            types, err = conn.LoadTypes(ctx, "mytype", "myothertype")
            if err != nil {
                return err
            }

            loaded = true
            return nil
        }
    }

    dbconfig.AfterConnect = makeLoadTypesOnce()

I haven't actually tested it, but I think it should work. I wasn't able to figure out a way to get sync.OnceValue to work as it needs the ctx and conn from the first call. But the sync.Mutex works fine.

Also, prototyping this code made me even more in favor of this logic being activated through the existing AfterConnect hook. The project that I was working in had this AfterConnect hook already installed:

    dbconfig.AfterConnect = func(ctx context.Context, conn *pgx.Conn) error {
        pgxuuid.Register(conn.TypeMap())
        return nil
    }

It's using github.com/jackc/pgx-gofrs-uuid to integrate with github.com/gofrs/uuid. It is important that this type be registered before the any composites that have a UUID field or the wrong UUID type could be used.

It could be modified as follows to load types once after registering the custom UUID type:

    loadTypesOnce := makeLoadTypesOnce()
    dbconfig.AfterConnect = func(ctx context.Context, conn *pgx.Conn) error {
        pgxuuid.Register(conn.TypeMap())
        err := loadTypesOnce(ctx, conn)
        if err != nil {
            return err
        }
        return nil
    }

Not sure if we would want a function that creates / simplifies the load once logic or not. It would be more convenient. But I can imagine different applications having slightly different preferences so it might be easier to document the pattern instead.

nicois commented 4 months ago

We can make this something where each user needs to write/adapt their AfterConnect to do this. I just thought that type registration was such a common pattern that we could ensure it would done in a consistent way through providing helpers to manage the autoloading, and to keep AfterConnect free for anything special they want done apart from type registration. The other benefit to reuse being a configuration option is that it is more likely to be exposed to end-users via a DSN, so if when all they have is a precompiled binary and the author didn't consider supporting the reuse of type registration, those end-users can still opt in.

nicois commented 4 months ago

@jackc I have made a helper which can be used to build an AfterCommit hook. I have looked at the uuid code yet, but wanted to share what I've got now, as it's already fairly compact.

I think this strikes a reasonable balance, where this helper can keep things sufficiently DRY while still allowing people to decide whether to chain this with other AfterConnect logic.

I'm also interested in what you think of the environment variable. I do think that since we aren't using the pgxpool config for this any more, it wouldn't be possible for a DSN to control the reuse of type information, so an optional environment variable seems like a good way to let this setting be controlled by end users.

nicois commented 4 months ago

If the approach shown here is one you are happy with, it might actually make sense for me to integrate this with https://github.com/jackc/pgx/pull/2049 , as the helper function's signature will change when support for custom registration functions is added.

nicois commented 4 months ago

The reason for wanting the environment variable is because I expect there are a significant number of projects which use pgx. Many of them will not bother exposing the reuse setting, either disabling it for safety, or enabling it as 99% of users will benefit from the extra speed, particularly with large numbers of connections. Because the DSN doesn't provide the ability to control this boolean (and as it's not part of ConnConfig, but only defined in a bundled function), an environment variable would avoid the situation where people are trying to use an pgx-based application and are otherwise unable to access this setting. This is the time when we have the opportunity to define a consistent method to control this flag. But I can remove it if you don't like it; this is your project.

jackc commented 4 months ago

I'd prefer to hold off on the environment variable. We can always add it later, but once it's there we can't remove it.

nicois commented 4 months ago

OK, I've removed the environment variable reference.

jackc commented 4 months ago

@nicois

There is still the race condition mentioned above.

But there is one other thing that needs to be figured out that was brought to my attention by #2089. EnumCodec has mutable state. The Type and Codec interfaces require that they are not modified after registration, but I suppose it is unclear whether they should be able to modify themselves. EnumCodec modifies itself, so it would not be safe to share between multiple connections. Not sure what the best solution is here.

nicois commented 4 months ago

I'll take a look at that race condition. I didn't notice your comment, sorry.

Relating to state mutation: if types and codecs shouldn't be mutated, should they have a Copy() member which produces an equivalent object with a separate internal state? If so, we could then iterate over these items using this method to generate copies for the new connection. Potentially Copy() could return a pointer to itself, if the type/codec is one which is safe to reuse.

jackc commented 4 months ago

Relating to state mutation: if types and codecs shouldn't be mutated, should they have a Copy() member which produces an equivalent object with a separate internal state? If so, we could then iterate over these items using this method to generate copies for the new connection.

Seems reasonable.

Potentially Copy() could return a pointer to itself, if the type/codec is one which is safe to reuse.

I would make it an optional additional interface a Codec could also implement. So it would be assumed safe to use concurrently if it was not implemented.

jackc / pgx

Simplify type autoloading with pgxpool #2048