Open nicois opened 5 months ago
A recommended way of using this would be something like:
config.AutoLoadTypes = []string{"my_type", "another_type"}
config.ReuseTypeMaps = true
config.MinConns = 10
Which, compared to previously, would:
MinConns
> 0, the pool will immediately begin filling)Together this should result in a significant improvement both to startup times and whenever new connections are being added to the connection pool, whenever custom types are being registered.
I think I'd like to wait and see what happens with #2046 before looking too much into this. But there are a few things that come to mind.
AutoLoadTypes
work with non-derived types. e.g. I have to manually register hstore
. How do I register _hstore
? A []string
of types to load and register is really convenient, but I'm not sure it would work in every case.Agreed, let's get the other PR sorted before finessing this PR.
Related array types should be handled by the new loader. I haven't experimented with this in depth, as I haven't run into any problems with the code, but if there is an example/test where array types aren't supported, I can remedy that, probably in the other PR.
Related array types should be handled by the new loader.
Right, I expect it would work - once the underlying type is loaded. But I wasn't sure if the AutoLoadTypes
would be handled first, leaving it unusable in those cases.
Related array types should be handled by the new loader.
Right, I expect it would work - once the underlying type is loaded. But I wasn't sure if the
AutoLoadTypes
would be handled first, leaving it unusable in those cases.
If it works correctly, this line should collect any dependent array types. I can look at adding a test in the other PR to explicitly ensure this works.
What I mean is, what happens when it tries to load _hstore
when hstore
is not registered yet?
I see what you mean. If hstore
can only be manually registered, we need to provide the means to do so before the autoloader.
The autoloader is set to execute after the afterConnect
hook has finished. This can be used to perform any manual registration which is required beforehand.
Of course, whatever happens in afterConnect
will not be cached; if the user want to avoid unnecessary database calls, they would have to implement something equivalent to reuseTypeMap
.
Thinking a bit more about manual registration of certain types, and how to reduce the number of queries there: what do you think of adding something like a map[string] func(c conn.Conn, oid uint32) error
field to the pgxpool config. Then if they want to register say hstore
on each connection, they can assign their custom registration function to this map against the "hstore"
key, and the pgxpool will retrieve the OIDs for each of these types, then call the functions with that information?
Not only would this be cleaner, but if the reuse...
configuration boolean was set, pgxpool will be able to internally cache the type name -> OID mapping, and also only need to make a database call when the first connection is created.
Perhaps the pgxpool config struct could be given a method, something like
func (c *Config) WithManualTypeRegistration(typeName string, f func(c conn.Conn, oid uint32) error) *Config {
...
}
(or perhaps passing in the typemap object instead of the connection, if you prefer) If this makes some sense to you, I can create a separate PR to discuss it further, once this PR is merged.
I have rebased this on top of the other PR, and added a comment relating to the autoload config settings to clarify that custom type registration done in AfterConnect
will be respected.
Discussion about further optimising the loading of the custom registration can be talked about if/when this PR is merged.
Thinking about https://pkg.go.dev/sync#OnceValue : I can see how this can replace the use of a mutex. Were you thinking that the pgxpool's struct would contain a function used to register the types? And that if the reuse flag was set, that function would be replaced by the OnceValue
version of it?
The primary question is whether the user provides the autoload information via the pgxpool's config, or via some other mechanism.
Were you thinking that the pgxpool's struct would contain a function used to register the types?
Yes, something like this. Though since pgxpool already has a AfterConnect hook it might just be using AfterConnect in a specific way.
And that if the reuse flag was set, that function would be replaced by the OnceValue version of it?
The reuse flag wouldn't have to exist. The function would use OnceValue internally. From pgxpool's point of view, it has a function it calls to get (or get and register) the type information. This technique could be documented, or maybe even a new standalone function that encapsulates this concept and returns a type loading function that internally uses OnceValue.
Potentially, this lets the user control autoload and type reuse without pgxpool needing any explicit support.
Without reuse being optional, how would heterogeneous servers be supported, where oids could differ?
Whether it's via a config setting or environment variable, if this is a possible situation, we should support it, shouldn't we?
If not, I certainly agree that it would be nice to eliminate the need to support such a workflow. This was based on what you suggested earlier.
Reuse would still be optional. It would be up to the caller to implement.
Here is what I'm thinking:
makeLoadTypesOnce := func() func(ctx context.Context, conn *pgx.Conn) error {
var mux sync.Mutex
var loaded bool
var types []*pgtype.Type
return func(ctx context.Context, conn *pgx.Conn) error {
mux.Lock()
defer mux.Unlock()
if loaded {
return types, nil
}
var err error
types, err = conn.LoadTypes(ctx, "mytype", "myothertype")
if err != nil {
return err
}
loaded = true
return nil
}
}
dbconfig.AfterConnect = makeLoadTypesOnce()
I haven't actually tested it, but I think it should work. I wasn't able to figure out a way to get sync.OnceValue
to work as it needs the ctx
and conn
from the first call. But the sync.Mutex
works fine.
Also, prototyping this code made me even more in favor of this logic being activated through the existing AfterConnect
hook. The project that I was working in had this AfterConnect
hook already installed:
dbconfig.AfterConnect = func(ctx context.Context, conn *pgx.Conn) error {
pgxuuid.Register(conn.TypeMap())
return nil
}
It's using github.com/jackc/pgx-gofrs-uuid
to integrate with github.com/gofrs/uuid
. It is important that this type be registered before the any composites that have a UUID field or the wrong UUID type could be used.
It could be modified as follows to load types once after registering the custom UUID type:
loadTypesOnce := makeLoadTypesOnce()
dbconfig.AfterConnect = func(ctx context.Context, conn *pgx.Conn) error {
pgxuuid.Register(conn.TypeMap())
err := loadTypesOnce(ctx, conn)
if err != nil {
return err
}
return nil
}
Not sure if we would want a function that creates / simplifies the load once logic or not. It would be more convenient. But I can imagine different applications having slightly different preferences so it might be easier to document the pattern instead.
We can make this something where each user needs to write/adapt their AfterConnect to do this. I just thought that type registration was such a common pattern that we could ensure it would done in a consistent way through providing helpers to manage the autoloading, and to keep AfterConnect free for anything special they want done apart from type registration.
The other benefit to reuse
being a configuration option is that it is more likely to be exposed to end-users via a DSN, so if when all they have is a precompiled binary and the author didn't consider supporting the reuse of type registration, those end-users can still opt in.
@jackc I have made a helper which can be used to build an AfterCommit
hook. I have looked at the uuid code yet, but wanted to share what I've got now, as it's already fairly compact.
I think this strikes a reasonable balance, where this helper can keep things sufficiently DRY while still allowing people to decide whether to chain this with other AfterConnect
logic.
I'm also interested in what you think of the environment variable. I do think that since we aren't using the pgxpool config for this any more, it wouldn't be possible for a DSN to control the reuse of type information, so an optional environment variable seems like a good way to let this setting be controlled by end users.
If the approach shown here is one you are happy with, it might actually make sense for me to integrate this with https://github.com/jackc/pgx/pull/2049 , as the helper function's signature will change when support for custom registration functions is added.
The reason for wanting the environment variable is because I expect there are a significant number of projects which use pgx
. Many of them will not bother exposing the reuse setting, either disabling it for safety, or enabling it as 99% of users will benefit from the extra speed, particularly with large numbers of connections.
Because the DSN doesn't provide the ability to control this boolean (and as it's not part of ConnConfig, but only defined in a bundled function), an environment variable would avoid the situation where people are trying to use an pgx-based application and are otherwise unable to access this setting.
This is the time when we have the opportunity to define a consistent method to control this flag. But I can remove it if you don't like it; this is your project.
I'd prefer to hold off on the environment variable. We can always add it later, but once it's there we can't remove it.
OK, I've removed the environment variable reference.
@nicois
There is still the race condition mentioned above.
But there is one other thing that needs to be figured out that was brought to my attention by #2089. EnumCodec
has mutable state. The Type
and Codec
interfaces require that they are not modified after registration, but I suppose it is unclear whether they should be able to modify themselves. EnumCodec
modifies itself, so it would not be safe to share between multiple connections. Not sure what the best solution is here.
I'll take a look at that race condition. I didn't notice your comment, sorry.
Relating to state mutation: if types and codecs shouldn't be mutated, should they have a Copy()
member which produces an equivalent object with a separate internal state? If so, we could then iterate over these items using this method to generate copies for the new connection.
Potentially Copy()
could return a pointer to itself, if the type/codec is one which is safe to reuse.
Relating to state mutation: if types and codecs shouldn't be mutated, should they have a Copy() member which produces an equivalent object with a separate internal state? If so, we could then iterate over these items using this method to generate copies for the new connection.
Seems reasonable.
Potentially Copy() could return a pointer to itself, if the type/codec is one which is safe to reuse.
I would make it an optional additional interface a Codec
could also implement. So it would be assumed safe to use concurrently if it was not implemented.
Provide some backwards-compatible configuration options for pgxpool which streamlines the use of the bulk loading of custom types:
ReuseTypeMaps is disabled by default as in some situations, a connection string might resolve to a pool of servers which do not share the same type name -> OID mapping.