Factory methods for ADO.NET objects should recycle

roji commented 5 years ago

There are various factory methods in ADO.NET: all methods on DbProviderFactory, as well as several others (DbConnection.CreateDbCommand(), DbCommand.CreateDbParameter(). We should make these recycle instances rather than instantiate new ones, reducing heap allocations, by using pools behind the scene.

Some notes:

Technically, this is a breaking change - disposed .NET objects are supposed to throw ObjectDisposedException forever after disposal. But I think sacrificing that behavior makes a lot of sense here - anyone relying on behavior after disposal is writing very problematic code anyway (would be good to hear opinions).
We need to carefully reset the state of these objects at disposal time.
We should only return instances to the pool which were obtained via the factory methods, otherwise we'd be continuously inserting user-created instances.
Consider using https://docs.microsoft.com/en-us/dotnet/api/microsoft.extensions.objectpool.defaultobjectpool-1?view=aspnetcore-2.2.

/cc @YohDeadfall @austindrenski @bricelam @divega @ajcvickers

YohDeadfall commented 5 years ago

In which cases should this improve performance and memory usage? Is it mostly for ORMs?

By the way, DbParameter isn't disposable.

roji commented 5 years ago

In which cases should this improve performance and memory usage? Is it mostly for ORMs?

It could improve memory usage for any user by pooling and recycling instead of garbage collecting all the time... Make sense?

By the way, DbParameter isn't disposable.

That's a good point... It's a bit unfortunate, I think there's especially a lot of wasted allocations with parameters... Maybe it's worth considering making it disposable for this, or thinking of another scheme...

YohDeadfall commented 5 years ago

It could improve memory usage for any user by pooling and recycling instead of garbage collecting all the time...

At the cost of code mess? Unfortunately, the API isn't well designed due to historical reasons. There are few problems I see:

Creation of a command parameter doesn't add it to the command.
Creation of a command doesn't associate it with the opened transaction.

Therefore, an user must write much more non-friendly code to improve performance for a bit.

// before
using (var connection = new NpgsqlConnection(connectionString))
using (var command = new NpgsqlCommand("SELECT @p1, @p2", connection))
{
    command.Parameters.AddWithValue("p1", 1);
    command.Parameters.AddWithValue("p2", 2);
}
// after
using (var connection = NpgsqlFactory.Instance.CreateConnection())
using (var command = connection.CreateDbCommand())
using (var p1 = command.CreateDbParameter())
using (var p2 = command.CreateDbParameter())
{
    command.CommandText = "SELECT @p1, @p2";
    command.Parameters.Add(p1);
    command.Parameters.Add(p2);

    p1.ParameterName = nameof(p1);
    p1.Value = 1;

    p2.ParameterName = nameof(p2);
    p2.Value = 1;
}

I don't think that it's the right way to improve performance since most of allocations come from boxing of parameter values and reading values from the result using the indexer this[int] property.

The way to keep code clean and reuse instances is to return objects to the poll in finalizers, but not sure that it's a good idea.

roji commented 5 years ago

At the cost of code mess? Unfortunately, the API isn't well designed due to historical reasons. There are few problems I see:

You're right that the current CreateConnection() and CreateCommand() aren't very usable, but that's a separate problem. First, we can definitely work on adding ADO.NET overloads to these methods which accept a connection string and a command text respectively. Second, we could have Npgsql-specific CreateNpgsqlConnection() and CreateNpgsqlCommand(), which accept the relevant parameters and also returns NpgsqlCommand, so that the user won't need to downcast from DbCommand. So I'd imagine something like the following:

using (var connection = NpgsqlFactory.Instance.CreateNpgsqlConnection(connectionString))
using (var command = connection.CreateNpgsqlCommand("SELECT @p1, @p2"))
{
    command.Parameters.AddWithValue("p1", 1);
    command.Parameters.AddWithValue("p2", 2);
}

Note that one of my current priorities is working on ADO.NET, so it's possible we'll be able to improve the API...

YohDeadfall commented 5 years ago

In the provided case parameters from the AddWithValue method calls won't be recycled if Dispose must be called explicitly. Calling it implicitly when disposing the command would help, but it's unobvious.

Why not use finalizers and measure performance with them first? The behavior will be changed, but only for object which lost references to them. So this may improve perf without user code changes.

YohDeadfall commented 5 years ago

By the way, it would be better to have non-virtual methods to create commands and connections which internally will call the existing methods.

[MethodImpl(MethodImplOptions.AggressiveInlining)
public DbCommand CreateCommand(string commandText)
{
    var command = CreateDbCommand();
    command.CommandText = commandText;

    return command;
}

roji commented 5 years ago

In the provided case parameters from the AddWithValue method calls won't be recycled if Dispose must be called explicitly. Calling it implicitly when disposing the command would help, but it's unobvious.

We should definitely think of the lifecycle for parameter instances, not just in Npgsql but in ADO.NET in general. It's possible to say that parameters live only as long as their commands (so they can be recycled when the command is disposed), but unfortunately that would break backwards compat for any users keeping parameters across commands. I'm not yet sure what can be done.

Why not use finalizers and measure performance with them first? The behavior will be changed, but only for object which lost references to them. So this may improve perf without user code changes.

I'm not 100% sure, but I think that once an object has been finalized you really cannot use it anymore, so recycling (returning to a pool) at finalization wouldn't make sense. This should probably be confirmed/verified though.

By the way, it would be better to have non-virtual methods to create commands and connections which internally will call the existing methods.

I agree, I have a work item for this :) I'm currently concentrating more on the batching API, but that will definitely come too.

YohDeadfall commented 5 years ago

I think that once an object has been finalized you really cannot use it anymore

The same is true for disposed objects, but this doesn't stop you from putting them back to a pool.

I'm currently concentrating more on the batching API, but that will definitely come too.

Could you provide a little bit more info?

roji commented 5 years ago

I think that once an object has been finalized you really cannot use it anymore

The same is true for disposed objects, but this doesn't stop you from putting them back to a pool.

That's not the same. Finalization is a runtime concept (i.e. involving the GC), whereas disposability is an API contract. A disposed object still lives as far as the runtime is concerned, it's simply supposed to not hold any more native resources and not be usable by the user.

I'm currently concentrating more on the batching API, but that will definitely come too.

Could you provide a little bit more info?

Take a look at https://github.com/dotnet/corefx/issues/3688 (but disregard the concrete proposal there). In a few days I expect to post a full proposal.

YohDeadfall commented 5 years ago

That's not the same. Finalization is a runtime concept (i.e. involving the GC), whereas disposability is an API contract.

Yes, I know it. I mean that for short living objects it would be near the same in lifetime. If it's not a solution, then parameters created using a connection via DbConnection.CreateCommand and a provider factory via DbProviderFactory.CreateParameter should have different lifetime policies, or it's a breaking change.

roji commented 5 years ago

That's not the same. Finalization is a runtime concept (i.e. involving the GC), whereas disposability is an API contract.

Yes, I know it. I mean that for short living objects it would be near the same in lifetime. If it's not a solution, then parameters created using a connection via DbConnection.CreateCommand and a provider factory via DbProviderFactory.CreateParameter should have different lifetime policies, or it's a breaking change.

Any change of parameter lifetime here is a breaking change, regardless of whether the parameters came via DbConnection.CreateCommand or DbProviderFactory.CreateParmeter... We already have code other which uses these instances, and they may be assuming they can keep them alive even after the connection has been disposed. However, that doesn't mean we necessarily shouldn't do it...

By the way, this is a completely provider-specific decision - there's no spec/documentation saying anything about lifetimes. Apparently on SQL Server once you've used a parameter instance with an executed command, you can't reuse it any more... Of course we can start making recommendations at the ADO.NET level, but all the providers (and user code) are already out there...

YohDeadfall commented 5 years ago

By the way, this is a completely provider-specific decision - there's no spec/documentation saying anything about lifetimes.

Then let's start disallowing to use the same parameter for multiple commands. It looks more consistent.

Of course we can start making recommendations at the ADO.NET level

It would be better to ask maintainers of other drivers first.

austindrenski commented 5 years ago

Then let's start disallowing to use the same parameter for multiple commands. It looks more consistent.

Should we pre-loop in other micro-/ORMs relying on Npgsql?

roji commented 5 years ago

By the way, this is a completely provider-specific decision - there's no spec/documentation saying anything about lifetimes.

Then let's start disallowing to use the same parameter for multiple commands. It looks more consistent.

The situation is apparently worse - I heard you can't use execute the same command twice with the same parameter instances... If it's true, I don't think that makes any sense at all and I definitely wouldn't want us to align to that :)

Of course we can start making recommendations at the ADO.NET level

It would be better to ask maintainers of other drivers first.

For now, this issue is only about considering this in Npgsql itself. However, as it seems like a pure perf benefit with no downsides whatsoever - beyond breaking users who rely on being able to work with disposed objects - it seems to make general sense (in any case there are no ADO.NET recommendations or guidelines on anything really). But of course we can discuss.

YohDeadfall commented 5 years ago

You can execute a SqlCommand multiple times with the same parameter instances. There is no such restriction you heard about.

roji commented 5 years ago

Maybe I misunderstood and it's about moving SqlParameter from one command to another.

roji commented 4 years ago

Highly probable that perf-wise this doesn't work - multithreading sync related to the pooling typically outweighs the perf benefits from pooling small objects.

bgrainger commented 1 year ago

The efficiency of managing a pool under highly concurrent usage is still a concern, but perhaps this is worth revisiting now that AddNpgsqlDataSource makes it highly likely that all NpgsqlConnection objects are created via a factory method?

roji commented 1 year ago

Yeah, true.

Npgsql already holds a single NpgsqlDataReader instance per (physical) connection, and continuously reuses that; this works especially well since only one active resultset is supported on connections at a given time. This does have the slight disadvantage that if user code mistakenly accesses a reader after disposing it, that may result in undefined behavior (as that reader instance may already be recycled and in use in another thread) rather than an ObjectDisposedException. But it seems right to improve perf here rather than guard against user error.

Npgsql also does something similar with the command returned by NpgsqlConnection.CreateCommand(): if it's disposed and CreateCommand is called again, the same instance is returned; if it isn't, a new command is instantiated ("opportunistic recycling"). We can indeed do the same with NpgsqlConnection instances returned from NpgsqlDataSource - I've opened #4853 with some thoughts to track this.

I think the only thing remaining after that is NpgsqlParameter; while these likely cause quite a bit of GC pressure, they're also not disposable (see discussion above) and so it's not clear how we would do this. We should probably investigate a lighter parameter API for ADO.NET in any case, which would also probably help with GC pressure here.

npgsql / npgsql

Factory methods for ADO.NET objects should recycle #2307