Open roji opened 5 years ago
In which cases should this improve performance and memory usage? Is it mostly for ORMs?
By the way, DbParameter
isn't disposable.
In which cases should this improve performance and memory usage? Is it mostly for ORMs?
It could improve memory usage for any user by pooling and recycling instead of garbage collecting all the time... Make sense?
By the way, DbParameter isn't disposable.
That's a good point... It's a bit unfortunate, I think there's especially a lot of wasted allocations with parameters... Maybe it's worth considering making it disposable for this, or thinking of another scheme...
It could improve memory usage for any user by pooling and recycling instead of garbage collecting all the time...
At the cost of code mess? Unfortunately, the API isn't well designed due to historical reasons. There are few problems I see:
Creation of a command parameter doesn't add it to the command.
Creation of a command doesn't associate it with the opened transaction.
Therefore, an user must write much more non-friendly code to improve performance for a bit.
// before
using (var connection = new NpgsqlConnection(connectionString))
using (var command = new NpgsqlCommand("SELECT @p1, @p2", connection))
{
command.Parameters.AddWithValue("p1", 1);
command.Parameters.AddWithValue("p2", 2);
}
// after
using (var connection = NpgsqlFactory.Instance.CreateConnection())
using (var command = connection.CreateDbCommand())
using (var p1 = command.CreateDbParameter())
using (var p2 = command.CreateDbParameter())
{
command.CommandText = "SELECT @p1, @p2";
command.Parameters.Add(p1);
command.Parameters.Add(p2);
p1.ParameterName = nameof(p1);
p1.Value = 1;
p2.ParameterName = nameof(p2);
p2.Value = 1;
}
I don't think that it's the right way to improve performance since most of allocations come from boxing of parameter values and reading values from the result using the indexer this[int]
property.
The way to keep code clean and reuse instances is to return objects to the poll in finalizers, but not sure that it's a good idea.
At the cost of code mess? Unfortunately, the API isn't well designed due to historical reasons. There are few problems I see:
You're right that the current CreateConnection()
and CreateCommand()
aren't very usable, but that's a separate problem. First, we can definitely work on adding ADO.NET overloads to these methods which accept a connection string and a command text respectively. Second, we could have Npgsql-specific CreateNpgsqlConnection()
and CreateNpgsqlCommand()
, which accept the relevant parameters and also returns NpgsqlCommand, so that the user won't need to downcast from DbCommand. So I'd imagine something like the following:
using (var connection = NpgsqlFactory.Instance.CreateNpgsqlConnection(connectionString))
using (var command = connection.CreateNpgsqlCommand("SELECT @p1, @p2"))
{
command.Parameters.AddWithValue("p1", 1);
command.Parameters.AddWithValue("p2", 2);
}
Note that one of my current priorities is working on ADO.NET, so it's possible we'll be able to improve the API...
In the provided case parameters from the AddWithValue
method calls won't be recycled if Dispose
must be called explicitly. Calling it implicitly when disposing the command would help, but it's unobvious.
Why not use finalizers and measure performance with them first? The behavior will be changed, but only for object which lost references to them. So this may improve perf without user code changes.
By the way, it would be better to have non-virtual methods to create commands and connections which internally will call the existing methods.
[MethodImpl(MethodImplOptions.AggressiveInlining)
public DbCommand CreateCommand(string commandText)
{
var command = CreateDbCommand();
command.CommandText = commandText;
return command;
}
In the provided case parameters from the AddWithValue method calls won't be recycled if Dispose must be called explicitly. Calling it implicitly when disposing the command would help, but it's unobvious.
We should definitely think of the lifecycle for parameter instances, not just in Npgsql but in ADO.NET in general. It's possible to say that parameters live only as long as their commands (so they can be recycled when the command is disposed), but unfortunately that would break backwards compat for any users keeping parameters across commands. I'm not yet sure what can be done.
Why not use finalizers and measure performance with them first? The behavior will be changed, but only for object which lost references to them. So this may improve perf without user code changes.
I'm not 100% sure, but I think that once an object has been finalized you really cannot use it anymore, so recycling (returning to a pool) at finalization wouldn't make sense. This should probably be confirmed/verified though.
By the way, it would be better to have non-virtual methods to create commands and connections which internally will call the existing methods.
I agree, I have a work item for this :) I'm currently concentrating more on the batching API, but that will definitely come too.
I think that once an object has been finalized you really cannot use it anymore
The same is true for disposed objects, but this doesn't stop you from putting them back to a pool.
I'm currently concentrating more on the batching API, but that will definitely come too.
Could you provide a little bit more info?
I think that once an object has been finalized you really cannot use it anymore
The same is true for disposed objects, but this doesn't stop you from putting them back to a pool.
That's not the same. Finalization is a runtime concept (i.e. involving the GC), whereas disposability is an API contract. A disposed object still lives as far as the runtime is concerned, it's simply supposed to not hold any more native resources and not be usable by the user.
I'm currently concentrating more on the batching API, but that will definitely come too.
Could you provide a little bit more info?
Take a look at https://github.com/dotnet/corefx/issues/3688 (but disregard the concrete proposal there). In a few days I expect to post a full proposal.
That's not the same. Finalization is a runtime concept (i.e. involving the GC), whereas disposability is an API contract.
Yes, I know it. I mean that for short living objects it would be near the same in lifetime. If it's not a solution, then parameters created using a connection via DbConnection.CreateCommand
and a provider factory via DbProviderFactory.CreateParameter
should have different lifetime policies, or it's a breaking change.
That's not the same. Finalization is a runtime concept (i.e. involving the GC), whereas disposability is an API contract.
Yes, I know it. I mean that for short living objects it would be near the same in lifetime. If it's not a solution, then parameters created using a connection via DbConnection.CreateCommand and a provider factory via DbProviderFactory.CreateParameter should have different lifetime policies, or it's a breaking change.
Any change of parameter lifetime here is a breaking change, regardless of whether the parameters came via DbConnection.CreateCommand or DbProviderFactory.CreateParmeter... We already have code other which uses these instances, and they may be assuming they can keep them alive even after the connection has been disposed. However, that doesn't mean we necessarily shouldn't do it...
By the way, this is a completely provider-specific decision - there's no spec/documentation saying anything about lifetimes. Apparently on SQL Server once you've used a parameter instance with an executed command, you can't reuse it any more... Of course we can start making recommendations at the ADO.NET level, but all the providers (and user code) are already out there...
By the way, this is a completely provider-specific decision - there's no spec/documentation saying anything about lifetimes.
Then let's start disallowing to use the same parameter for multiple commands. It looks more consistent.
Of course we can start making recommendations at the ADO.NET level
It would be better to ask maintainers of other drivers first.
Then let's start disallowing to use the same parameter for multiple commands. It looks more consistent.
Should we pre-loop in other micro-/ORMs relying on Npgsql?
By the way, this is a completely provider-specific decision - there's no spec/documentation saying anything about lifetimes.
Then let's start disallowing to use the same parameter for multiple commands. It looks more consistent.
The situation is apparently worse - I heard you can't use execute the same command twice with the same parameter instances... If it's true, I don't think that makes any sense at all and I definitely wouldn't want us to align to that :)
Of course we can start making recommendations at the ADO.NET level
It would be better to ask maintainers of other drivers first.
For now, this issue is only about considering this in Npgsql itself. However, as it seems like a pure perf benefit with no downsides whatsoever - beyond breaking users who rely on being able to work with disposed objects - it seems to make general sense (in any case there are no ADO.NET recommendations or guidelines on anything really). But of course we can discuss.
You can execute a SqlCommand
multiple times with the same parameter instances. There is no such restriction you heard about.
Maybe I misunderstood and it's about moving SqlParameter from one command to another.
Highly probable that perf-wise this doesn't work - multithreading sync related to the pooling typically outweighs the perf benefits from pooling small objects.
The efficiency of managing a pool under highly concurrent usage is still a concern, but perhaps this is worth revisiting now that AddNpgsqlDataSource
makes it highly likely that all NpgsqlConnection
objects are created via a factory method?
Yeah, true.
Npgsql already holds a single NpgsqlDataReader instance per (physical) connection, and continuously reuses that; this works especially well since only one active resultset is supported on connections at a given time. This does have the slight disadvantage that if user code mistakenly accesses a reader after disposing it, that may result in undefined behavior (as that reader instance may already be recycled and in use in another thread) rather than an ObjectDisposedException. But it seems right to improve perf here rather than guard against user error.
Npgsql also does something similar with the command returned by NpgsqlConnection.CreateCommand(): if it's disposed and CreateCommand is called again, the same instance is returned; if it isn't, a new command is instantiated ("opportunistic recycling"). We can indeed do the same with NpgsqlConnection instances returned from NpgsqlDataSource - I've opened #4853 with some thoughts to track this.
I think the only thing remaining after that is NpgsqlParameter; while these likely cause quite a bit of GC pressure, they're also not disposable (see discussion above) and so it's not clear how we would do this. We should probably investigate a lighter parameter API for ADO.NET in any case, which would also probably help with GC pressure here.
There are various factory methods in ADO.NET: all methods on DbProviderFactory, as well as several others (
DbConnection.CreateDbCommand()
,DbCommand.CreateDbParameter()
. We should make these recycle instances rather than instantiate new ones, reducing heap allocations, by using pools behind the scene.Some notes:
/cc @YohDeadfall @austindrenski @bricelam @divega @ajcvickers