Open franciscojunior opened 9 years ago
Well, yes, but then both queries will run in the same transaction. Is that a problem? But how do we know how many rows we should fetch?
Why don't EF simply open another connection? That seems the most reasonable thing to do...
Well, yes, but then both queries will run in the same transaction. Is that a problem? But how do we know how many rows we should fetch?
Today Npgsql pass a value of 0 to the Execute message to indicate that all rows should be returned. We could add support to enable user to specify a value for that. Or maybe we could add a connection string parameter specifically to say if MARS should be supported. This way we could specify a value or let the user specify one.
Regarding the transaction support, it is a good question. I don't know which semantics sql server uses. If it lets you specify another transaction or not.
Why don't EF simply open another connection? That seems the most reasonable thing to do...
That's a very good question. One of the reasons I can think of is that the second query has to run inside the same transaction as the first one.
If it's not implemented in the ADO.NET provider then yes, you should be able to work around it by opening multiple connections in the EF provider (assuming transactions can be distributed over multiple connections since the unit of work may not yet be committed).
If it's not implemented in the ADO.NET provider then yes, you should be able to work around it by opening multiple connections in the EF provider (assuming transactions can be distributed over multiple connections since the unit of work may not yet be committed).
I think we won't be able to use this work around because we can't distribute the transaction over multiple connections :(
Isn't that what https://github.com/npgsql/npgsql/wiki/User-Manual#systemtransactions-support is supposed to solve?
Isn't that what https://github.com/npgsql/npgsql/wiki/User-Manual#systemtransactions-support is supposed to solve?
Hmmmm, that's true. I've completely missed that. I was only thinking about postgresql support itself.
If I recall correctly, I think this is only supported when running on windows. But I think this could be a possible solution.
Maybe @jbcooley can give us more information about it.
A bit late to the discussion but here's what I think...
We previously discussed providing MARS support in #312. It's definitely possible to do something with cursors, like @franciscojunior suggested above, but it's anything but trivial. The main point seems to be performance - once you use cursors, you're doing a roundtrip to the server every time you want another result row (it's possible to get several rows and to cache them, but what happens with SequentialAccess mode...). Like @Emill said, PostgreSQL supports only one transaction at a time so concurrent commands in different transactions isn't possible.
@franciscojunior, Npgsql does provide distributed transaction support, and I'm pretty sure it works well on mono (I even submitted a mono patch some time ago). There's definitely some cleanup and work to be done there, I was thinking of working on that after the big 3.0 rewrite.
Since MARS isn't a standard ADO.NET feature (but rather an SqlClient/SqlServer feature), it may be worth checking the EF7 provider approach in other database providers - how does the Mysql, Sqlite, Oracle drivers approach this questions?
In any case, for the purposes of EF7 I'd definitely assume there's no MARS support and go with multiple connections (if that's actually necessary - can we maybe have an example?). Assume that distributed transactions will work, and if you find any issues we'll fix them.
@franciscojunior, Npgsql does provide distributed transaction support, and I'm pretty sure it works well on mono (I even submitted a mono patch some time ago). There's definitely some cleanup and work to be done there, I was thinking of working on that after the big 3.0 rewrite.
Ahh, that's great, @roji ! I wasn't aware of this. Thanks for the heads up! :)
I don't have much of a stake in this, but I'd prefer to see Npgsql develop features that users wanted rather than limiting the implementation to the ADO.NET interface. There are going to be enough people migrating between platforms that easing migration should be an important factor when considering features.
I think a switch to use server cursors would be a useful feature. When this discussion started up again, I read a bit on how SQL Server implements this. They do something similar by processing data in chunks. It's not a free feature, and must be turned on.
With that in mind, I think this feature should be considered and prioritized appropriately rather than discarded as not a feature of core ADO.NET
@jbcooley, I may have been too harsh on this. I don't mean we should discard this feature or any other feature just because it isn't part of ADO.NET. One good example is support for PostgreSQL enums which @Emill and I worked on although they're definitely not part of ADO.NET.
The problem with MARS is that it's really not a trivial feature... I'm not really sure but I think it's going to require quite a bit of thinking to get this working and efficient, without any sort of backend/protocol support like SqlServer has. In particular, we already have a pretty enormous rewrite on our hands for v3.0 even without considering MARS. And since EF7 support seems to be a generally high-priority work item, it would be better if we could support it without the dependency on MARS.
So personally I'm really not against MARS in any way, it's just that I don't think this is the right time to jump in and I'd like to see EF7 support move forward regardless of it...
BTW I somehow thought MARS was now a "standard" feature that doesn't need to be paid for (or turned on). Any light you can shed on exactly how it's implemented (how chunking occurs, how the interaction with SequentialMode exactly occurs) would help.
@roji
There's some good information on MSDN on how MARS works. https://msdn.microsoft.com/en-us/library/ms131686.aspx
I don't believe you have to pay for it, but they do say you have to turn it on in the connection string. "By default, MARS functionality is not enabled."
I believe certain functionality is disabled when MARS is enabled. I suspect SequentialMode would not be strictly followed. This quote leads me to believe it would work like what's being proposed with cursors. "The MARS infrastructure allows multiple batches to execute in an interleaved fashion, though execution can only be switched at well defined points. In addition, most statements must run atomically within a batch. Statements which return rows to the client, which are sometimes referred to as yield points, are allowed to interleave execution before completion while rows are being sent to the client"
Thanks for the info @jbcooley. We should definitely look into this seriously for v3.1.
One potentially negative sentence is "Server cursors should be used for longer lived or large result sets generated by single SQL statements.", i.e. with SqlClient/SqlServer you have the option to either do MARS or do server cursors. This may mean it's not a good idea to implement MARS on top of server cursors in Npgsql (although my understanding of the whole thing is really very bad at the moment).
Regarding the EF7 implementation specifically, even if Npgsql does support an optional MARS feature I'm not sure it would be a good idea to have the EF7 provider built on top of it, as choosing MARS seems to have some specific/special consequences (otherwise it would have been on in SqlClient/SqlServer by default with no special parameters)?
Anyway, let's leave this conversation open for later.
Note: MARS isn't a requirement for EF7 - see https://github.com/aspnet/EntityFramework/issues/1665. I'm not sure how efficient the non-MARS implementation is compared to the MARS implementation, but it's something.
This doesn't mean we shouldn't look into MARS support, just that it's not required for EF7.
Note: if and when we implement MARS in Npgsql, we will need to override and set IsMultipleActiveResultSetsEnabled
to true in NpgsqlDatabaseConnection
.
Hi,
I am having an exception while using npgsql that is exclusive to this provider on EF7 (it doesn't happen with SQLite or SqlServer). It seems like has to do with the lack of MARS. The exception is the following:
error : [Microsoft.Data.Entity.Query.QueryContextFactory] An exception occurred in the database while iterating the results of a query.
System.InvalidOperationException: An operation is already in progress.
at Npgsql.NpgsqlConnector.StartUserAction (ConnectorState newState) [0x00000] in <filename unknown>:0
at Npgsql.NpgsqlCommand.ExecuteDbDataReaderInternal (CommandBehavior behavior) [0x00000] in <filename unknown>:0
at Npgsql.NpgsqlCommand.ExecuteDbDataReader (CommandBehavior behavior) [0x00000] in <filename unknown>:0
at System.Data.Common.DbCommand.ExecuteReader () [0x00000] in <filename unknown>:0
at (wrapper remoting-invoke-with-check) System.Data.Common.DbCommand:ExecuteReader ()
at Microsoft.Data.Entity.Query.QueryingEnumerable+Enumerator.MoveNext () [0x00000] in <filename unknown>:0
at System.Linq.Enumerable+<CreateSelectIterator>c__Iterator10`2[Microsoft.Data.Entity.Storage.ValueBuffer,Microsoft.Data.Entity.Query.QueryResultScope`1[Microsoft.AspNet.Identity.EntityFramework.IdentityRole]].MoveNext () [0x00000] in <filename unknown>:0
at System.Linq.Enumerable+<CreateSelectManyIterator>c__Iterator12`2[Microsoft.Data.Entity.Query.QueryResultScope,Microsoft.Data.Entity.Query.QueryResultScope].MoveNext () [0x00000] in <filename unknown>:0
at System.Linq.Enumerable+<CreateSelectIterator>c__Iterator10`2[Microsoft.Data.Entity.Query.QueryResultScope,Microsoft.AspNet.Identity.EntityFramework.IdentityRole].MoveNext () [0x00000] in <filename unknown>:0
at System.Linq.Enumerable+<CreateSelectIterator>c__Iterator10`2[Microsoft.AspNet.Identity.EntityFramework.IdentityRole,Microsoft.AspNet.Identity.EntityFramework.IdentityRole].MoveNext () [0x00000] in <filename unknown>:0
at Microsoft.Data.Entity.Query.LinqOperatorProvider+ExceptionInterceptor`1+EnumeratorExceptionInterceptor[Microsoft.AspNet.Identity.EntityFramework.IdentityRole].MoveNext () [0x00000] in <filename unknown>:0
This is being thrown when I'm calling:
var result = await _signInManager.PasswordSignInAsync(model.Email,
model.Password,
model.RememberMe,
lockoutOnFailure: false);
where _signInManager is a Microsoft.AspNet.Identity.SignInManager
This bug is only visible on an old and slow Linux machine I am testing the application on (I am using Elementary OS Freya). On my recent and fast Windows machine (with an SSD), the bug is not visible. I am perfectly capable of logging in and using the application. I am not sure if this has to do with the OS or if the much better performance of my Windows machine is hiding the problem.
@ruicraveiro, this smells like an EF7 bug rather than an Npgsql. EF7 is supposed to be 100% compatible with providers that don't support MARS (like Npgsql), but it seems you stumbled upon a race condition where that's not the case.
Can you please open a separate issue for this and I'll try to take a quick look? Posting Npgsql debug and trace logs as described here might help isolate the problem.
@roji, thanks to the tip on logging. I've just created issue #788 with the exception and the log.
@roji Is the MARS support still in the plan?
@tommy-grovnes definitely not for the upcoming 3.2 release...
Even in the longer term, PostgreSQL doesn't really provide facilities for implementing MARS... Cursors could be a way to go, but they only work within a transaction. Basically don't hold your breath, although if enough people show their interest in this thread I'll look into it.
Good to know, I'll focus on other options for now, thx @roji Do count me as interested though ;)
@roji Is the MARS support still in the plan?
I use FirebirdSQL (https://github.com/cincuranet/FirebirdSql.Data.FirebirdClient), and I will migrate to PostgreSQL, but my code use MARS a lot...
UPDATE: I downgrade to 2.2.7 and active PreloadReader in NpgsqlConnectionStringBuilderm it's ok for my app
@marcosgerene MARS probably won't be implemented anytime soon.
@roji
Example (poor):
class SalesDAL
{
private NpgsqlConnection connection;
....
public SalesDAL(NpgsqlConnection conn= null)
{
if (conn == null)
{
conn = new NpgsqlConnection(...);
conn.Open();
}
this.connection = conn;
}
public List<Sale> ListComplete()
{
using (var rd = new NpgsqlCommand(query, this.connection))
{
while (rd.Read())
{
var sale = new Sale();
sale.Custumer = new CustumerDAL(this.connection).GetById(int.Parse{rd["sale_id"].ToString()); //Here, the new connection in a big list of sales can exced the maxpool
...
}
}
...
}
....
}
class CustumerDAL
{
private NpgsqlConnection connection;
....
public CustumerDAL(NpgsqlConnection conn= null)
{
...
}
public Custumer GetById(int id)
{
....
}
}
How can I solve this?
@marcosgerene, I don't think anything in your code sample that's related to MARS support... MARS is about having multiple commands running simultaneously on the same connection, each with its own, open reader.
It's also pretty bad practice to open a connection and then leave it, owned by your SalesDAL instance, doing nothing until someone comes along and called ListComplete()
. Thanks to connection pooling, connections are extremely lightweight objects, and are better off being instantiated and opened only when needed (i.e. inside your ListComplete()
method).
@roji The above example is from a call (select * from custumers where id = @id) that run sale-to-sale.
Like this:
using (var rd = new NpgsqlCommand(query, conn).ExecuteReader())
{
while (rd.Read())
{
using (var rd2 = new NpgsqlCommand(otherquery, conn).ExecuteReader())
{
rd2.Read();
...
}
...
}
}
If I do not use a single connection in this process, I have to open a new connection to run rd2... so I read the post, it's MARS?
It's also pretty bad practice to open a connection and then leave it, owned by your SalesDAL instance, >doing nothing until someone comes along and called ListComplete(). Thanks to connection pooling, >connections are extremely lightweight objects, and are better off being instantiated and opened only when >needed (i.e. inside your ListComplete() method).
I don't use the connection in objects DAL, but I use a Connection class for it, it's a "poor" exemple
You're still not explaining exactly what you're trying to do and what your queries do... Is there any specific reason you need to run the two queries at the same time rather than simply running the first, reading the results, and then running the second? Even if you must have the two in parallel (but make sure of that), in most situations you can simply use two connections (again, pooling makes connections very lightweight). This should cover most cases.
Is there any specific reason you need to run the two queries at the same time rather than simply running the first, reading the results, and then running the second?
No, the code is OK in FirebirdSQL Ado.Net provider, so I read this insue and maked a question...
Thanks, I'll do it another way.
Hello Roji -- I'm new to PostgreSQL and migrating a SQL Server application that uses MARS. MARS simplifies the design of applications because I can call any code within a loop, including code that opens new result sets. Changing this requires restructuring the original application.
The workaround suggested here is to (1) open new connections for inner queries, and (2) use distributed transactions to commit all atomically.
What happens in this workaround when reading uncommitted changes -- will all the connections in a distributed transaction see each other's uncommitted changes as if they were part of the same transaction?
Welcome to PostgreSQL.
First, I really recommend that you avoid distributed transactions if at all possible. They are definitely not meant as a way to write your application, but rather if you genuinely need consistent data across multiple databases. There are significant performance costs (and complexity costs) associated with distributed transactions. Also, as you correctly say, different connections in a distributed transactions do not see each other's uncommitted changes - changes become visible only some time after the distributed transaction has committed (since two-phase commit is asynchronous). However, in general most cases I've seen with migrations from MARS are concerned with queries, whereas (distributed) transactions are more about updates, which are "instantaneous" and therefore usually don't need to occur at the same time as other operations (and therefore requiring MARS).
So usually it's a matter of restructuring the application to not use multiple commands concurrently - usually this implies buffering the first resultset in memory and closing the reader before moving on to the 2nd query. The alternative as you said is to use multiple connections which may add load to the database and does not reflect uncommitted data across the connections.
Thank you for the quick reply! Buffering the 1st query before moving to the 2nd will work in 99% of the cases (after investing on restructuring the code). The difficult case is when the loop on the 1st query exits depending on logic applied to data from the 2nd query, and the 1st query is potentially very long. Are cursors an option here?
Good to hear this is helpful. I'm of course assuming that you can't inverse the order of the queries, so that you have the answer from the 2nd query before sending the 1st (which would solve everything).
If you really need to read output from a 1st query based on output from a 2nd, then yes, you can use cursors - this is a sort of way to do MARS. You would run the first query via cursor, read as many rows as you need/want, execute the 2nd query, and then conditionally continue to run the 1st or terminate it. See the PostgreSQL docs for cursors to see how that can be done.
This would be done as part of #1982 (pipelining) if we decide to go ahead with it.
Hello, How can Entity Framework6 use Npgsql to support concurrent transactions? Do you have any reference from demo? Thank you.
The PostgreSQL database only supports one transaction at a time. If you mean multiple commands running at the same time, then this is MARS (multiple active resultsets), and that's not currently supported.
For anyone interested in this, there's a good chance this will be implemented as part of #1982... Multiplexing means that multiple consumers would be able to send queries on the same physical connection, and consume results in arbitrary order. This means we would be caching entire resultsets in memory, which means we can support MARS. Not 100% finalized yet but a good future direction.
Party time! Makes PostgreSQL more like Oracle or SQL Server!
@roji when you say caching resultsets, do you mean fetching the entire resultset from the server? or would records be fetched from the server as the client requests them? For example, let's say I use MARS to access a table that has a million records, and the client requests the first 3-10 records and then abandons the query -- would that force the library to pull and cache the million records?
@pablocar80 I suggest you read #2111 (even if it's a bit long).
Yes, if we end up going down this road (and I'm still not sure we will), Npgsql will be buffering the entire resultset from the server. FYI even today, if a query is abandoned after a few records, Npgsql still has to consume the entire resultset from the server, because it is already being written to the socket on the PostgreSQL side. You can still explicitly pull only a certain number of rows by using cursors, but Npgsql will not do that for you automatically (again, #2111 contains more information).
Note: @MaceWindu investigated which ADO.NET providers support MARS, the results are here: https://github.com/linq2db/linq2db/pull/2728 (originally posted in https://github.com/npgsql/npgsql/issues/1982#issuecomment-756730751).
Here are some notes on MARS and why I'm not sure it belongs in Npgsql. tl;dr unlike SQL Server, PostgreSQL simply doesn't have a mechanism that would allow an efficient MARS implementation.
One aspect of MARS is to make it easier for users to write ADO.NET code: users can execute multiple commands without caring or tracking if the connection is already in use. This is purely a convenience thing - it's always possible to rewrite code so that it either (a) uses separate connections, (b) buffers the entire resultset before execute another command, or (c) uses PostgreSQL cursors and explicitly fetching results for the different commands.
Beyond convenience, another aspect of MARS is performance. To the best of my understanding, when using MARS on SQL Server, the multiple resultsets are streamed back from the server without any additional roundtrips being made by the client, as is required when using PostgreSQL cursors (e.g. FETCH NEXT
). Considering the perf importance of roundtrips, this difference is really essential - an application doing MARS on SQL Server isn't doing any additional roundtrips, whereas a cursor-based MARS against PostgreSQL would be adding a lot of roundtrips. However, if users do want this behavior on Npgsql, they can use cursors themselves and accept the extra roundtrips.
The same is true if MARS is implemented by buffering the entire resultset - users can do it themselves if they wish, but that has a potentially large cost in terms of memory usage.
To summarize, given that PostgreSQL itself doesn't have an efficient MARS implementation (AFAIK), it doesn't seem right for Npgsql to implicitly do MARS under the hood, with considerable perf costs that would be implicit - unsuspecting users would be falling into a perf pit of failure.
I realize ngpgsql is not a wrapper for libpq
, but is there anything about Pipeline Mode that would help guide an implementation?
@ryan-morris libpq's pipelining mode seems similar to what Npgsql calls batching, which allows you to send multiple statements in a single roundtrip, without waiting for earlier ones to return before sending later ones. You can do this by specifying multiple statements in the command text, separated by semicolons (the upcoming Npgsql 6.0 also supports the new ADO.NET batching API, which still needs to be documented).
Hi,
Ran into this issue while I tried to port DevExpress XAF application to PostgreSql, in fact, DevExpress support staff provided link to this this issue in this support ticket.
So, I have a scenario where I do not have any influence over queries and XAF can hardly rewrite entire framework which they based on MARS, so naturally, we all try to "poke the middle man" :-)
I understand performance issues and possible kick back where you implement performance degraded solution and then people start complaining about it later.
However, when I ask my customers if they want a performance degraded solution or no solution at all - they always go for performance degraded solution. Obviously, this is my use case scenario and range of applications where does not matter at all if some operation takes 2 or 4 seconds, but I would guess that most apps are like that and only a small percentage of apps are truly performance critical - and those would surely not use MARS, ORMs and app frameworks, but would optimise at every step.
Also, such solution can be documented and even logged when it is enabled to inform users that MARS is performance degraded solution to avoid getting MARS performance related issues spawning here.
I am not saying you should implement this, I believe a proper place for that fix is within XAF, however, XAF is grown into a large and complex and stable framework and such change would require significant refactoring and testing which can hardly ever be justified from commercial perspective.
Me, as a developer, do not have any power regarding queries since they are handled by XAF. I choose XAF long ago and it proved to be good framework for business apps, so I also built very large and complex financial app with no ability for rewrite without XAF.
And now I have several customers who prefer PostgreSql and saying "You can use the app, but you will get this random Command already in progress errors, just ignore them and repeat action" ... is not a best selling point. It would be MUCH preferred that instead of those errors, an action takes a bit longer and completes successfully.
This is why I am exploring the possibility that you do build non-perfect solution for MARS.
Thanks, Mario
@mblataric I can see a future where we allow you to execute cursor-driven commands, or possibly even just entire resultset buffering; the former would be complicated to do and add lots of database roundtrips, while the latter would be easy, not add roundtrips, but be very problematic in terms of memory usage.
However, we generally don't see enough people getting stuck on this (yours is the first comment in almost 2 years). Yes, it's a problem for people porting over code from SQL Server, but they generally get over it; porting an app from one database to another isn't transparent, and there are various other things that have to happen anyway.
Another potential argument against doing this, is that if people could use it to "get rid of the problem", they'd probably just do it rather than changing their code, and end up with an inferior/problematic behavior; today, they have to do the "right thing", even if it costs more effort. I know this isn't a very strong argument (i.e. people should only need to deal with the perf issue if they actually have a perf issue), but I do think it stands.
More concretely, has there been a conversation with DevExpress about making XAF work on databases that don't support MARS? PostgreSQL is definitely important enough that they'd probably care; that would be the best path forward here.
@roji I can not say we are in a disagreement, normally, I would rewrite my code if I wanted to change the underlying database, however, I simply do not have that choice in this case.
There has been a discussion here and here and an official documentation page here
In short, they test their framework with SQL Server only and while framework itself does support other databases, it's more on a "you'll have to fix any quirks yourself". There are not many of such quirks, but I am simply very lucky guy and usually run in most of them :-)
Also, there is an issue with relying on written feedback which I noticed at DevExpress as well - which is why they might not consider PostgreSql as high priority as Sql Server. People are weird. Yes, if they have an issue with something it is reasonably to expect they would at least devote time to write about an issue - but they do not. I typically write too much, but I have this issue for several months now and it is only now that I decided to leave a feedback here. Granted, I paused the project for some time and I am on a vacation at the moment, so I took the opportunity, but still.
Anyway, thanks for input, I guess I'll be stuck with this one for a while.
@mblataric these docs which you linked to above seem to provide a reasonable workaround (add AutoInclude), no? Any reason you can't just do that?
@roji I wish I could. It is not only AutoInclude, you need to hard-code all navigation sub-properties which you want to use. This breaks ability for my customers to pull data they need into grid at runtime and I would have to change application each time they would want a different set of data, combine that with several dozen customers that have such requirements. Also, I have 350 business classes with over 1000 navigation properties ... so try to put that workaround into my perspective :-)
[...] you need to hard-code all navigation sub-properties which you want to use
Not sure why hard-coding would be necessary - you could write dynamic code which discovers those navigations, makes them AutoInclude etc. Just trying to think of creative ways to get around this.
Hi all!
I'm opening this issue so we can discuss about how we could implement MARS in Npgsql.
@bricelam gave us a heads up about the requirement of MARS by Entity Framework 7. https://github.com/npgsql/npgsql/issues/249#issuecomment-71501637
I was reading the protocol documentation and there it says:
Would it be possible to have many portals open at the same time and send execute messages separately?
Something like this: