Open Ephron-WL opened 1 year ago
Sorry, this is a bit hard to follow. Do you mind creating either a fiddle or a small project? Here is an example of a fiddle: https://dotnetfiddle.net/RtYk51
Yes, I understand. I have created an example using your fiddle. However, I'm having a hard time putting a query together that can't be optimized that matches my complex query above.
If we were supporting CTEs, I'd have created a CTE to obtain the row numbers using your window functions and then I would have selected the CTE result constraining it by the row number. Without a CTE, this can be performed using a join to a table constructed by a query. While simple in SQL, it is more complex in LINQ. We have to join the query that would be in a CTE to itself. Under the circumstances in my original post above you actually see a SQL join with a table created from a query. This was my intented illustration. That query-generating table (I don't know the technical term for this) should be calculating the RowNumber column and some other columns that can be used to perform the join, etc. Then, I should be able to constrain my final results using the RowNumber in the where clause. The query SELECT RowNumber =... FROM TABLE WHERE RowNumber <= 2; is not valid, because you can't constrain on a window function or its result in the same query, which I suspect you know. So, you have to layer your query with a CTE or table-generating query.
My simple example below is being optimized and the joined table-generating query is being simplified away. There were two problems with the example above: 1) the select clause of the inner table-generating query is missing the window function, i.e. it was not translated because it was in table-generated query that was part of a join and the translator didn't account for it and 2) the window function was placed into the where clause, which of course is not valid SQL. In the sample below, only #2 occurs because of optimizations. This is understandable because the translation engine is thinking that any expression representing a column value can be placed into the where clause, which is generally true, except for window functions. Of course, EF Core wasn't developed with window functions in mind.
I'd be happy to work with you on trying to apply this fix, time permitting. However, being more familiar with the library, if you can think of how to quickly apply a fix or improve the outcome, maybe you are the better person to do so.
var query2 =
(from testRow in context.TestRows
join testRow2 in (
from subRow in context.TestRows
select new {
subRow.Id,
RowNumber = EF.Functions.RowNumber(EF.Functions.Over().OrderBy(subRow.Date).PartitionBy(subRow.Col1))
}
) on testRow.Id equals testRow2.Id
where testRow.Id > 1 && testRow2.RowNumber <= 2
select testRow2.Id);
Console.WriteLine(query2.ToQueryString());
... produces...
SELECT [t0].[Id]
FROM
[TestRows] AS [t]
INNER JOIN [TestRows] AS [t0]
ON [t].[Id] = [t0].[Id]
WHERE
[t].[Id] > 1
AND ROW_NUMBER() OVER (PARTITION BY [t0].[Col1] ORDER BY [t0].[Date]) <= 2;
What we really need to see is something like this:
SELECT [t0].[Id]
FROM
[TestRows] AS [t]
INNER JOIN (
SELECT
Id,
RowNumber = ROW_NUMBER() OVER (PARTITION BY [t0].[Col1] ORDER BY [t0].[Date])
FROM
[TestRows]
) AS [t0]
ON [t].[Id] = [t0].[Id]
WHERE
[t].[Id] > 1
AND RowNumber <= 2;
I think the translator has to see the window function as an expression it can't optimize away, like it normally would with any other expression. By selecting the entire joined table row it selects the window function, which while basically valid, is of no use in this context and really must be abstracted into another the inner table.
var query2 =
(from testRow in context.TestRows
join testRow2 in (
from subRow in context.TestRows
select new {
AnotherId = subRow.Id/2,
Id = subRow.Id,
RowNumber = EF.Functions.RowNumber(EF.Functions.Over().OrderBy(subRow.Date).PartitionBy(subRow.Col1))
}
) on testRow.Id equals testRow2.Id
where testRow.Id > 1 && testRow2.RowNumber <= 2
select testRow2);
... producers...
SELECT [t0].[Id] / 2 AS [AnotherId]
,[t0].[Id]
,ROW_NUMBER() OVER (PARTITION BY [t0].[Col1] ORDER BY [t0].[Date]) AS [RowNumber]
FROM
[TestRows] AS [t]
INNER JOIN [TestRows] AS [t0]
ON [t].[Id] = [t0].[Id]
WHERE
[t].[Id] > 1
AND ROW_NUMBER() OVER (PARTITION BY [t0].[Col1] ORDER BY [t0].[Date]) <= 2;
You can use AsSubQuery()
from Thinktecture library. It works great, here is the fiddle: https://dotnetfiddle.net/dkZoV0
Steps are:
Add nuget package Thinktecture.EntityFrameworkCore.SqlServer
Register with .AddCustomQueryableMethodTranslatingExpressionVisitorFactory()
Wrap the use of Window Function with AsSubQuery()
The question is: should this library handle a case like this without forcing users to install another extension...
I'm going to see if I can implement subquery-with-window-function detection after a join (inner) and then construct the SQL accordingly. It is already taking me some time. I'll let you know how it goes.
Inner joins are created in the SelectExpression. It is not practical to try to supplement or replace the code in that class.
I do think Pawel's approach is probably best. I was able to integrate Pawel's AsSubQuery pattern into your library so that you will not have any dependencies on his library. Is that ok with you? Maybe you can give him credit in your readme. Also, I think you should add to the readme how to use the method since it allows users who do not already know how to use a single query to constrain their window function and that should be more performant than two calls.
I didn't create a proper unit test. The one I created was only for debugging. Some of the type and member names could be improved upon.
I synced my fork and see that I'm having difficulty with the SqlServer assembly. My thoughts follow:
Get rid of support for .net7.0, leave .net6.0, but upgrade the dependency to Microsoft.EntityFrameworkCore.SqlServer 7.0 and then upgrade your release to reflect the new version of the dependency. That way you are supporting .net6 and 7 target frameworks and the newer SqlServer library. Since you are upgrading your version to account for the breaking changes in EF Core Sql Server, you should be good. You should be able to do a minor version upgrade since no public APIs change, only internals/dependencies. I think. I'm not positive how versions are supposed to change when required dependencies change.
I think you can upgrade your EF Core and Relational dependencies to 7 as well. I have been using them .net6 without an issue.
You should only want to support .net7.0 when you are actually coding in C#11 or you want any SDK benefits of .net7. But, by doing so you may potentially require users to also upgrade their supported runtime, which may not be necessary at this time. The dependency of Microsoft.EntityFrameworkCore.SqlServer 6 does not correspond to .net 6 target framework and Microsoft.EntityFrameworkCore.SqlServer 7 does not correspond to .net 7 target framework. These target frameworks are not dependencies of each version of Microsoft.EntityFrameworkCore.SqlServer, but that is what is implied by your configuration. In fact, Microsoft.EntityFrameworkCore.SqlServer 6 and 7 are merely their own versions independent of the target framework. Microsoft.EntityFrameworkCore.SqlServer 7.0 is currently targeting .net6, so that is the target framework you should be supporting for best backward compatibility. Microsoft.EntityFrameworkCore.SqlServer is being upgraded with breaking changes and so your assembly should also be upgraded with this new dependency.
I couldn't get your projects to compile properly with your configuration, otherwise I wouldn't mention it here.
I think you have done an amazing job in developing this library. Before deciding upon your library I'd looked into linq2db.efcore and, like you, found their design problematic and, like you, I wanted an extension of EF Core. So, I'm grateful you created this library and that you are still servicing it.
Semantic Versioning 2 Dependency Change Version Handling: https://github.com/semver/semver/issues/148 https://softwareengineering.stackexchange.com/questions/440018/semantic-versioning-on-a-dev-dependencies-change https://stackoverflow.com/questions/53505965/semantic-versioning-and-dependency-changes
I'm not convinced .AsSubQuery()
method should be copied here. The two reasons are:
Let me look into it a bit more and I agree with you, a use case like this should be in the README regardless of whether AsSubQuery
used is copied or not.
I didn't reuse the namespace.
The truth is that your library is a good compliment, as is, to Thinktecture.EntityFrameworkCore and vice versa. So, if they are both dependencies of the same project then there would be no need to have the added feature.
I should have said "name collision" which makes it more relevant since global usings has been introduced.
Same problem here.
Thinktecture is not available for PostgresSql, so in my opinion this library should not be related to others. Is there any workaround for Npgsql?
I think I copied in the subqiuery logic from Thinktecture and submitted a PR, which was rejected by the library author because he considered it outside of the scope of the project. Take a look at my PR.
SubQuery()
is not a good solution. I actually asked how a SubQuery should be implemented in Linq at .Net Data Community standup and was shutdown. Library / EF Core users shouldn't be aware any of that and it should be performed behind the scenes.
@john-bartu, been working on a solution to have that working automatically, almost got it, but got stuck and then distracted. Will try to finish this week. Sql Server users had a workaround, so it wasn't overly urgent until now. If you can't wait, use @Ephron-WL's fork.
If you found a way to implement the subquery without having to call a subquery function explicitly, then I salute you! 🏆🏆🏆
I might also add that I don't agree with developers shutting you down for suggesting the use of a subquery function. I mean, sure, we all want the system to do work for us without having to give it hints, e.g. there is implicit handling of the condition, but you and I both researched the possibilities and couldn't find any solution. The best was the use of the function, which I think is entirely legitimate and especially a minor implementation detail. There is the ideal, which many of us strive to obtain, and then there is what is practical and works now after an ideal was found not to be attainable. If you have figured out how to handle it implicitly in your library, then you are one major step ahead of my understanding and skillset in the matter and I commend that.
@virzak Thanks! Great I look forward to it.
In the meantime, I tried to write my own translation service to add one function EF.Functions.CountAll()
just adding an additional column count(*) over()
. Because this is the only function I need for now.
But unfortunately, so far without success.
Almost there guys.
Ist there any progress on this? In my opinion the AsSubQuery
method is a pretty good and elegant solution in the case of window functions. Justification: When someone uses window functions they are already on the conceptual level of SQL specifics and window function return different values depending on where in my query/join I put them (if there are filters anywhere), so I want to be able to deliberately decide where (in which query/subquery) my window functions are placed.
I would really need the DENSE_RANK
function in one of my projects and I need to be able to put it in a joined SQL subquery. Thinktecture doesn't support DENSE_RANK
and this project doesn't support AsSubQuery
and I really don't want to include both libraries... Is there any way out of this dilemma?
@InspiringCode , this weekend. Promise. If I won't get it this weekend, I'll make the PR public and ask for help. It is almost there, but doesn't work for some reason. There was a lot of work this holiday break on this project, but didn't get to this issue. Feel terrible for feeding people promises, but we're almost there.
Apologies to making you wait for a year, but this is now in a new beta.
Usage:
ctx.TestRows.Where(t => EF.Functions.DenseRank(EF.Functions.Over().OrderBy(t.Id / 10)) == 2)
There are currently 3 basic tests that are passing, but I'm sure more work will be needed.
This works for PostgreSQL, SQLite and SQL Server providers.
These tests aren't enough to cover my client project which was the driver for creation of this library, so I'm excited to add more tests very soon.
Feel free to submit PRs with failing basic tests of your own.
The approach for this was to rewrite expression:
From:
{[Microsoft.EntityFrameworkCore.Query.EntityQueryRootExpression].Where(t => (__Functions_0.Max(t.Id, __Over_1.PartitionBy((t.Id / 10))) == Convert(23, Nullable`1)))}
To:
{[Microsoft.EntityFrameworkCore.Query.EntityQueryRootExpression].Select(o => new MyAnon() {Original = o, P0 = __Functions_0.Max(o.Id, __Over_1.PartitionBy((o.Id / 10)))}).AsSubQuery().Where(w => (w.P0 == Convert(23, Nullable`1))).Select(b => b.Original)}
before query execution.
The starting point is here: https://github.com/zompinc/efcore-extensions/blob/ce835fbf3de012d6c92e510864ef1ab4942b7f77/src/Zomp.EFCore.WindowFunctions/Query/Internal/WindowFunctionsRelationalQueryTranslationPreprocessor.cs#L15-L17
The visitor:
AsSubQuery()
method which is internal. It signals to call .PushdownIntoSubquery. Almost identical to Thinktecture's AsSubQuery, except internal.If anyone has an opinion on the approach, I'd really appreciate. For example, it might be better to figure out if PushdownIntoSubquery
should be invoked from SqlExpressions visitors.
cc: @roji
Thank you for your efforts @virzak!
Please consider the following query (which is a very simplified version of my production query):
public class Employee {
public Guid Id { get; set; }
public string Name { get; set; }
public string Department { get; set; }
public decimal Salary { get; set; }
public string Manager { get; set; }
}
var employeesWithRank = db
.Set<Employee>()
.Select(e => new {
EmployeeId = e.Id,
Rank = EF.Functions.RowNumber(e.Department, EF.Functions.OrderByDescending(e.Salary))
});
var query = db.Set<Employee>()
.Where(x => x.Manager == "John")
.Join(employeesWithRank,
e => e.Id,
r => r.EmployeeId,
(e, r) => new { Id = e.Id, Name = e.Name, SalaryRank = r.Rank });
It should return all employees whos manager is John with their salary rank (but their rank within their WHOLE department and NOT the rank within all subordinates of Jonn!).
So the expected query would be:
SELECT [e].[Id], [e].[Name], [t].[Rank] AS [SalaryRank]
FROM [Employee] AS [e]
INNER JOIN (
SELECT [e0].[Id] AS [EmployeeId], DENSE_RANK() OVER(PARTITION BY [e0].[Department] ORDER BY [e0].[Salary] DESC) AS [Rank]
FROM [Employee] AS [e0]
) AS [t] ON [e].[Id] = [t].[EmployeeId]
WHERE [e].[Manager] = N'John'
But the returned query is:
SELECT [e].[Id], [e].[Name], DENSE_RANK() OVER(PARTITION BY [e0].[Department] ORDER BY [e0].[Salary] DESC) AS [SalaryRank]
FROM [Employee] AS [e]
INNER JOIN [Employee] AS [e0] ON [e].[Id] = [e0].[Id]
WHERE [e].[Manager] = N'John'
Which is simply wrong. How can I solve this problem with your library?
@virzak I'm interested in what you're doing here, but unfortunately I don't think I'll have time to look at the code in the near future. I will say that I'm currently working on various query architecture improvements that should hopefully make something like this much easier to do (making the SQL tree much more open to arbirtrary external manipulations, making it fully immutable...). CTE support is also high on my mental list, though this may not be done in time for 9.0 (we have a lot of planned work).
@InspiringCode,
Looks like that the work which was done for Where needs to be applied for Join as well. If a Window Function is detected inside a join - that's when PushdownIntoSubquery
should be performed automatically for you.
The query you're currently getting can be available can be achieved using two methods (current 1 and current 2). One of them can be modified.
@virzak But when I do it as you proposed in your dotnetfiddle, I would still have to add a reference to Thinktecture which I want to avoid. For this reason I proposed to make AsSubquery()
public.
You would probably also have to perform PushdownIntoSubquery
when a window function is applied within a subselect, and maybe there are quite a few other scenarios where the same applies:
var query = db.Set<Employee>()
.Where(e => e.Manager == "John")
.Select(e => new {
Id = e.Id,
Name = e.Name,
SalaryRank = db
.Set<Employee>()
.Select(r => EF.Functions.RowNumber(r.Department, EF.Functions.OrderByDescending(r.Salary)))
.Single(r => r.Id == e.Id) })
@InspiringCode, I didn't save the fiddle, that's the source of confusion. Please look again. We'll make current 1 to produce exact SQL of desired.
@InspiringCode there you go:
https://dotnetfiddle.net/32T5Ux
No thinktecture and no .AsSubQuery(). Also desired Sql.
Still lots of tests to run, but this is the concept.
@InspiringCode, for your subselect could you provide a fiddle link?
@john-bartu @Ephron-WL really sorry for making you wait guys. The fix is in in the latest beta along with window functions inside a where
clause as well as nested window functions. The AsSubQuery
is automatic and isn't required for the cases in this issue.
Let me know if this works well for you.
@virzak Thank you for your time. I will have a look at it this week and report back to you with the results.
@virzak Thank you for your efforts!
Regarding the subquery example: I thought about it in more detail and it is indeed hard to come up with a meaningful example. You would need to have double nested sub queries but in this case a Join is probably simpler and cleaner. So I guess we can forget about it for the moment.
@InspiringCode There is a similar issue with a subquery in the where clause, but it can also be solved with a join.
@virzak
Unfortunately for our production case, this still does not work.
I will try to prepare a test case, or at least a fiddle showing my usage in which this does not work.
In the meantime I'll just say that we are trying to use "EF.Functions.Count(EF.Functions.Over())" to determine the number of total rows before pagination but after filtering.
We have something like:
_context
.Where(.....) // huge filtering
.OrderByDescending(e => e.TimeModification)
.Select( new EventRes { Id = id, Count = EF.Functions.Count(EF.Functions.Over()))
.Skip(filter.Offset).Take(filter.ItemsPerPage)
.Join(...) // many joins
The problem is, that the select containing count(*) over ()
is not considered as SubQuery
.
The count(*) over()
is putted to the top level of query, after skip and take, so it always return TAKE
count.
More or less, our logic is supposed to be this:
We have groups, we filter them by members, order groups, paginate, and then take them with their members. Returning groups with joined members with information about the number of groups after filtering (but before pagination).
I gave this a go with 8.0.19-beta but it didn't solve my case:
SELECT [e].[CattleMustered], [e].[TimeTaken], ROW_NUMBER() OVER(ORDER BY [e].[CattleMustered] DESC, [e].[TimeTaken]) AS [c], [e].[Id], ROW_NUMBER() OVER(ORDER BY [e].[CattleMustered] DESC, [e].[TimeTaken]) AS [c0]
FROM [Entry1] AS [e]
WHERE EXISTS (
SELECT 1
FROM [EntryPlayer] AS [e0]
INNER JOIN [Players] AS [p] ON [e0].[SteamPlayerId] = [p].[Id]
WHERE [e].[Id] = [e0].[EntryId] AND [p].[Name] = @__steamName_2)
ORDER BY ROW_NUMBER() OVER(ORDER BY [e].[CattleMustered] DESC, [e].[TimeTaken])
OFFSET @__p_3 ROWS FETCH NEXT @__p_4 ROWS ONLY
I have a where after the row_number but it's being hoisted inside (it's through a joiner table)
@redwyre,
This is slightly hard to follow.
Can I ask to provide a fiddle with expected SQL? Will try to fix this week.
Thanks
Hi @virzak !:) Any progression being made on this? I'm using the RowNumber for a "nasty" query of mine that breaks in main but works in pre-release. Hoping to see this make the main branch :)
If required I'm happy to show how it works for me, but I can't speak for how it's supposedly not working ref. @john-bartu, & @redwyre .
/T
This change is already in the master branch. Since it works for you, there should be no problem. When everyone else is satisfied and the test cases improve, I'll drop the prerelease tag.
This is an excellent library. I've been seeking window functions in EF for the past 10 years, and finally, I've found a promising initiative here.
WHERE clause on top of window function is working good now on this package
Can we introduce the FIRST_VALUE and LAST_VALUE functions?
I'll also try contribute if I get any free time. This is awesome work. Thanks a bunch for all the contributors.
If I have multiple window functions in the query, it is considering only the first one
var queryWithRanks = query.Select(i => new
{
i,
DenseRank = EF.Functions.DenseRank(
EF.Functions.Over().PartitionBy(i.TicketNumber)
.OrderByDescending(i.PublishDate)),
Rank = EF.Functions.Rank(
EF.Functions.Over().PartitionBy(i.TicketNumber)
.OrderByDescending(i.PublishDate)),
}).Where(x => x.DenseRank == 1).Select(x => x.i);
Rank is completely ignored from the query
Another potential issue Although DenseRank column is pushed inside to the inner query.. the joins are being applied to the outer query. What if I want to use joined table column inside the dense rank partition? This is becoming so complex it appears..
This change is already in the master branch. Since it works for you, there should be no problem. When everyone else is satisfied and the test cases improve, I'll drop the prerelease tag.
@virzak - This approach isn't feasible. How difficult would it be to retain the user's query as-is as an inner query and simply add a new outer query to access the window function columns defined in the inner query for filtering purposes?
I was going to use a CTE for this, but linq2db can't figure out how to handle SQL Server temporal tables.
The query below translates to invalid SQL om efcore-extensions:
Produces this...
Should produce...
Below is the expression tree of the LINQ query, which is a little overcomplicated by calling a number of EF Core methods inline.