scottksmith95 / LINQKit

LINQKit is a free set of extensions for LINQ to SQL and Entity Framework power users.
MIT License
1.63k stars 162 forks source link

Consuming ExpandableQuery w/ Contains(...) leads to multiple Db hits #59

Open TimHambourger opened 7 years ago

TimHambourger commented 7 years ago

Stack: .NET v4.5.2 EntityFramework v6.1.3 LinqKit v1.1.8

In EF, you can compose IQueryables using Contains(...) and the resulting IQueryable will execute in a single Db hit. E.g.:

var query = db.MyModels.Where(m => m.Id < 10).Select(m => m.Id);
db.MyModels.Where(m => query.Contains(m.Id)).ToList();

triggers a query like

SELECT 
    [Extent1].[Id] AS [Id]
    FROM [dbo].[MyModels] AS [Extent1]
    WHERE  EXISTS (SELECT 
        1 AS [C1]
        FROM [dbo].[MyModels] AS [Extent2]
        WHERE ([Extent2].[Id] < 10) AND ([Extent2].[Id] = [Extent1].[Id])
    )

But if the consumed query is a LinqKit ExpandableQuery, this will instead trigger two Db hits:

var query = db.MyModels.AsExpandable().Where(m => m.Id < 10).Select(m => m.Id);
db.MyModels.Where(m => query.Contains(m.Id)).ToList();
SELECT 
    [Extent1].[Id] AS [Id]
    FROM [dbo].[MyModels] AS [Extent1]
    WHERE [Extent1].[Id] < 10
SELECT 
    [Extent1].[Id] AS [Id]
    FROM [dbo].[MyModels] AS [Extent1]
    WHERE [Extent1].[Id] IN (1, 2, 3, 4, 5, 6, 7, 8, 9)

The SQL generated is essentially the same as if you did

var idsList = db.MyModels.Where(m => m.Id < 10).Select(m => m.Id).ToList();
db.MyModels.Where(m => idsList.Contains(m.Id)).ToList();

except that in the latter case, each Db hit is triggered by a separate ToList() call, whereas in the LinqKit case above the same ToList() call triggers both hits.

Also note that the behavior is the same whether or not the consuming query is expandable:

var query = db.MyModels.AsExpandable().Where(m => m.Id < 10).Select(m => m.Id);
db.MyModels.AsExpandable().Where(m => query.Contains(m.Id)).ToList();

triggers the same two Db hits.

See the attached zip file for a VisualStudio solution w/ a minimal repro and an associated SQL Server Profiler trace file.

LinqKitContainsMinimalRepro.zip

I'm working on an EF project where we'd like to be able to use LinqKit in some of our permissions queries. But we frequently consume permissions logic via Contains(...) -- give me all the entities for which the current user has rights and which satisfy some additional filter. Doing this w/ ExpandableQueries produces extra Db hits. Besides the extra hit, we've previously run into performance issues calling Contains on a List as opposed to an IQueryable b/c EF can't use its query plan cache for these queries.

Note that I've tested this w/ LinqKit, not LinqKit.EntityFramework. Are the two functionally equivalent as of v1.1.8?

StefH commented 7 years ago

I guess that the real query execution from the normal example is delayed, until the full query is defined. Only then EF will generate the single SQL statement.

With LinqKit ExpandableQuery, the queries are directly excecuted, so you need two SQL statements.

The solution would be that you write your query different (try to make 1 query) or just create a simple StoredProcedure which does the job.

TimHambourger commented 7 years ago

I guess that the real query execution from the normal example is delayed, until the full query is defined. Only then EF will generate the single SQL statement.

Yes, I gather what's going on is that EF has two behaviors when it sees Enumerable.Contains or Queryable.Contains: There's the default behavior that works for all enumerables, which involves iterating the enumerable and using the SQL IN operator. Then there's an alternative behavior for some special subset of IQueryables, which is to reach into the inner IQueryable's Expression tree and create a combined SQL statement using a subquery.

I don't know enough about EF's internals to know exactly what defines this "special subset." What I can see empirically is that EF's System.Data.Entity.Infrastructure.DbQuery<TResult> is in but LinqKit's ExpandableQuery<T> is out. So EF applies its default IEnumerable behavior, and of course iterating an ExpandableQuery triggers a Db hit.

The solution would be that you write your query different (try to make 1 query) or just create a simple StoredProcedure which does the job.

Maybe it'd help to say a bit more about my use case. We have a number of permissions services that can take a right, as represented by an Enum, and output an IQueryable that represents the complete list of entity ids for which the current user has the given right. E.g.:

IQueryable<int> editableIds = MyRightsService.Scope(MyEntityRights.Edit);

which can then be consumed multiple ways:

var canEditSingleEntity = editableIds.Any(id => id == myKnownId);
var editableIdList = editableIds.ToList();
var biggerQuery = Db.MyEntities.Where(e => editableIds.Contains(e.Id) && /* other filter logic */);

So sure, a rewritten query or stored proc would work, but in our case those would certainly lead to code duplication. Given that LinqKit is all about composing larger queries out of small reusable subcomponents, it'd be nice to see composition via Contains supported as well.

My ideal solution would be to somehow modify ExpandableQuery so that EF recognizes it as a candidate for its combined query behavior. But I have no idea how feasible that is.

Short of that, a reasonable workaround would be to add handling for Contains to ExpressionExpander.VisitMethodCall. Essentially, anytime LinqKit sees Queryable.Contains called on an ExpandableQuery, it should be possible to replace that w/ the Expanded InnerQuery. That should leave an Expression that EF will process normally.

I've gone as far as creating a rough prototype of this approach. It enables doing

var query = db.MyModels.AsExpandable().Where(m => m.Id < 10).Select(m => m.Id);
db.MyModels.AsExpandable().Where(m => query.Contains(m.Id)).ToList();

in one hit. I'd be more than happy to submit a pull request if you'd like.

StefH commented 7 years ago

Some short comment on:

We have a number of permissions services that can take a right, as represented by an Enum, and output an IQueryable that represents the complete list of entity ids for which the current user has the given right.

I would expect a different approach (not return integers, these have no meaning, but real entities), like:

// Books
IQueryable<Book> booksQuery = MyBooksRightsService.Scope(MyEntityRights.Edit);
IQueryable<Book> booksQueryEdit = booksQuery.Any(b => b.Id == myKnownId);
IQueryable<Book> booksQueryBigger = booksQuery.Where(b => b.Title == "abc" && b.Id > 1000);

// Orders
IQueryable<Order> orderQuery = MyOrderRightsService.Scope(MyEntityRights.Edit);
IQueryable<Order> orderQueryEdit = orderQuery.Any(o => o.Id == myKnownId);
IQueryable<Order> orderQueryBigger = orderQuery.Where(o => o.Quantity > 1 && o.Name == "xyz");
TimHambourger commented 7 years ago

Yeah, that's a fair point. I'm not up on the full history of why we do ints and not entities, but I agree that entities might be a cleaner solution.

A couple other thoughts:

  1. I'd like to expand this topic a bit. I'm realizing the problem isn't unique to Contains after all. E.g.

    var myIds = db.MyModels.AsExpandable().Where(m => m.Id < 10).Select(m => m.Id);
    db.MyModels.Where(m => myIds.Any(id => m.Id == id)).ToList();

    has the same issue. So I withdraw my suggestion of adding specific handling for Contains.

  2. The problem is actually worse if we return entities instead of ints. With ints, the worst that happens is that we get multiple db hits. But

    var myEntities = db.MyModels.AsExpandable().Where(m => m.Id < 10);
    db.MyModels.Where(m => myEntities.Contains(m)).ToList();

    throws a System.NotSupportedException: "Unable to create a constant value of type 'LinqKitContainsMinimalRepro.MyModel'. Only primitive types or enumeration types are supported in this context."

Generally, EF has rich support for composing IQueryables via Contains, Any, All, First, etc. It'd be great if there was an intuitive way to do the same w/ ExpandableQueries.

adamhaile commented 7 years ago

I would expect a different approach (not return integers, these have no meaning, but real entities)

If you're curious, the interface returns ints because not all rights implementations are EF based. Some work against a cached in-memory rights table, some against other systems, etc. For those that are EF, though, EF's ability to inline the .Contains() call has been a very useful performance gain.

MatejQ commented 6 years ago

I found this issue after a few hours of googling (and searching where the error comes from). Essentially when I do:

var myEntities = db.MyModels.AsExpandable().Where(m => m.Id < 10);
db.MyModels.Where(m => myEntities.Any(e => e.Id == m.Id)).ToList();

The second statement fails in runtime with exception System.NotSupportedException: "Unable to create a constant value of type 'LinqKitContainsMinimalRepro.MyModel'. Only primitive types or enumeration types are supported in this context." It's not possible to rewrite the first query to not use AsExpandable. I would really like to see this issue fixed.

MatejQ commented 5 years ago

Anybody alive? Is this going to be fixed or not?

TimHambourger commented 5 years ago

@MatejQ Yes, you've hit this same issue alright. I haven't heard more from any maintainers about a fix.

Looking at this again, I still see the same two possible avenues:

  1. Reach out to the EF team to ask about the "special subset of IQueryables" that I mention above. Is there any possibility of hooking into that process to add support for ExpandableQuery<T>?
  2. Short of that, modify ExpressionExpander.VisitMethodCall to add support for Contains, Any, All, etc. One shortcoming of this approach is that there are a lot of methods to cover. But this should make it possible to do:
    var myEntities = db.MyModels.AsExpandable().Where(m => m.Id < 10);
    // Need to add AsExpandable() here too to be able to consume an ExpandableQuery via .Any()
    db.MyModels.AsExpandable().Where(m => myEntities.Any(e => e.Id == m.Id)).ToList();

Hope that helps!

StefH commented 5 years ago

About option 2:

It's possible to determine if the method call is a special method call, the same is also done here:

System.Linq.Dynamic.Core/Parser/ExpressionParser.cs

With the help from this class: https://github.com/StefH/System.Linq.Dynamic.Core/blob/master/src/System.Linq.Dynamic.Core/Parser/SupportedMethods/IQueryableSignatures.cs

However, I don't see yet how this could be used in the ExpressionExpander.VisitMethodCall method.

StefH commented 5 years ago

@TimHambourger An out of the box question:

Why not use the the AsExpandable() only on the final last statement?

var expandableIds1 = db.MyModels.Where(m => m.Id < 4).Select(m => m.Id);
var result3a = db.MyModels.Where(m => expandableIds1.Contains(m.Id)).AsExpandable().ToList();
Console.Write(JsonConvert.SerializeObject(result3a, Formatting.Indented));
MatejQ commented 5 years ago

I tried AsExpandable before call to Where, as @TimHambourger said, but no difference, still getting the same error. Then I tried AsExpandable as last, before ToList, as @StefH said, no difference either. I'm stumped. Looks like this won't budge at all until it's fixed in LinqKit.

a-stankevich commented 2 years ago

We just faced a similar problem in our application, but the impact was much worse. As in the first post in this thread, the first query was implicitly executed, returned a set of ints, and then it was passed to the second query. The code passed all stages of testing and when deployed to production it finally crashed the whole web application.

The reason for that was the first query returning thousands of rows for one of the tenants, so when it came to compiling the second query that caused StackOverflowException.

I really love this project and the benefits it provides are hard to overestimate. I would love to contribute the fix here, but the code looks too complicated for me. If anyone has a good idea how to fix that or point me in the right direction, I'll be happy to contribute PR.

sdanyliv commented 2 years ago

@a-stankevich, post reproducible sample and I’ll fix that.

a-stankevich commented 2 years ago

@sdanyliv I've created a repo where the issue is reproduced https://github.com/a-stankevich/LinqKit_Issue59/blob/main/LinqKit_Issue59/Program.cs. Thank you for help!

sdanyliv commented 2 years ago

@a-stankevich, well, Ii is not a bug. You have used AsExpandable() in Expression Lambda body. In this case LINQKit expander cannot be activated.

You have to use AsExpandable on top fo the query or as last operator before materialisation. The following statement will fail:

var result = context.Some
   .Select(x => new 
   {
        // we are in expression body
        Other = context.Other.AsExpandable().Where(x => ...) // this expandable is not processed and EF will fail in translation
   })
   .ToList();

This not:

var result = context.Some
   .AsExpandable() // top of the query
   .Select(x => new 
   {
        Other = context.Other.AsExpandable().Where(x => ...) // working now but AsExpandable is not needed
   })
   .ToList();

This also will work:

var result = context.Some
   .Select(x => new 
   {
        Other = context.Other.AsExpandable().Where(x => ...)
   })
   .AsExpandable() // before materialisation 
   .ToList();

LINQKit replaces IQueryProvider and makes tunnel between original EF QueryProvider and catch expressions before passing to EF. In Expression body it is not possible.

Also do not forget about SelectMany:

var query = 
    from s in context.Some
    from o in context.Other.AsExpandable() // in this case LINQKit also will not work.
    ...

In Query Syntax, looks like you are calling AsExpandable not in expression body, but it is not. Compiler rewrites this query to the following:

var query = context.Some
    .SelectMany(s => context.Other.AsExpandable())

And again AsExpandable is in expression body and expanding will fail.

a-stankevich commented 2 years ago

Thank you for detailed explanation @sdanyliv. I understand the situation we discuss is caused by incorrect usage of AsExpandable(). You say that such statement will fail, and actually it would be great if they did. Unfortunately, they are actually executed and no failure is observed. This causes crashes of entire process under specific conditions as I described in the comment above.

Sometimes such this misuse of AsExpandable is not obious from the code when query is composed of expressions spread across different methods or even classes.

I understand it may be difficult or impossible to make this run as one query. Then is there a way to have an exception thrown for these scenarios so that developers would be immediately aware of the issue?

Or maybe it's possible to override EF IQueryProvider with LINQKit's one once for entire DbContext?

My intention is to have this somehow removed from 'things to keep in mind when using LINQKit' list for developers

a-stankevich commented 2 years ago

Replying to my own request, it turns out that using async functions for materialization (e.g. ToListAsync) throws exception at the time when the nested function is called. So this approach can be used to protect for StackOverflowException in these boundary cases.