nhibernate / nhibernate-core

NHibernate Object Relational Mapper
https://nhibernate.info
GNU Lesser General Public License v2.1
2.12k stars 924 forks source link

NH-3204 - Fetch breaking proper paging #1281

Open nhibernate-bot opened 6 years ago

nhibernate-bot commented 6 years ago

rosieks created an issue:

When I run query like this:

session.Query().Fetch(c => c.Addresses).Take(10);

or

session.Query().Take(10).Fetch(c => c.Addresses);

there is generated wrong SQL, because paged is query with join - this is wrong.


Alexander Zaytsev added a comment — : I believe that this might be not an issue, because you should not mix paging and fetching.


Oskar Berggren added a comment — : Fetching implies a join, which makes paging work differently. Use a subquery to for paging and fetching in the other query.


rosieks added a comment — : I would like to use subquery. Unfortunately it doesn't work for me in LINQ, so there is no way to avoid N+1. I don't understand why this behavior is accepted. I want take 10 customers, not 10 records from DB. LINQ is not SQL. It is possible to write correct query. You should make paging before join. In Entity Framework Include method work properly.


Michael Teper added a comment — : I am with Slawomir -- while I appreciate the underlying mechanics, the behavior is unexpected and I'd argue broken.


Oskar Berggren added a comment — : I'll reopen for now, in case anyone wants to work on this. Supposedly an improvement in this area would feature an automatic rewrite to use a subquery. Is this what EF does?

Slawomir, why can't you use a subquery? And remember you also have the batch-size in mappings to play with to avoid N+1.


rosieks added a comment — : EF generate the following query:


SELECT 
<Project1>.[Id] AS [Id], 
[Project1].[C1] AS [C1], 
[Project1].[Id1] AS [Id1], 
[Project1].[Customer*Id] AS [Customer*Id]
FROM ( SELECT 
  <Limit1>.[Id] AS [Id], 
  [Extent2].[Id] AS [Id1], 
  [Extent2].[Customer*Id] AS [Customer*Id], 
  CASE WHEN (<Extent2>.[Id] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS <C1>
  FROM   (SELECT TOP (10) <c>.[Id] AS [Id]
      FROM [dbo].[Customers] AS [c] ) AS [Limit1]
  LEFT OUTER JOIN [dbo].[Addresses] AS [Extent2] ON [Limit1].[Id] = [Extent2].[Customer_Id]
)  AS [Project1]
ORDER BY [Project1].[Id] ASC, [Project1].[C1] ASC

Oskar - sorry but I mean can't use a fetch="subselect", I don't try subquery


Alexander Zaytsev added a comment — : NH-3456 has a test case


Lee Timmins added a comment — : Issue 3456 includes an attached test case which may help.


Dawid Ciecierski added a comment — : Definitely agree that while the underlying mechanics explain themselves, for a regular user the behaviour may and should seem broken and should therefore be fixed.

The status is listed as 'Confirmed', but there seems to have been little activity over past months. Is anyone working on a solution? If not, would anyone care to outline a solution that would be acceptable by the core team so that we can start working on a patch? Should we automatically rework the query or perhaps automatically issue another one..?

EDIT: Sławomir, any chance you could add skip to the mix and show us what EF would generate for your particular sample case?


Lee Timmins added a comment — : <~chrazz> I have just done a simple example to show you the query generated by Entity Framework. Imagine a blog with the following entities:


public class Post {
    public int PostId { get; set; }
    public string Name { get; set; }

    public virtual IList<Comment> Comments { get; private set; }

    public Post() {
        Comments = new List<Comment>();
    }
}

public class Comment {
    public int CommentId { get; set; }
    public virtual Post Post { get; set; }
}

If I said the following in entity framework:


_context.Posts
    .Include("Comments")
    .Take(3)
    .ToList();

It would generate the following query:


SELECT 
<Project1>.[PostId] AS [PostId], 
[Project1].[Name] AS [Name], 
[Project1].[C1] AS [C1], 
[Project1].[CommentId] AS [CommentId], 
[Project1].[Post*PostId] AS [Post*PostId]
FROM ( SELECT 
  <Limit1>.[PostId] AS [PostId], 
  [Limit1].[Name] AS [Name], 
  [Extent2].[CommentId] AS [CommentId], 
  [Extent2].[Post*PostId] AS [Post*PostId], 
  CASE WHEN (<Extent2>.[CommentId] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS <C1>
  FROM   (SELECT TOP (3) <c>.[PostId] AS [PostId], [c].[Name] AS [Name]
      FROM [dbo].[Posts] AS [c] ) AS [Limit1]
  LEFT OUTER JOIN [dbo].[Comments] AS [Extent2] ON [Limit1].[PostId] = [Extent2].[Post_PostId]
)  AS [Project1]
ORDER BY [Project1].[PostId] ASC, [Project1].[C1] ASC

Dawid Ciecierski added a comment — : Thank you but this looks similar to the example posted above. I wanted to see if Skip changes the query much so that we could get a better picture in preparing a fix.


Ricardo Peres added a comment — : Workaround:

session.Query

().Where(x => session.Query().Select(y => y.CustomerId).Take(10).Skip(5).Contains(x.Customer.CustomerId)).Select(x => x.Customer).ToList();


Lee Timmins added a comment — : Oh right, Here's a couple examples. First the following:


_context.Posts
    .Include("Comments")
    .OrderBy(p => p.PostId)
    .Skip(2)
    .Take(3)
    .ToList();

Produces the following query:


SELECT 
<Project1>.[PostId] AS [PostId], 
[Project1].[Name] AS [Name], 
[Project1].[C1] AS [C1], 
[Project1].[CommentId] AS [CommentId], 
[Project1].[Post*PostId] AS [Post*PostId]
FROM ( SELECT 
  <Limit1>.[PostId] AS [PostId], 
  [Limit1].[Name] AS [Name], 
  [Extent2].[CommentId] AS [CommentId], 
  [Extent2].[Post*PostId] AS [Post*PostId], 
  CASE WHEN (<Extent2>.[CommentId] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS <C1>
  FROM   (SELECT TOP (3) <Extent1>.[PostId] AS [PostId], [Extent1].[Name] AS [Name]
      FROM ( SELECT <Extent1>.[PostId] AS [PostId], [Extent1].[Name] AS [Name], row*number() OVER (ORDER BY <Extent1>.[PostId] ASC) AS [row*number]
          FROM [dbo].[Posts] AS [Extent1]
      )  AS [Extent1]
      WHERE [Extent1].[row_number] > 2
      ORDER BY [Extent1].[PostId] ASC ) AS [Limit1]
  LEFT OUTER JOIN [dbo].[Comments] AS [Extent2] ON [Limit1].[PostId] = [Extent2].[Post_PostId]
)  AS [Project1]
ORDER BY [Project1].[PostId] ASC, [Project1].[C1] ASC

And the following:


_context.Posts
    .Include("Comments")
    .OrderBy(p => p.PostId)
    .Take(3)
    .Skip(2)
    .ToList();

Produces the following query:


SELECT 
<Project1>.[PostId] AS [PostId], 
[Project1].[Name] AS [Name], 
[Project1].[C1] AS [C1], 
[Project1].[CommentId] AS [CommentId], 
[Project1].[Post*PostId] AS [Post*PostId]
FROM ( SELECT 
  <Skip1>.[PostId] AS [PostId], 
  [Skip1].[Name] AS [Name], 
  [Extent2].[CommentId] AS [CommentId], 
  [Extent2].[Post*PostId] AS [Post*PostId], 
  CASE WHEN (<Extent2>.[CommentId] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS <C1>
  FROM   (SELECT <Limit1>.[PostId] AS [PostId], [Limit1].[Name] AS [Name]
      FROM ( SELECT <Limit1>.[PostId] AS [PostId], [Limit1].[Name] AS [Name], row*number() OVER (ORDER BY <Limit1>.[PostId] ASC) AS [row*number]
          FROM ( SELECT TOP (3) <Extent1>.[PostId] AS [PostId], [Extent1].[Name] AS [Name]
              FROM [dbo].[Posts] AS [Extent1]
              ORDER BY [Extent1].[PostId] ASC
          )  AS [Limit1]
      )  AS [Limit1]
      WHERE [Limit1].[row_number] > 2 ) AS [Skip1]
  LEFT OUTER JOIN [dbo].[Comments] AS [Extent2] ON [Skip1].[PostId] = [Extent2].[Post_PostId]
)  AS [Project1]
ORDER BY [Project1].[PostId] ASC, [Project1].[C1] ASC

Please note that Entity Framework threw an error if an OrderBy expression did not exist before Skip.

Hope this helps.


Dawid Ciecierski added a comment — : Thank you. Skip and Take with no OrderBy exception is the usual behaviour as NHibernate, EF or SQL (ROW_NUMBER) are not able to ascertain what exactly you might want to have your results numbered by.

ahsteele commented 5 years ago

Any work being done on this bug?

oskarb commented 5 years ago

Paging have always worked differently in NHibernate when fetching was also used. The behaviour displayed by LINQ mirrors that of the underlying HQL.

It would be useful of course if the fetching part was "ignored" for the purposes of paging - or possibly even ignored completely if paging is also used. The latter might have implications for performance, but would perhaps make the LINQ query return a more naturally expected result. Would be a breaking change because code that closes the session before accessing the feched elements would no longer work.

Given that this is not something NHibernate has ever done it could be argued that this is not a bug but a feature request.

fredericDelaporte commented 5 years ago

Yes, NHibernate does not support proper paging with fetching of collections. From a Linq standpoint, being used to what does EF causes this to look as a bug. But the reference for Linq behavior is more what does Linq-to-objects, and in Linq-to-objects, fetching simply does not even exist, there are no equivalents for it (nor any need for it indeed).

Anyway, irrespective of whether it should be called a bug or a feature request, the point of interest for you is that, as far as I know, no one works on this subject. NHibernate development is volunteers based, so it can be you who start working on it.

As already mentioned in this issue, there are a number of alternate ways, which is maybe the reason for not having anyone currently sufficiently interested in this for working on it:

  1. Use lazy loading with batching enabled instead of fetching. This turns the n+1 loading into a 1+1 one (or more precisely, into a Ceiling(n / batch-size) +1 one).
  2. Use pagination on a query without collection fetches for retrieving the entities ids, then fetch collections with an in (previously queried entities ids). (So for LINQ, a .Where(e => fetchedIds.Contains(e.Id)).). This results in 2 queries instead of n + 1.
  3. About using a subquery, which would mean combining the previous two queries into one, it also works. There are two ways of writing it:

    • .Where(e => s.Query<Entity>().Select(e2 => e2.Id).Skip(10).Take(5).Contains(e.Id)). This generates an in clause with an uncorrelated subquery.
    • .Where(c => s.Query<Entity>().Skip(10).Take(5).Any(e2 => e2.Id == e.Id)). This generates an exists clause with a correlated subquery. To be avoided in my opinion, unless the first way is not supported by the database or the entity (composite id cases).

    But if you have many fetches to do and want to avoid Cartesian products, it is better to use 2. and split the fetches in future queries.

This said, a more straightforward reason for having no one working on this subject is also that I think it will be quite hard to do that change. Currently paging is not handled at the HQL level, but is done by tweaking the SQL generated without paging, for injecting paging in it. And this is implemented in each dialect supported by NHibernate. Implementing paging on the root entity instead of the whole query would require a quite heavy rewrite of the SQL. Doing this on the full SQL generated without paging seems quite unpractical.

maca88 commented 5 years ago

it could be argued that this is not a bug but a feature request.

For me this is a feature request as Fetch is designed to be a "lower level" method which just adds a left join to include the desired relation.

Any work being done on this bug?

As already explained in NHibernate nothing was done on this matter but as I also needed this feature I've made an extension library NHibernate.Extensions quite a long ago that I still use and maintain. The library provides an Include method that works similar to the EF Include method. The difference is in how the query is made, in EF UNION ALL is used where with this extensions one or multiple queries with a left join for each relation will be created that uses a subquery where the paging is applied. In short the Include method uses the Fetch method, future query and a subquery when paging is required. About performance, in my experience the UNION ALL works faster up to ~5 relations (depends on the number of columns) if you have more than that the approach that this library use becomes faster.

Also from my experience the Include method is faster that the lazy loading approach when you know in advance which relations you need to have (example: validating a very complex entity graph) and also easier to use/configure. But at the end mixing both is the best approach in my opinion where using lazy load/batching on relations that point to an entity that very frequently referenced and its data does not change often (e.g. entity that represent an user) and Include for other relations.

fredericDelaporte commented 5 years ago

I have checked the subquery solution, it indeed works, and I have updated my comment above. (I was wrongly thinking the paging was injected on the generated SQL after having already combined queries and sub-queries, but it is in fact injected in the subquery SQL before using it for generating the whole SQL query.)

fredericDelaporte commented 5 years ago

@maca88 extension is quite another approach than globally changing the current paging implementation, and it handles only LINQ, not the other APIs like HQL or Criteria/QueryOver.

But it is nonetheless an interesting solution, especially since it also avoids the Cartesian product issue incurred by multiple collection fetches.

It could be viewed as a kind of LINQ query automatic rewriting. But as it may also split them into many queries, still executed in one round-trip to DB for those supporting multiqueries, it is a bit heavier than rewritings already existing in NHibernate.

Maybe should the NHibernate LINQ provider be changed for working like this extension.

It may be a breaking change for databases with insufficient subquery support, but I do not think there are many applications using a NHibernate paginated query with collection fetches, due to the rather unexpected results it currently yields.

Unrelated note about the .Take(3).Skip(2) example of Lee Timmins: it is unsupported by NHibernate LINQ provider. It is handled like Skip(2).Take(3), which is not the same thing. If willing this to be fixed, I think a dedicated issue should be opened. But what is the point of writing .Take(3).Skip(2)? Better rewrite it .Skip(2).Take(1).