zzzprojects / GraphDiff

GraphDiff is a library that allows the automatic update of a detached graph using Entity Framework code first.
https://entityframework-graphdiff.net/overview
MIT License
333 stars 101 forks source link

Fix: Prevent additional query when updating new graphs (faster inserts) #146

Open mk1024 opened 9 years ago

mk1024 commented 9 years ago

I'm using GraphDiff in a few projects that perform a lot of batch updates and inserts. Here are my optimizations that made it about 15-20x faster in my use cases. All tests in GraphDiff.Tests and in my own projects run fine. Perhaps Brent or Andreas can have a look at my changes.

Inserting new entities

When updating new graphs (insert) GraphDiff tries to load the persisted graph, which always returns null and requires a lot of unneccessary work (predicate expression, include strings, actual query to db). If an entity is new and not persisted yet, you should not try to load it from the database.

This optimization is optional for simple int and long primary keys (GraphDiffConfiguration) and is automatically ignored in other scenarios (string/guid/composite/etc).

Updating of collections

When updating many entities of the same type you have to call UpdateGraph in a loop. This results in a lot of queries from QueryLoader and extremely slows down large batch updates. As proposed by DixonD-git (Issue https://github.com/refactorthis/GraphDiff/issues/127), loading of many entities can be done in a single query to increase performance.

The performance boost was quite a surprise (15-20x on my local machine). I only had to change internal interfaces and classes, while the public interface (DbContextExtensions) did not change. Composite keys create predicates like (KeyA='a1' AND KeyB='b1') OR (KeyA='a2' AND KeyB='b2') OR ... and single int keys are translated to ...Key in (1,2,3,4,5...).

kabua commented 8 years ago

@mk1024 - Thanks! We have incorporated your changes into our private code base.

cryo75 commented 8 years ago

After this fix, on inserting of new entities, the associated entities are still being loaded again before saving the graph.

The code where the loading happens is located here: AssociatedEntityGraphNode.Update()

Does this fix actually work?

mk1024 commented 8 years ago

@cryo75 - If your entities have int/long primary keys and if you set GraphDiffConfiguration.SkipLoadingOfNewEntities = true the loading of the new entity itself is skipped. Depending on your model it can still be neccessary to load associated entites to be updated.

cryo75 commented 8 years ago

I have the following: ` public class AddressDto() { public int Id {get; set;} public PostCodeDto {get; set;} }

public class AddressModel() 
{
    public int Id {get; set;}
    public PostCodeModel {get; set;}
    public PostCodeModelId {get; set;}
}

`

In my particular scenario, the associated entities in the Dto classes are the actual entities themselves. I do no expose the foreign keys.

So, in the above, when inserting a new AddressModel, the model will only have a value for PostCodeModel. On saving I do:

return ((DbContext)dataContext).UpdateGraph<AddressModel> (entity, map => map.AssociatedEntity(x => x.PostCodeModel));

But GraphDiff loads all the associated entites before saving, which, in my case, is unnecessary. Is there a way to trim down the loading of associated models before inserts/updates? (However, note that the returned model must have the actual foreign keys set after saving).

mk1024 commented 8 years ago

@cryo75 - I think that not exposing the fk property (PostCodeModelId) is causing this issue. If PostCodeModel should not be updated/inserted with AddressModel anyway, you could just set PostCodeModelId and leave PostCodeModel empty when inserting. Depending on your model and configuration, this should do the trick.

cryo75 commented 8 years ago

Preferably I would need a mixture so that GraphDiff:

  1. sets the foreign id's automatically from the associated models.
  2. doesn't reload the associated models before inserting/updating
  3. return the complete inserted/updated entity with the associated models and foreign id's all set.

As for OwnedCollections, the default will still apply. Is this possible?