Open ctrlaltdan opened 1 year ago
Hello,
It is a very good suggestion indeed, even if the implementation you bring here has a flaw (in case of several operators working against EF core, InvokeInDedicatedThreadAsync
ensures no call is done in parallel since it is not supported).
I will work on a amendment on this regard. Thanks a lot for this! :smiley:
Great to hear!
I eagerly removed the dbcontext instance from the thread lock as I assumed the node instance was threadsafe, and the dbcontext instance was local to the batch anyway.
My thinking that parallel queries are supported, just not within a single dbcontext. Similar to how the .net request pipeline doesn't enforce singleton access to a dbcontext.
Thanks for picking this up. I'm enjoying learning about your framework.
Operators are meant to work in parallel, (that's the purpose). But EF core raises an exception when a query is already being run on the same DbContext
in a concurrent thread (this is for this very purpose I had to develop the method InvokeInDedicatedThreadAsync
).
Just as an extra information, in the current implementation, the fact the dbContext
is not disposed it is absolutely normal. It is not up to an ETL operator to decide that it must be trashed as it has been created in a much higher level context.
Of course, if you issue a different DbContext
at each use, everything that is mentioned above is irrelevant, but in most of situations a DbContext
is directly given to the ProcessRunner
. So what you suggest will be another way to work: if a IDbContextFactory
it will be used instead.
About proper release of entities, I will make a quick win: execute a dbContext.ChangeTracker.Clear()
at every batch unless the option KeepChangeTracker
is activated. This will permit entities to be released from memory if they are not used anywhere else.
a new version 2.1.6-beta is available. I didn't test it too well to speak franckly. You can give it a try. I should work with IDbContextFactory out of the box. If you want to specify the type of the DbContext, you must call .WithContextType<your-dbContext-type>()
on the options builder.
Let me know if it suits you.
Hi all,
I've been using the
EfCoreSave
operator withSaveMode.EntityFrameworkCore
option enabled as I'm planning to write my data to Postgres.I've noticed when dealing with large streams, this operator creates a bottleneck for two reasons:
I've changed a local copy to look as follows:
MS Docs on DbContextFactory
This allows multiple streams to write to the database at the same time, and crucially, disposes the DbContext after each operation (considerably improving performance/memory overhead in the change tracker).
On my sample of 500k+ entries the import is now silky smooth.
Was there a deliberate design decision behind this?