zzzprojects / EntityFramework-Extensions

Entity Framework Bulk Operations | Improve Entity Framework performance with Bulk SaveChanges, Insert, update, delete and merge for SQL Server, SQL Azure, SQL Compact, MySQL and SQLite.
https://entityframework-extensions.net
344 stars 57 forks source link

Support bulk insert into intermediary memory optimized temporary table #136

Closed jzabroski closed 5 years ago

jzabroski commented 6 years ago

CC @mmiller678

Currently, you create a session temp table #ZZZ_projects_GUID

This still has to hit disk, creating WRITELOG and PAGEIOLATCH_EX overhead as well.

The MERGE statement can use a memory optimized table in the source, but not the target.

See Microsoft's example here: Faster temp table and table variable by using memory optimization

@JonathanMagnan this is an attempt to troubleshoot the high INSERT BULK time we see. I believe this could be a generally great feature, though.

JonathanMagnan commented 6 years ago

Hello @jzabroski ,

We will look at it tomorrow about having an option and check the limitations for using memory optimized table.

However, in your case, I believe we need to highly look why the tempdb is taken so much time. Something is not normal from statistics I received that happen in this database.

Best Regards,

Jonathan

jzabroski commented 6 years ago

You're probably right that this won't solve our problems. It's just a workaround that might sidestep our issues and add value to your product.

I think this trick could give some customers explosive performance gains, which ultimately means more happy paying customers.

JonathanMagnan commented 6 years ago

Hello @jzabroski ,

We made some investigation about memory-optimized table.

Unfortunately, it looks they are better/faster for SELECT than for inserting with SqlBulkCopy

The table creates/drop take over 200ms each alone which is already bad but we found out that SqlBulkCopy was taking more than 2x more time in our environment.

It can still be a good idea for people having a very slow disk but I'm not sure that's a very good long-term solution.

Best Regards,

Jonathan

jzabroski commented 6 years ago

The table creates/drop take over 200ms each alone

What do you mean, create/drop? Ideally, the create table statement is done once, outside the app, as a "Staging" table. I.e. I can imagine an API call as adding an overload string memoryOptimizedTableName, string memoryOptimizedSessionColumnName.

The SELECT being fast is exactly why I suspect BulkMerge especially will be faster.

Can you share the code with me so I can investigate further? I can instrument it with SQL Server extended events too which could be awesome for more fine grain benchmarks

On Fri, Aug 3, 2018, 4:53 PM Jonathan Magnan notifications@github.com wrote:

Hello @jzabroski https://github.com/jzabroski ,

We made some investigation about memory-optimized table.

Unfortunately, it looks they are better/faster for SELECT than for inserting with SqlBulkCopy

The table creates/drop take over 200ms each alone which is already bad but we found out that SqlBulkCopy was taking more than 2x more time in our environment.

It can still be a good idea for people having a very slow disk but I'm not sure that's a very good long-term solution.

Best Regards,

Jonathan

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/zzzprojects/EntityFramework-Extensions/issues/136#issuecomment-410373394, or mute the thread https://github.com/notifications/unsubscribe-auth/AAbT_aw-7gryn1PIpgr11OUyJUzLwdpkks5uNLgugaJpZM4VtDEI .

JonathanMagnan commented 6 years ago

Hello @jzabroski ,

Great, we will add an option probably tomorrow to support this scenario. So you will be able to test the performance with a beta version of this feature.

JonathanMagnan commented 6 years ago

What do you mean, create/drop

I thought we needed to manage it but if you say that's something you create on your side before using our library, then we can skip this time.

jzabroski commented 6 years ago

@JonathanMagnan

If you look at the example code on MSDN, they show step by step how to create a "memory optimized temp table". The documentation is obviously confusing, because it's not REALLY a temp table in the sense of a table in tempdb, but in practice can provide the facade of a temp table. Please see the MSDN article example C. Scenario: Replace session tempdb #table. Note this example even states (emphasis mine):

Replace the CREATE TABLE #tempSessionC statements in your code with DELETE FROM dbo.soSessionC, to ensure a session is not exposed to table contents inserted by a previous session with the same session_id. It is important to create the memory-optimized table at deployment time, not at runtime, to avoid the compilation overhead that comes with table creation.

So, in your code, the internal API would have a try-finally, the finally block would have to delete the session data:

DELETE FROM dbo.soSessionC

The end-user would also probably want to truncate the table occasionally to ensure rude-aborts don't cause "garbage" to accumulate in the staging table. Obviously, while this should produce blazing fast MERGE statements, it should be used with some care. I do believe this would allow us to shatter ETL benchmarks, though, and I'm curious to see the results!

jzabroski commented 6 years ago

To be clear, the pseudo-code should be:

var deleteSessionData = "DELETE FROM {memoryOptimizedTableName}";
try {
   Call SqlCommand with text deleteSessionData
   Bulk Insert data into memoryOptimizedTableName
   Do Merge using memoryOptimizedTableName as source
}
finally {
   Call SqlCommand with text deleteSessionData
}

Hope this is clearer.

JonathanMagnan commented 6 years ago

Thank for the additional information @jzabroski ,

Everything is very clear ;)

JonathanMagnan commented 6 years ago

Hello @jzabroski ,

Just to let you know it will take a few more days before releasing it.

The code is already completed and working. However, we found out Memory Table is not very transaction friendly, so we need to improve our code for it.

Best Regards,

Jonathan

jzabroski commented 6 years ago

Interesting. @mmiller678 is on vacation so a few days won't bother us.

However, we found out Memory Table is not very transaction friendly, so we need to improve our code for it.

Please elaborate so that I can learn along with you how "Memory Table is not very transaction friendly". We have SqlBulkCopy logic that doesn't depend on EFCore, so it might help us a great deal to know what you learned.

After your comment, I googled and found on MSDN Docs > SQL > Relational databases > In-memory OLTP > Transactions with Memory-Optimized Tables > Transaction phases and lifetime, but nothing stands out to me as cause for issues?

JonathanMagnan commented 6 years ago

Hello @jzabroski ,

If you want to try it directly with our Bulk Operations library, you can do it starting with the v2.13.26: https://www.nuget.org/packages/Z.BulkOperations/

Please elaborate so that I can learn along with you how "Memory Table is not very transaction friendly

When using a transaction, we have the following error: Accessing memory optimized tables using the READ COMMITTED isolation level is supported only for autocommit transactions. It is not supported for explicit or implicit transactions. Provide a supported isolation level for the memory optimized table using a table hint, such as WITH (SNAPSHOT).

We have tried several isolation level but it seem they have all their limitations. We will try with table hint probably tomorrow and see what's hapenning.

Here is an example:

using System.Collections.Generic;
using System.Data.SqlClient;
using System.Diagnostics;
using System.Windows.Forms;
using Z.BulkOperations;

namespace Lab.BulkOperations.v451
{
    /*
CREATE TABLE[dbo].[TestMemoryTable](
[ID]
[int] IDENTITY(1,1) NOT NULL,

[ColumnInt] [int]
NULL,
[ColumnString]
[varchar](50) NULL
) ON[PRIMARY]
GO
CREATE TABLE [dbo].[Memory_TestMemoryTable]
(
    [ID] [int] IDENTITY(1,1) NOT NULL,
    [ColumnInt] [int] NULL,
    [ColumnString] [varchar](50) COLLATE SQL_Latin1_General_CP1_CI_AI NULL,
    [ZZZ_Index] [int] NULL,

 CONSTRAINT [PK_Memory_TestMemoryTable]  PRIMARY KEY NONCLUSTERED 
(
    [ID] ASC
)
)WITH ( MEMORY_OPTIMIZED = ON , DURABILITY = SCHEMA_ONLY )
     */
    public partial class Form_Request_MemoryTable : Form
    {
        public Form_Request_MemoryTable()
        {
            InitializeComponent();

            var list = new List<TestMemoryTable>();

            for (var i = 0; i < 100000; i++)
            {
                list.Add(new TestMemoryTable {ColumnInt = i, ColumnString = "Z_" + i});
            }

            var connectionString = "Server=localhost;Initial Catalog=Z.BulkOperations.Lab;Integrated Security=true;";
            var clock = new Stopwatch();
            using (var conn = new SqlConnection(connectionString))
            {
                conn.Open();

                //var trans = conn.BeginTransaction();

                using (var bulk = new BulkOperation<TestMemoryTable>(conn))
                {
                    //bulk.Transaction = trans;
                    bulk.TemporaryTableName = "Memory_TestMemoryTable";
                    bulk.TemporaryTableIsMemory = true; // Required since some options such as LOCK TABLES is not supported for SqlBulkCopy
                    bulk.TemporaryTablePersist = true; // Optional if you don't want to drop data (using schema only will do it automatically)
                    bulk.DestinationTableName = "TestMemoryTable";

                    clock.Start();
                    bulk.BulkMerge(list);
                    clock.Stop();
                }

                //trans.Commit();
            }
        }

        public class TestMemoryTable
        {
            public int ID { get; set; }
            public int ColumnInt { get; set; }
            public string ColumnString { get; set; }
        }
    }
}