zzzprojects / EntityFramework-Extensions

Entity Framework Bulk Operations | Improve Entity Framework performance with Bulk SaveChanges, Insert, update, delete and merge for SQL Server, SQL Azure, SQL Compact, MySQL and SQLite.
https://entityframework-extensions.net

BulkInsert - ERROR_34 #582

Open mhsimkin opened 4 months ago

mhsimkin commented 4 months ago

Description

Received the following error message during BulkInsert

ERROR_34: Oops! the error occured in the bulk operations. Report this error to our support team: info@zzzprojects.com , v=8.102.2.4

Exception

System.Exception: ERROR_34: Oops! the error occured in the bulk operations. Report this error to our support team: info@zzzprojects.com , v=8.102.2.4
   at ?.?(? ?)
   at ?.Execute()
   at ?.?()
   at ?.Create()
   at ?.Create(DbContext context, EntityTypeZInfo entityTypeInfo, IEqualityComparer`1 dictionaryEqualityComparer, List`1 list, BulkOperationActionType actionType, Boolean isIncludGraph, Boolean isIncludGraphBuilderBulkSaveChanges, Boolean isSkipInsertForInternalFeatures)
   at ?.BulkInsert[T](DbContext this, IEnumerable`1 entities, Action`1 options, Boolean isBulkSaveChanges, Boolean isOptimized)
   at DbContextExtensions.BulkInsert[T](DbContext this, IEnumerable`1 entities, Action`1 options)
   at MongoToSql.Program.<>c__DisplayClass6_0.<MoveRecommendations>b__1(IEnumerable`1 items) in D:\Work\BN\ProductDataSystemsGitRepos\Tools\RecFrontEnd\MongoToSql\Program.cs:line 138
   at Pds.Mongo.Repository`1.<>c__DisplayClass13_0.<<GetItemsAsync>b__0>d.MoveNext()
--- End of stack trace from previous location ---
   at Polly.ResiliencePipeline.<>c.<<ExecuteAsync>b__1_0>d.MoveNext()
--- End of stack trace from previous location ---
   at Polly.Outcome`1.GetResultOrRethrow()
   at Polly.ResiliencePipeline.ExecuteAsync(Func`2 callback, ResilienceContext context)
   at Pds.Mongo.Repository`1.GetItemsAsync(FilterDefinition`1 query, FindOptions`2 options, Action`1 loopActionFilter, Dictionary`2 additionalItems, String caller, String callerFilePath, Int32 lineNumber)
   at MongoToSql.Program.MoveRecommendations() in D:\Work\BN\ProductDataSystemsGitRepos\Tools\RecFrontEnd\MongoToSql\Program.cs:line 114

Fiddle or Project (Optional)

Hi:

I am getting the below error when I execute this code:

using var dbContext1 = new RecommendationContext(null, _databaseLocation, new List<string> { "Recommendations" }, _providerType);

dbContext1.BulkInsert(newRecs, options =>
{

    options.AutoMapOutputDirection = false;
    options.InsertIfNotExists = true;
    options.Log = Console.WriteLine;
});

The class is defined to EF as:

public class ActiveRecRecordTypeConfiguration : IEntityTypeConfiguration<ActiveRecRecord>
{
    /// <summary> Configures the entity of type <see cref="ActiveRecRecord" />. </summary>
    /// <param name="builder"> The builder to be used to configure the entity type. </param>
    /// <seealso cref="M:IEntityTypeConfiguration.Configure(EntityTypeBuilder{ActiveRecRecord})" />
    public void Configure(EntityTypeBuilder<ActiveRecRecord> builder)
    {
        builder.HasKey(p => new { p.Ean, p.AlgoCd }).HasName("PK_AR_Ean_AlgoCd");

        builder.Property(p => p.Ean).IsRequired();
        builder.Property(p => p.AlgoCd).IsRequired().HasMaxLength(10);
        builder.Property(p => p.Updated).IsRequired();
        builder.Property(p => p.CacheInsertTime).IsRequired();
        builder.Property(p => p.Deleted).IsRequired().HasDefaultValue(0);

        builder.OwnsOne(
            activeRec => activeRec.RecommendationData, onb =>
            {
                onb.ToJson();
                onb.OwnsMany(activeRec => activeRec.Recs);
            })
            .Navigation(activeRec => activeRec.RecommendationData).IsRequired();

        builder.HasIndex(p => p.CacheInsertTime).HasDatabaseName("IX_AR_CacheInsertTime");
        builder.HasIndex(p => p.Updated).HasDatabaseName("IX_AR_Updated");

        builder.Ignore(p => p.Id);
        builder.Ignore(p => p.IsValidEan);
    }
}

The Recs property of the activeRecs object is a List.

I have also attached a log file of all operations that occurred up until the error. Is there a way I can have the temp tables not deleted? Then I can send you the contents of the last temp table, assuming the error has something to do with an object in that table.

Thanks Marc

Further technical details

operation.log

mhsimkin commented 4 months ago

Hi.

I suspect the problem is related to using SQL Express and hitting the DB size limit. It would be nice if a more specific error message could be returned.

Something like:

{"An error occurred while saving the entity changes. See the inner exception for details."}
    Data: {System.Collections.ListDictionaryInternal}
    Entries: Count = 678
    HResult: -2146233088
    HasBeenThrown: true
    HelpLink: null
    InnerException: {"Could not allocate space for object 'dbo.Recommendations'.'PK_AR_Ean_AlgoCd' in database 'Recommendations' because the 'PRIMARY' filegroup is full. Create disk space by deleting unneeded files, dropping objects in the filegroup, adding additional files to the filegroup, or setting autogrowth on for existing files in the filegroup."}
    Message: "An error occurred while saving the entity changes. See the inner exception for details."
    SerializationStackTraceString: "   at Microsoft.EntityFrameworkCore.Update.AffectedCountModificationCommandBatch.ConsumeResultSet(Int32 startCommandIndex, RelationalDataReader reader)\r\n   at Microsoft.EntityFrameworkCore.Update.AffectedCountModificationCommandBatch.Consume(RelationalDataReader reader)\r\n   at Microsoft.EntityFrameworkCore.Update.ReaderModificationCommandBatch.Execute(IRelationalConnection connection)\r\n   at Microsoft.EntityFrameworkCore.SqlServer.Update.Internal.SqlServerModificationCommandBatch.Execute(IRelationalConnection connection)\r\n   at Microsoft.EntityFrameworkCore.Update.Internal.BatchExecutor.Execute(IEnumerable`1 commandBatches, IRelationalConnection connection)\r\n   at Microsoft.EntityFrameworkCore.ChangeTracking.Internal.StateManager.SaveChanges(IList`1 entriesToSave)\r\n   at Microsoft.EntityFrameworkCore.ChangeTracking.Internal.StateManager.SaveChanges(StateManager stateManager, Boolean acceptAllChangesOnSuccess)\r\n   at Microsoft.EntityFrameworkCore.SqlServer.Stora
ge.Internal.SqlServerExecutionStrategy.Execute[TS..."
    SerializationWatsonBuckets: null
    Source: "Microsoft.EntityFrameworkCore.Relational"
    StackTrace: "   at Microsoft.EntityFrameworkCore.Update.AffectedCountModificationCommandBatch.ConsumeResultSet(Int32 startCommandIndex, RelationalDataReader reader) in /_/src/EFCore.Relational/Update/AffectedCountModificationCommandBatch.cs:line 323\r\n   at Microsoft.EntityFrameworkCore.Update.AffectedCountModificationCommandBatch.Consume(RelationalDataReader reader) in /_/src/EFCore.Relational/Update/AffectedCountModificationCommandBatch.cs:line 58\r\n   at Microsoft.EntityFrameworkCore.Update.ReaderModificationCommandBatch.Execute(IRelationalConnection connection) in /_/src/EFCore.Relational/Update/ReaderModificationCommandBatch.cs:line 354\r\n   at Microsoft.EntityFrameworkCore.SqlServer.Update.Internal.SqlServerModificationCommandBatch.Execute(IRelationalConnection connection) in /_/src/EFCore.SqlServer/Update/Internal/SqlServerModificationCommandBatch.cs:line 203\r\n   at Microsoft.EntityFrameworkCore.Update.Internal.BatchExecutor.Execute(IEnumerable`1 commandBatches, IRelationalConnection connection
) in /_/src/EFCore.Relational/Update/Internal/BatchExecutor.cs:line 95\r\n   at Microsoft.EntityFrameworkCore.ChangeTracking.Internal.StateManager.SaveChanges(IList`1 entriesToSave) in /_/src/EFCore/ChangeTracking/Internal/StateManager.cs:line 1315\r\n   at Microsoft.EntityFrameworkCore.ChangeTracking.Internal.StateManager.SaveChanges(StateManager stateManager, Boolean acceptAllChangesOnSuccess) in /_/src/EFCore/ChangeTracking/Internal/StateManager.cs:line 1371\r\n   at Microsoft.EntityFrameworkCore.SqlServer.Storage.Internal.SqlServerExecutionStrategy.Execute[TState,TResult](TState state, Func`3 operation, Func`3 verifySucceeded) in /_/src/EFCore.SqlServer/Storage/Internal/SqlServerExecutionStrategy.cs:line 63\r\n   at Microsoft.EntityFrameworkCore.ChangeTracking.Internal.StateManager.SaveChanges(Boolean acceptAllChangesOnSuccess) in /_/src/EFCore/ChangeTracking/Internal/StateManager.cs:line 1349\r\n   at Microsoft.EntityFrameworkCore.DbContext.SaveChanges(Boolean acceptAllChangesOnSuccess) in /_/src/EFCore/D
bContext.cs:line 665\r\n   at MongoToSql.Program.<>c__DisplayClass7_0.<MoveRecommendations>b__1(IEnumerable`1 items) in D:\\Work\\BN\\ProductDataSystemsGitRepos\\Tools\\RecFrontEnd\\MongoToSql\\Program.cs:line 153\r\n   at Pds.Mongo.Repository`1.<>c__DisplayClass13_0.<<GetItemsAsync>b__0>d.MoveNext() in Pds.Mongo\\Repository.cs:line 192\r\n   at Polly.ResiliencePipeline.<>c.<<ExecuteAsync>b__1_0>d.MoveNext() in /_/src/Polly.Core/ResiliencePipeline.Async.cs:line 67\r\n   at Polly.Outcome`1.GetResultOrRethrow() in /_/src/Polly.Core/Outcome.TResult.cs:line 90\r\n   at Polly.ResiliencePipeline.<ExecuteAsync>d__1.MoveNext() in /_/src/Polly.Core/ResiliencePipeline.Async.cs:line 78\r\n   at Pds.Mongo.Repository`1.<GetItemsAsync>d__13.MoveNext() in Pds.Mongo\\Repository.cs:line 189\r\n   at MongoToSql.Program.<MoveRecommendations>d__7.MoveNext() in D:\\Work\\BN\\ProductDataSystemsGitRepos\\Tools\\RecFrontEnd\\MongoToSql\\Program.cs:line 117"
    TargetSite: {Int32 ConsumeResultSet(Int32, Microsoft.EntityFrameworkCore.Storage.RelationalDataReader)}
    _HResult: -2146233088
    _data: {System.Collections.ListDictionaryInternal}
    _dynamicMethods: null
    _entries: Count = 678
    _exceptionMethod: null
    _helpURL: null
    _innerException: {"Could not allocate space for object 'dbo.Recommendations'.'PK_AR_Ean_AlgoCd' in database 'Recommendations' because the 'PRIMARY' filegroup is full. Create disk space by deleting unneeded files, dropping objects in the filegroup, adding additional files to the filegroup, or setting autogrowth on for existing files in the filegroup."}
    _ipForWatsonBuckets: 0x00007ffcf2344842
    _message: "An error occurred while saving the entity changes. See the inner exception for details."
    _remoteStackTraceString: null
    _source: null
    _stackTrace: {sbyte[1536]}
    _stackTraceString: null
    _watsonBuckets: null
    _xcode: -532462766
    _xptrs: 0x0000000000000000
JonathanMagnan commented 4 months ago

Hello @mhsimkin ,

The error ERROR_34 is not related to SQL Express reaching a limit.

This error is currently thrown by our library to handle a special case where, for example, a navigation property contains the same entity twice in its collection. This situation raises an error because we must not insert the entity twice.

The error message indeed lacks a proper description. We will look into improving it.

Best Regards,

Jon

mhsimkin commented 4 months ago

Hi @JonathanMagnan,

Thank you for the explanation.

Is this a duplicate key issue?

Is there any way I can figure out which record(s) are causing the issue?

Everything worked as expected until I discovered an error in my configuration that resulted in the Recs collection not being persisted to the database. Once I fixed that issue, the ERROR_34 started to appear.

-marc

JonathanMagnan commented 4 months ago

Is this a duplicate key issue?

No. From what my developer told me, the same entity is found twice in the same list.

Such as:

var invoiceItem = new InvoiceItem();
var invoice = new Invoice() { Item = new List<InvoiceItem>() { invoiceItem, invoiceItem }};

(I cannot currently test this scenario as I'm on vacation, but this is how I understand my developer's explanation.)
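For illustration, a reference-based comparison can distinguish "the same instance added twice" from "two equal but separate instances". This is a sketch using the hypothetical Invoice/InvoiceItem types from the example above, not the library's internals:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class InvoiceItem { public int Id { get; set; } }
public class Invoice { public List<InvoiceItem> Items { get; set; } = new(); }

public static class DuplicateInstanceCheck
{
    public static void Main()
    {
        var invoiceItem = new InvoiceItem();
        // The same instance is added to the collection twice.
        var invoice = new Invoice { Items = { invoiceItem, invoiceItem } };

        // ReferenceEqualityComparer (.NET 5+) collapses only the *same* instance,
        // ignoring any Equals/GetHashCode overrides on the element type.
        var distinctByReference = invoice.Items
            .Distinct(ReferenceEqualityComparer.Instance)
            .Count();

        Console.WriteLine(invoice.Items.Count);  // 2
        Console.WriteLine(distinctByReference);  // 1 -> same instance appears twice
    }
}
```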

Is there any way I can figure out which record(s) are causing the issue?

Not at this moment. We will try to look at it soon to provide more information.

If you could create a runnable project with the issue, we would be happy to look at it and tell you the exact problem. It doesn't need to be your project; just a new solution with the minimum code to reproduce the issue.

Best Regards,

Jon

mhsimkin commented 4 months ago

@JonathanMagnan, I will see what I can do.

There are a few issues with trying to get a sample that will show the error:

1) The error occurs in a batch of 5,000 that happens after 275,000 inserts have already been made.
2) The data is being exported from MongoDB and imported into SQL Server.
3) When I stop work today, I'm on vacation until after the Memorial Day weekend here in the USA.

If I can solve the first issue, then I should be able to get you something. I'm just concerned that skipping those initial records will result in the error not occurring. I don't think you want me to send you a data dump with 280,000 records.

-marc

mhsimkin commented 4 months ago

BulkInsertIssue.zip

Here is a working sample. I was able to dump the 5K block of records where the issue occurs into a json file.

The same app reads that file and attempts to do the bulk insert.

The SQL Connection string is hard coded in program.cs and the data is in dump.json.

The app uses EF Code First to create the DB.

I would not be surprised if the underlying issue is data related. I know the source data does have issues. The source system that updates the MongoDB database has been running for 10 years. The data model has changed a few times.

Thank you for the help.

-marc

JonathanMagnan commented 4 months ago

Thank you for the project. My developer will look into it.

Best Regards,

Jon

JonathanMagnan commented 4 months ago

Hello @mhsimkin ,

Thank you again for the project.

The issue happens due to how a hash code is calculated for a record. In your case, for this one: public record Rec : EuropeanArticleNumber.

Unlike a class, a record's hash code is calculated from its property values instead of its reference identity. So if 2 different record instances have the exact same values, they are considered equal since they have the same hash code.
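This record behavior can be reproduced in isolation. A minimal sketch, where this Rec is a hypothetical stand-in for the real type:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Positional record: the compiler synthesizes value-based
// Equals, GetHashCode, and operator== over the properties.
public record Rec(long Ean, double Score);

public static class RecordEqualityDemo
{
    public static void Main()
    {
        var a = new Rec(9781504044424, 1.0);
        var b = new Rec(9781504044424, 1.0); // different instance, same values

        Console.WriteLine(ReferenceEquals(a, b));               // False
        Console.WriteLine(a == b);                              // True: value equality
        Console.WriteLine(a.GetHashCode() == b.GetHashCode());  // True

        // Distinct() therefore collapses the two instances into one.
        var recs = new List<Rec> { a, b };
        Console.WriteLine(recs.Count);              // 2
        Console.WriteLine(recs.Distinct().Count()); // 1
    }
}
```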

In your case, it happens 6 times. One of them is for Ean = 9781504044424.

So this error indeed happens because our library gets 88573 Rec instances, but after doing a distinct, only 88566 remain, which we flag as abnormal.

Here is a simple code snippet to see what is happening:

var jsonData = File.ReadAllText("dump.json");

var recsFromFile = JsonSerializer.Deserialize<List<ActiveRecRecord>>(jsonData);

var recs = recsFromFile.SelectMany(x => x.RecommendationData.Recs).ToList();

var count1 = recs.Count();
var count2 = recs.Distinct().Count();

var duplicate = recs.Where(x => x.Ean == 9781504044424).ToList();

So I believe the issue is currently within the data, as 2 Recs with the same primary key should not exist.

Let me know if that explains the issue correctly.

Best Regards,

Jon

mhsimkin commented 4 months ago

Hi @JonathanMagnan. I am on vacation this week. I will look at this after Memorial Day Weekend.

Thank you.

-marc

jasonc2901 commented 4 months ago

@JonathanMagnan @mhsimkin - I am facing the same problem. Our application has recently been updated to use .NET 8, and as a result we have upgraded to the latest version of this library.

public async Task CreateNotifications(List<Notification> notifications, bool isAlert = true)
{
    // Any notifications?
    if (notifications == null || notifications.Count == 0)
    {
        return;
    }

    // Create notifications
    await _dbContext.BulkInsertAsync(notifications, null, false);

    // If alert, notify user in real-time
    if (isAlert)
    {
        var notificationDtos = MapToDto(notifications);
        await SendRealTimeNotification(notificationDtos);
    }

    // Clear unread user cache
    ClearUnreadNotificationCache(notifications.Select(x => x.UserId).ToList());
}

I have noticed that the notifications are definitely all unique, but the primary key (OptimeIndex in this case) is 0 for each, simply because it hasn't been generated yet.
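The effect of those unset keys can be sketched in isolation. This is a hypothetical illustration (the Notification shape here is assumed from the code below, not taken from the library): any comparison that goes through the key column sees every not-yet-inserted entity as the same record, while reference identity still tells them apart.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Notification
{
    public int OptimeIndex { get; set; } // database-generated key; 0 before insert
    public string UserId { get; set; } = "";
}

public static class UnsetKeyCheck
{
    public static void Main()
    {
        var notifications = new List<Notification>
        {
            new() { UserId = "a" },
            new() { UserId = "b" },
        };

        // Grouping by the key column: every not-yet-inserted entity shares
        // OptimeIndex == 0, so a key-based comparer sees one group of "duplicates".
        var groups = notifications.GroupBy(n => n.OptimeIndex).ToList();
        Console.WriteLine(groups.Count);        // 1
        Console.WriteLine(groups[0].Count());   // 2 entities share the key 0

        // Default comparer for a class is reference-based: still 2 distinct items.
        Console.WriteLine(notifications.Distinct().Count()); // 2
    }
}
```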

public async Task<List<AuditEntry>> BulkInsertAsync<T>(IList<T> data, string userId = null, bool isAudit = true, bool isSaveAudit = true, int batchSize = 10000) where T : BaseEntity
{
    // Set change tracking properties on each data item
    SetBulkActionChangeProperties(data, userId, true);

    // Insert data in bulk
    var auditEntries = new List<AuditEntry>();
    await this.BulkInsertAsync(data, options =>
    {
        options.IncludeGraph = true;
        options.IncludeGraphOperationBuilder = operation =>
        {
            if (operation is BulkOperation<T> bulk)
            {
                bulk.ColumnPrimaryKeyExpression = x => x.OptimeIndex;
            }
        };
        options.UseAudit = isAudit; // todo: audit, check here for ignored columns (save performance of having to internally fetch first)
        options.AuditEntries = auditEntries;
        options.RetryCount = 2;
        options.RetryInterval = new TimeSpan(40);
        options.BatchSize = batchSize;
    });

    // Save audit logs, or return list of logs if save is disabled
    return await ProcessAuditLogsAsync(auditEntries, userId, isAudit, isSaveAudit);
}

This is our implementation of the BulkInsertAsync method. Any ideas how I can avoid this issue?

JonathanMagnan commented 4 months ago

Hello @jasonc2901 ,

Any ideas how I can avoid this issue?

It depends. In the case of @mhsimkin , this was not really an issue with our library (unless we missed something) but an issue within the data: the same item existed twice under different parents.

To make it clear, it's like having one InvoiceItem with 2 different Invoice parents, which should not happen.

In your case, we don't know yet what caused the error ERROR_34. As previously said, this error happens when the same item is found twice when we expect it to exist only once.

In your case, it might be an issue through our library or, indeed, a case that should not happen.

So what can you do?

The best way to find out is to make sure the count is equal to your distinct count. You can find the duplicate values with some LINQ like:

var recs = recsFromFile.SelectMany(x => x.RecommendationData.Recs).ToList();

var count1 = recs.Count();
var count2 = recs.Distinct().Count();

if(count1 != count2)
{
    // an error will happen in our library in this case, so you need to check it
    var duplicateRecs = recs.GroupBy(x => x.Id)
                        .Where(x => x.Count() > 1)
                        .SelectMany(x => x)
                        .ToList();
}

Then, you need to understand whether duplicate records are normal or not (a bug from our library or an issue with your data).

If you can provide a runnable project, we will be happy to look at it.

Best Regards,

Jon

mhsimkin commented 4 months ago

Hi @JonathanMagnan, I'm catching up and reading your response about the duplicate key. Looking at this specific response.

Given my data structure, it is highly probable that the same EAN will exist in multiple ActiveRecRecord.RecommendationData.Recs collections. These collections are not indexed as part of the data model.

The RecommendationData property of the ActiveRecRecord is supposed to be persisted as JSON.

Here is the IEntityTypeConfiguration derived class.

/// <summary> An active record record type configuration. </summary>
/// <seealso
///     cref="T:Microsoft.EntityFrameworkCore.IEntityTypeConfiguration{Bn.Pds.Ren.Recommendation.Server.Models.ActiveRecRecord}" />
public class ActiveRecRecordTypeConfiguration : IEntityTypeConfiguration<ActiveRecRecord>
{
    /// <summary> Configures the entity of type <see cref="ActiveRecRecord" />. </summary>
    /// <param name="builder"> The builder to be used to configure the entity type. </param>
    /// <seealso cref="M:IEntityTypeConfiguration.Configure(EntityTypeBuilder{ActiveRecRecord})" />
    public void Configure(EntityTypeBuilder<ActiveRecRecord> builder)
    {
        builder.HasKey(p => new { p.Ean, p.AlgoCd }).HasName("PK_AR_Ean_AlgoCd");

        builder.Property(p => p.Ean).IsRequired();
        builder.Property(p => p.AlgoCd).IsRequired().HasMaxLength(10);
        builder.Property(p => p.Updated).IsRequired();
        builder.Property(p => p.CacheInsertTime).IsRequired();
        builder.Property(p => p.Deleted).IsRequired().HasDefaultValue(0);

        builder.OwnsOne(
                activeRec => activeRec.RecommendationData, onb =>
                {
                    onb.ToJson();
                    onb.OwnsMany(activeRec => activeRec.Recs);
                })
            .Navigation(activeRec => activeRec.RecommendationData).IsRequired();

        builder.HasIndex(p => p.CacheInsertTime).HasDatabaseName("IX_AR_CacheInsertTime");
        builder.HasIndex(p => p.Updated).HasDatabaseName("IX_AR_Updated");

        builder.Ignore(p => p.Id);
        builder.Ignore(p => p.IsValidEan);
    }
}

The generated SQL Table DDL is:

CREATE TABLE [dbo].[Recommendations](
    [Ean] [bigint] NOT NULL,
    [AlgoCd] [nvarchar](10) NOT NULL,
    [CacheInsertTime] [datetime2](7) NOT NULL,
    [Deleted] [bit] NOT NULL,
    [Updated] [datetime2](7) NOT NULL,
    [RecommendationData] [nvarchar](max) NOT NULL,
 CONSTRAINT [PK_AR_Ean_AlgoCd] PRIMARY KEY CLUSTERED 
(
    [Ean] ASC,
    [AlgoCd] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

-marc

JonathanMagnan commented 4 months ago

Thank you, Marc, for the additional information.

I can definitely see how, in serialization/deserialization cases like yours, it makes sense that the same entity is found twice.

I don't know if this feature will be accepted on our side, but would an automatic solution that changes your list to use the SAME exact entity work for you? All duplicate entities would be replaced by the first one we found instead.

Best Regards,

Jon

mhsimkin commented 4 months ago

@JonathanMagnan, no, you can't use the first entry found.

The order in the collection is important and each occurrence could have a different score value.

I don't know the internals of your method, but there seems to be some disconnect between what I expect and what your code is doing.

The RecommendationData is stored as a string, why are you even trying to do anything with it?

Cheers,

-marc

JonathanMagnan commented 3 months ago

Hello @mhsimkin ,

I took some time to sit down with my employee and go through this issue to understand it better, as there was too much back and forth on it, by my fault.

Starting with v8.102.3, our library will no longer throw ERROR_34 when duplicate entities are saved in JSON format (with ToJson).

From what my developer told me, that's exactly your scenario.

Could you try it again and let me know if everything works now as expected?

Best Regards,

Jon

mhsimkin commented 3 months ago

@JonathanMagnan, just give me a couple of days. I will let you know how I make out.

Thank you.

-marc

mhsimkin commented 3 months ago

@JonathanMagnan, I can confirm that this has been resolved. Thank you. -marc