jessebenson / service-fabric-indexing

Adds support for automatic indexing of reliable collections. Supported indexing is filters (exact value matches) and full-text search.
MIT License

Performance issue after adding about 8000+ data #6

Open heavenwing opened 5 years ago

heavenwing commented 5 years ago

When adding a lot of data (about 8,000+ items) into a collection with 5 indexes, I get an error:

System.InvalidOperationException: Transaction 131929400802878326 is committing or rolling back or has already committed or rolled back
   at Microsoft.ServiceFabric.Replicator.TransactionBase.ThrowIfTransactionIsNotActive()
   at System.Fabric.Store.TStore`5.AddOrUpdateAsync(IStoreWriteTransaction transaction, TKey key, Func`2 addValueFactory, Func`3 updateValueFactory, TimeSpan timeout, CancellationToken cancellationToken)
   at Microsoft.ServiceFabric.Data.Collections.DistributedDictionary`2.AddOrUpdateAsync(Transaction tx, TKey key, Func`2 addValueFactory, Func`3 updateValueFactory, TimeSpan timeout, CancellationToken cancellationToken)
   at ServiceFabric.Extensions.Data.Indexing.Persistent.ReliableIndexedDictionary`2.OnAddAsync(ITransaction tx, TKey key, TValue value, TimeSpan timeout, CancellationToken token) in 

This issue may be related to https://github.com/Azure/service-fabric-issues/issues/435

I have a workaround idea: first, provide a BulkInsert method that temporarily pauses adding data into the indexes; then rebuild the indexes for the new data. Of course, this workaround needs this feature: #4
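The proposed workaround might have a shape like the following. This is purely hypothetical: neither method exists in the library today, and the names are illustrative only.

```csharp
// Hypothetical API shape for the proposed workaround -- not an existing API.
// BulkInsertAsync would write values to the main dictionary only, skipping
// index maintenance, and a separate rebuild step (the feature requested in #4)
// would repopulate the indexes afterwards in its own transactions.
public interface IBulkInsertSupport<TKey, TValue>
    where TKey : IComparable<TKey>, IEquatable<TKey>
{
    // Add many items without touching the index dictionaries.
    Task BulkInsertAsync(
        ITransaction tx,
        IEnumerable<KeyValuePair<TKey, TValue>> items,
        TimeSpan timeout,
        CancellationToken token);

    // Rebuild all index dictionaries from the main dictionary's contents.
    Task RebuildIndexesAsync(CancellationToken token);
}
```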

jessebenson commented 5 years ago

Yes, this is definitely a Service Fabric issue. I believe that Service Fabric has a max size on transactions. The error message and user experience they provide for this is very poor. I would suggest adding items in batches. Depending on your item size, add say 1000 items in a transaction then commit, and repeat that in a loop. As long as that works for your consistency model, of course.
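The batching suggestion above can be sketched as a loop that commits a fresh transaction per batch. This is an illustrative sketch, not code from this repo: it assumes a `stateManager` (`IReliableStateManager`) and a plain `dictionary` (`IReliableDictionary<long, MyItem>`) already exist inside a stateful service, and uses only standard Reliable Collections APIs.

```csharp
// Sketch: insert items in batches of ~1,000, one transaction per batch,
// so no single transaction grows past Service Fabric's limits.
const int batchSize = 1000;

foreach (var batch in items.Chunk(batchSize))   // Enumerable.Chunk, .NET 6+
{
    using (ITransaction tx = stateManager.CreateTransaction())
    {
        foreach (var item in batch)
        {
            await dictionary.SetAsync(tx, item.Id, item);
        }
        await tx.CommitAsync();   // commit before starting the next batch
    }
}
```

Note that this trades atomicity for throughput: if the process fails mid-loop, earlier batches stay committed, so it only fits workloads where partial inserts are acceptable.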

heavenwing commented 5 years ago

I forgot to mention that I had actually tried the batch approach. I tried 100 and 1,000 items per batch, and it still throws this exception.

I believe I reduced the size of each transaction, but the state manager can't deal with lots of transactions.

In the end I removed the index definitions, used a raw ReliableDictionary instead, and still batched the items to avoid transaction timeouts.


jessebenson commented 5 years ago

That is interesting. What is the size of your main items, and what indexes are you using? Service Fabric has a maximum size on transactions, and transactions that take too long can be aborted.

FilterableIndex is relatively small - one row per item using your original key plus the property being indexed. These will generally not cause problems unless the property being indexed on is large. In Service Fabric, keys should always be small and I use the property being indexed (e.g. item.Name) as the key for the FilterableIndex (which is itself a reliable dictionary).
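As described above, a filter index is itself a reliable dictionary keyed by the indexed property. Conceptually (the type names here are illustrative, not the library's actual API):

```csharp
// Main store: one row per item, keyed by your original key.
//   IReliableDictionary<Guid, Person>     // key = item id, value = item
//
// FilterableIndex on Person.Name: one extra row per item, keyed by the
// value of the indexed property.
//   IReliableDictionary<string, Guid[]>   // key = person.Name,
//                                         // value = ids of matching items
//
// So each add on the indexed dictionary writes 1 + (number of indexes)
// rows inside the same transaction, which is why the indexed properties
// (the keys of the index dictionaries) should stay small.
```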

SearchableIndex can be very large - N rows per item, where N is the number of "words" in your string. If your items are even moderately long (e.g. >100 words), the transaction size can easily explode: 1,000 items with 100 words each adds roughly 100,000 index rows to a single transaction.

If neither of the above are true, then your best bet is to post the issue on the Service Fabric repo (https://github.com/Microsoft/service-fabric).

heavenwing commented 5 years ago

I have tried 1,600 indexed items in my PR #8, and it's working. I think it may be a performance issue inside Service Fabric.

PTC-JoshuaMatthews commented 4 years ago

Any progress on this? I need to add 100k+ records to my collection, and I'm having no luck getting past this issue. I have a single index on a groupId property, and I'm adding my items with an integer id as the key.

If I insert the items into a standard reliable dictionary it succeeds, but takes around 15 seconds to complete. I need the ability to get all items for the groupId though, so using reliable collections without this library isn't really an option.

When I do try to use this library, the error I get is a bit more specific:

"Transaction 131735920369201969 was internally aborted by the replicator as it was active for too long and blocked a checkpoint"

It does work if I split my collection into chunks of 100, but it then takes over 10 minutes to add the records. At that point I'm back to using SQL Server, since that is a severe loss of performance for something meant to "cache" my data.