spaghettidba / WorkloadTools

A collection of tools to collect, analyze and replay SQL Server workloads, on premises and in the cloud
MIT License
234 stars 53 forks source link

Exception reading event Collection was modified #50

Closed martin-guth closed 5 years ago

martin-guth commented 5 years ago

Hi Gianluca,

this relates to the latest release v1.2.13. Workload capture was performed with release v1.2.11. I performed a workload replay and get the following error message right at the end of the console output:

2019-04-11 08:52:48.8148 - Error - WorkloadTools.WorkloadController : Exception reading event
2019-04-11 08:52:48.8304 - Error - WorkloadTools.WorkloadController : Collection was modified; enumeration operation might not execute.
2019-04-11 08:52:48.8304 - Error - WorkloadTools.WorkloadController :    at System.Data.RBTree`1.RBTreeEnumerator.MoveNext()
   at System.Linq.Enumerable.<CastIterator>d__97`1.MoveNext()
   at System.Linq.Lookup`2.Create[TSource](IEnumerable`1 source, Func`2 keySelector, Func`2 elementSelector, IEqualityComparer`1 comparer)
   at System.Linq.GroupedEnumerable`3.GetEnumerator()
   at System.Linq.Enumerable.WhereSelectEnumerableIterator`2.MoveNext()
   at WorkloadTools.Util.DataUtils.ToDataTable[T](IEnumerable`1 items) in C:\GitHub\WorkloadTools\WorkloadTools\Util\DataUtils.cs:line 29
   at WorkloadTools.Consumer.Analysis.WorkloadAnalyzer.WriteExecutionDetails(SqlConnection conn, SqlTransaction tran, Int32 current_interval_id) in C:\GitHub\WorkloadTools\WorkloadTools\Consumer\Analysis\WorkloadAnalyzer.cs:line 513
   at WorkloadTools.Consumer.Analysis.WorkloadAnalyzer.WriteToServer() in C:\GitHub\WorkloadTools\WorkloadTools\Consumer\Analysis\WorkloadAnalyzer.cs:line 344
   at WorkloadTools.Consumer.Analysis.AnalysisConsumer.Dispose(Boolean disposing) in C:\GitHub\WorkloadTools\WorkloadTools\Consumer\Analysis\AnalysisConsumer.cs:line 42
   at WorkloadTools.WorkloadController.Stop() in C:\GitHub\WorkloadTools\WorkloadTools\WorkloadController.cs:line 115
   at WorkloadTools.WorkloadController.Run() in C:\GitHub\WorkloadTools\WorkloadTools\WorkloadController.cs:line 50
2019-04-11 08:52:48.8304 - Info - SqlWorkload.Program : Controller stopped.

From the contents of a logging table in the database I managed to figure out that the replay stopped at EventSequence 5709384 and missed approximately 43.000 events which are 6 minutes worth of capture (so in this case nothing dramatic regarding a 10 hour workload).

Thanks for looking into it

Martin

spaghettidba commented 5 years ago

That's most likely a multithreading error. I need to investigate it. Thanks for filing the bug

martin-guth commented 5 years ago

Well maybe I am the one to blame as I left the DB Browser for SQLite open during the replay...however I am pretty sure that I didn't change anything. Anyway I'll perform the replay again soon and will close the browser before and let you know if the problem persists.

martin-guth commented 5 years ago

Hi Gianluca,

how exactly does the multithreading work if "SynchronizationMode": "false". I am puzzled by missing data which could be related to the error or the behaviour. For a specific table I have over 12.000 INSERT statements in the SQLiteDatabase. If I compare the table after workload replay with the table before the replay (after initial backup restore) then I get just 124 rows which are different.

Using my very chatty log table (quite neat for this kind of evaluation I guess) I realized that I have gaps in it here and then. Here's an obvious example: grafik

Using the SQLite Database I discovered that there are numerous entries for 9:39 and forward. But they do not seem to have made their way into the database during replay. Even as it contains entries for logs which are later like 12:00, 15:05 and so on.

So it seems to me that the multithreading algorithms you use don't use time as a linear construct...however that doesn't seem to make sense as of course you have updates and so on which rely on previous inserts and so you couldn't do the update in parallel before the insert.

Somehow all of this is quite strange. I had confidence, that I was just missing 6 minutes woth of capture but there seems to be much more which has not been replayed by just examining those two tables. Moerover it doesn't correlate with error messages in the replay log (there is NONE regarding the logging table.

Could you please shed some light into this chaos?

I guess I am gonna just redo the replay now and hope that it performs differently....still hoping that this multithreading error and behaviour is a one-time-experience.

Best regards

Martin

spaghettidba commented 5 years ago

Martin, I think this is a separate issue and I would treat it as such. I have no idea what happened and I would probably need to see more logs to make some sense out of it. One thing I can say for sure: replaying all events with "SynchronizationMode": "false" does not guarantee that events are replayed in the correct order. There is no such guarantee even with synchronization mode, but injecting the original wait times makes it easier to respect the event sequence.

I see how I could fix this and I opened an issue here to track it: https://github.com/spaghettidba/WorkloadTools/issues/52

martin-guth commented 5 years ago

Hi Gianluca,

just performed the replay one again and the error persists (unfortunately). I'll give synchronizationMode True a chance to see if I have more luck replaying events with it.

Best regards

Martin