microsoft / durabletask-mssql

Microsoft SQL storage provider for Durable Functions and the Durable Task Framework
MIT License

Breaking change introduced in 1.0.0-rc2 #104

Closed moldovangeorge closed 2 years ago

moldovangeorge commented 2 years ago

While upgrading from 1.0.0-rc to 1.0.0-rc2, I bumped into the following issues:

1.0.0-rc code does not work on the 1.0.0-rc2 schema because of:

2022-06-11 13:38:09.8689|ERROR| thread-4|DurableTask.Core| TaskOrchestrationDispatcher-b9972454c81c4376a0ad16bf55b44182-0: Failed to fetch a work-item: System.IndexOutOfRangeException: ExecutionID
   at Microsoft.Data.ProviderBase.FieldNameLookup.GetOrdinal(String fieldName)
   at Microsoft.Data.SqlClient.SqlDataReader.GetOrdinal(String name)
   at DurableTask.SqlServer.SqlUtils.GetExecutionId(DbDataReader reader) in /_/src/DurableTask.SqlServer/SqlUtils.cs:line 325
   at DurableTask.SqlServer.SqlOrchestrationService.ReadHistoryEventsAsync(DbDataReader reader, String executionIdFilter, CancellationToken cancellationToken) in /_/src/DurableTask.SqlServer/SqlOrchestrationService.cs:line 521
   at DurableTask.SqlServer.SqlOrchestrationService.LockNextTaskOrchestrationWorkItemAsync(TimeSpan receiveTimeout, CancellationToken cancellationToken) in /_/src/DurableTask.SqlServer/SqlOrchestrationService.cs:line 177
   at DurableTask.Core.WorkItemDispatcher`1.DispatchAsync(WorkItemDispatcherContext context)

1.0.0-rc2 code does not work on the 1.0.0-rc schema because of:

2022-06-11 13:46:03.3205|ERROR|thread-32|DurableTask.Core| TaskOrchestrationDispatcher-d96429d88df842009a283db5f4d282da-0: Failed to fetch a work-item: Microsoft.Data.SqlClient.SqlException (0x80131904): Could not find stored procedure 'dt._DiscardEventsAndUnlockInstance'.
   at Microsoft.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
   at Microsoft.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
   at Microsoft.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
   at Microsoft.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)
   at Microsoft.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader ds, RunBehavior runBehavior, String resetOptionsString, Boolean isInternal, Boolean forDescribeParameterEncryption, Boolean shouldCacheForAlwaysEncrypted)
   at Microsoft.Data.SqlClient.SqlCommand.CompleteAsyncExecuteReader(Boolean isInternal, Boolean forDescribeParameterEncryption)
   at Microsoft.Data.SqlClient.SqlCommand.InternalEndExecuteNonQuery(IAsyncResult asyncResult, Boolean isInternal, String endMethod)
   at Microsoft.Data.SqlClient.SqlCommand.EndExecuteNonQueryInternal(IAsyncResult asyncResult)
   at Microsoft.Data.SqlClient.SqlCommand.EndExecuteNonQueryAsync(IAsyncResult asyncResult)
   at System.Threading.Tasks.TaskFactory`1.FromAsyncCoreLogic(IAsyncResult iar, Func`2 endFunction, Action`1 endAction, Task`1 promise, Boolean requiresSynchronization)

So there is no way to migrate between the 2 versions without downtime.

moldovangeorge commented 2 years ago

Tagging @cgillum as a SQL Provider owner. Can you help us understand if this was the intended behavior or just a slip in the new version?

cgillum commented 2 years ago

You should be able to upgrade your schema without downtime by simply running the logic.sql script against your database. Can you try that and let me know if it unblocks you?

moldovangeorge commented 2 years ago

Our current setup is that we perform the DB migration using the CreateIfNotExistsAsync method on the DTF Client, followed by an app deployment that uses the new version (e.g., 1.0.0-rc2). The issue is that between the time we upgrade the DB schema and the time we update the code to the latest version, the setup no longer works, because the old code version (1.0.0-rc) is not compatible with the 1.0.0-rc2 schema: the worker is unable to process any work. So for the duration of the code deployment, we have complete downtime.
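To make the downtime window concrete, here is a minimal Python sketch (not part of the provider) that encodes the compatibility reported in this thread and lists the incompatible states each rollout order passes through. The names `COMPATIBLE`, `rollout_states`, and `broken_states` are hypothetical, and the two same-version pairs are assumed to work.

```python
OLD, NEW = "1.0.0-rc", "1.0.0-rc2"

# (code version, schema version) -> does the worker run?
# The two mixed pairs come from the errors reported above;
# the two same-version pairs are assumed to work.
COMPATIBLE = {
    (OLD, OLD): True,
    (OLD, NEW): False,  # IndexOutOfRangeException: ExecutionID
    (NEW, OLD): False,  # missing dt._DiscardEventsAndUnlockInstance
    (NEW, NEW): True,
}


def rollout_states(order):
    """Return every (code, schema) state visited when upgrading in `order`."""
    code, schema = OLD, OLD
    states = [(code, schema)]
    for step in order:
        if step == "schema":
            schema = NEW
        elif step == "code":
            code = NEW
        else:
            raise ValueError(f"unknown step: {step!r}")
        states.append((code, schema))
    return states


def broken_states(order):
    """Return the visited states in which the worker cannot process work."""
    return [state for state in rollout_states(order) if not COMPATIBLE[state]]


for order in (["schema", "code"], ["code", "schema"]):
    print(order, "->", broken_states(order))
```

Under these assumptions, each ordering passes through exactly one incompatible state, so neither "schema first" nor "code first" avoids the window.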

cgillum commented 2 years ago

Ah, I see. Your current upgrade workflow updates the schema first, and you're observing that the new schema isn't backwards compatible with the old code, which I now understand to be the cause of the first error mentioned above.

Looking more closely, I do see that the upgrade from 1.0.0-rc to 1.0.0-rc2 was indeed breaking because of the changes made in #97. I've updated the release notes to reflect this. Apologies for the inconvenience. The plan is to ensure backwards compatibility starting with the final v1.0.0 release, which we hope to have done soon.

moldovangeorge commented 2 years ago

Thanks for the response, looking forward to the final stable release! ✅