lecaillon / Evolve

Database migration tool for .NET and .NET Core projects. Inspired by Flyway.
https://evolve-db.netlify.com
MIT License
849 stars 111 forks source link

Exception on long running migration when using TransactionKind.CommitAll #280

Open Usergitbit opened 2 years ago

Usergitbit commented 2 years ago

Long running migrations (10m+) with Evolve 3.0 on Postgres in Azure (v11) with Npgsql 6.0.6 (.NET 6) throw the following exception:

Error executing script: V1_9_2_9__Long_Migration_Test.sql after 72273 ms. The CancellationTokenSource has been disposed. Sql query: SELECT pg_sleep(780); The CancellationTokenSource has been disposed.
    at System.Threading.CancellationTokenSource.CancelAfter(UInt32 millisecondsDelay)
   at System.Threading.CancellationTokenSource.CancelAfter(TimeSpan delay)
   at Npgsql.Util.ResettableCancellationTokenSource.Stop()
   at Npgsql.Internal.NpgsqlReadBuffer.<Ensure>g__EnsureLong|41_0(NpgsqlReadBuffer buffer, Int32 count, Boolean async, Boolean readingNotifications)
   at Npgsql.Internal.NpgsqlConnector.<ReadMessage>g__ReadMessageLong|211_0(NpgsqlConnector connector, Boolean async, DataRowLoadingMode dataRowLoadingMode, Boolean readingNotifications, Boolean isReadingPrependedMessage)
   at Npgsql.NpgsqlDataReader.NextResult(Boolean async, Boolean isConsuming, CancellationToken cancellationToken)
   at Npgsql.NpgsqlDataReader.NextResult()
   at Npgsql.NpgsqlCommand.ExecuteReader(CommandBehavior behavior, Boolean async, CancellationToken cancellationToken)
   at Npgsql.NpgsqlCommand.ExecuteReader(CommandBehavior behavior, Boolean async, CancellationToken cancellationToken)
   at Npgsql.NpgsqlCommand.ExecuteNonQuery(Boolean async, CancellationToken cancellationToken)
   at Npgsql.NpgsqlCommand.ExecuteNonQuery()
   at Evolve.WrappedConnectionEx.Execute[T](WrappedConnection wrappedConnection, String sql, Func`2 query, Action`1 setupDbCommand)

The connection parameters for timeouts we use are CommandTimeout=900;InternalCommandTimeout=-1;. It works fine if using TransactionKind.CommitEach. It always seems to happen after about 60000-70000 ms. I've tried debugging the issue and it seems the cause is this part:

                using var scope = new TransactionScope();
                db.WrappedConnection.UseAmbientTransaction();
                lastAppliedVersion = Migrate();

                if (TransactionMode == TransactionKind.CommitAll)
                {
                    scope.Complete();
                }
                else
                {
                    LogRollbackAppliedMigration();
                }

The ambient transaction has its own timeout of 60000ms which matches with the time after which the exception is thrown. I tried changing that block to:

                using var transaction = db.WrappedConnection.BeginTransaction();
                lastAppliedVersion = Migrate();

                if (TransactionMode == TransactionKind.CommitAll)
                {
                }
                else
                {
                    LogRollbackAppliedMigration();
                }

and this seems to solve the problem but it breaks the TransactionKind.RollbackAll mode. But the idea would be to handle the transaction at the connection level since then it would respect the connection string parameters.

Modifying the timeouts for the ambient transaction might be possible as well but it seems to be tricky (see https://github.com/dotnet/runtime/issues/1418#issuecomment-786027181) and it did not work for me when I tried it (perhaps I didn't call the methods at the appropiate time). I'm also not sure if it would work with timeouts set in the connection string parameters, I think it would need the CommandTimeout parameter to be set explicitly in Evolve.

EDIT: the TransactionKind.CommitAll works with the reflection hack if both max and default are set (along with the cached and validated members).

ghost commented 1 year ago

This is happening to me, It seems that the problem is that the default timeout for the ambient transaction is 60 seconds. So long duration migrations with RollbackAll or CommitAll will fail.

The solution should be easy, add a new property "AmbientTransactionTimeout" in the Evolve class and pass it to the TransactionScope constructor.

Could you take a look at it? This is a feature we would love to see working properly.

ghost commented 1 year ago

I created a PR https://github.com/lecaillon/Evolve/pull/293

Usergitbit commented 1 year ago

I created a PR #293

This will only fix the issue if the timeout is under 10m (the max transaction default timeout). If you need it higher than that then the reflection hack is needed. If this were to be done in the library I think the max timeout could be modified before the migrations run then set back to the initial value.

ghost commented 1 year ago

I created a PR #293

This will only fix the issue if the timeout is under 10m (the max transaction default timeout). If you need it higher than that then the reflection hack is needed. If this were to be done in the library I think the max timeout could be modified before the migrations run then set back to the initial value.

I just pushed a commit with the change. Check out if you can.

danigt91 commented 1 year ago

I created a PR #293

This will only fix the issue if the timeout is under 10m (the max transaction default timeout). If you need it higher than that then the reflection hack is needed. If this were to be done in the library I think the max timeout could be modified before the migrations run then set back to the initial value.

I just pushed a commit with the change. Check out if you can.

Im the one that created the PR, just changed to my main account.

When this will be released?

lecaillon commented 1 year ago

@danigt91 Already done: https://github.com/lecaillon/Evolve/pull/293#issuecomment-1407044551