johnml1135 closed this issue 1 year ago.
And it failed, according to ... machine_jobs.hangfire.jobgraph.
But from serval.translation.builds:
But from the logs, the only thing of note before it completed was a token refresh:
It appears that the token refreshed right before it failed. It failed at the same time that the job ended (15:58:04 in ClearML but 19:58:04 in MongoDB), but it could be that, because of the token refresh, it got lost in some way. So there appear to be 3 issues:
Actually, here is the Hangfire error:
Amazon.S3.AmazonS3Exception: Access Denied
---> Amazon.Runtime.Internal.HttpErrorResponseException: Exception of type 'Amazon.Runtime.Internal.HttpErrorResponseException' was thrown.
at Amazon.Runtime.HttpWebRequestMessage.GetResponseAsync(CancellationToken cancellationToken)
at Amazon.Runtime.Internal.HttpHandler`1.InvokeAsync[T](IExecutionContext executionContext)
at Amazon.Runtime.Internal.RedirectHandler.InvokeAsync[T](IExecutionContext executionContext)
at Amazon.Runtime.Internal.Unmarshaller.InvokeAsync[T](IExecutionContext executionContext)
at Amazon.S3.Internal.AmazonS3ResponseHandler.InvokeAsync[T](IExecutionContext executionContext)
at Amazon.Runtime.Internal.ErrorHandler.InvokeAsync[T](IExecutionContext executionContext)
--- End of inner exception stack trace ---
at Amazon.Runtime.Internal.HttpErrorResponseExceptionHandler.HandleExceptionStream(IRequestContext requestContext, IWebResponseData httpErrorResponse, HttpErrorResponseException exception, Stream responseStream)
at Amazon.Runtime.Internal.HttpErrorResponseExceptionHandler.HandleExceptionAsync(IExecutionContext executionContext, HttpErrorResponseException exception)
at Amazon.Runtime.Internal.ExceptionHandler`1.HandleAsync(IExecutionContext executionContext, Exception exception)
at Amazon.Runtime.Internal.ErrorHandler.ProcessExceptionAsync(IExecutionContext executionContext, Exception exception)
at Amazon.Runtime.Internal.ErrorHandler.InvokeAsync[T](IExecutionContext executionContext)
at Amazon.Runtime.Internal.CallbackHandler.InvokeAsync[T](IExecutionContext executionContext)
at Amazon.Runtime.Internal.Signer.InvokeAsync[T](IExecutionContext executionContext)
at Amazon.Runtime.Internal.EndpointDiscoveryHandler.InvokeAsync[T](IExecutionContext executionContext)
at Amazon.Runtime.Internal.EndpointDiscoveryHandler.InvokeAsync[T](IExecutionContext executionContext)
at Amazon.Runtime.Internal.CredentialsRetriever.InvokeAsync[T](IExecutionContext executionContext)
at Amazon.Runtime.Internal.RetryHandler.InvokeAsync[T](IExecutionContext executionContext)
at Amazon.Runtime.Internal.RetryHandler.InvokeAsync[T](IExecutionContext executionContext)
at Amazon.Runtime.Internal.CallbackHandler.InvokeAsync[T](IExecutionContext executionContext)
at Amazon.Runtime.Internal.CallbackHandler.InvokeAsync[T](IExecutionContext executionContext)
at Amazon.S3.Internal.AmazonS3ExceptionHandler.InvokeAsync[T](IExecutionContext executionContext)
at Amazon.Runtime.Internal.ErrorCallbackHandler.InvokeAsync[T](IExecutionContext executionContext)
at Amazon.Runtime.Internal.MetricsHandler.InvokeAsync[T](IExecutionContext executionContext)
at SIL.Machine.AspNetCore.Services.S3FileStorage.Rm(String path, Boolean recurse, CancellationToken cancellationToken) in /app/src/SIL.Machine.AspNetCore/Services/S3FileStorage.cs:line 94
at SIL.Machine.AspNetCore.Services.SharedFileService.DeleteAsync(String path, CancellationToken cancellationToken) in /app/src/SIL.Machine.AspNetCore/Services/SharedFileService.cs:line 71
at SIL.Machine.AspNetCore.Services.ClearMLNmtEngineBuildJob.RunAsync(String engineId, String buildId, IReadOnlyList`1 corpora, CancellationToken cancellationToken) in /app/src/SIL.Machine.AspNetCore/Services/ClearMLNmtEngineBuildJob.cs:line 217
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
It looks like the job crashed in its own exception handler (while deleting shared files from S3) before it could update the build state and notify Serval.
This is another reason why it would be useful to get rid of the NMT Hangfire job.
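One way to guard against this failure mode, where cleanup inside the exception handler throws and the state update is never reached, is to make the cleanup best-effort. Below is a minimal Python sketch of that idea, not Machine's actual C# code. `run_build_job`, `delete_shared_files`, `set_build_state`, `notify_serval`, and `AccessDenied` are hypothetical stand-ins, and the state names `"Completed"` and `"Faulted"` are assumptions.

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)


class AccessDenied(Exception):
    """Hypothetical stand-in for the S3 'Access Denied' error raised during cleanup."""


def run_build_job(
    train: Callable[[], None],
    delete_shared_files: Callable[[], None],
    set_build_state: Callable[[str], None],
    notify_serval: Callable[[str], None],
) -> str:
    """Run a build and always record a final state, even if cleanup fails.

    All four callables are hypothetical; they only model the sequence
    train -> cleanup -> update state -> notify.
    """
    try:
        train()
        final_state = "Completed"
    except Exception:
        logger.exception("build failed")
        final_state = "Faulted"
    finally:
        # Best-effort cleanup: an error here (e.g. S3 Access Denied) is logged
        # instead of propagating and skipping the state update below.
        try:
            delete_shared_files()
        except Exception:
            logger.exception("cleanup of shared files failed; continuing")
    # Reached whether or not cleanup succeeded.
    set_build_state(final_state)
    notify_serval(final_state)
    return final_state
```

With a failing build and a cleanup that raises `AccessDenied`, the function still records `"Faulted"` and notifies. The trade-off is that leftover shared files may remain in the bucket, but the build should no longer be left without a final state.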
On qa.serval-api.org:
What is going on?