Closed hasufell closed 1 year ago
Weird. Looks like it didn't retry on 503:
cabal-cache sync-to-archive --host-name-override=*** --host-port-override=443 --host-ssl-override=True --region us-west-2 --store-path="/Users/worker/hls/_work/haskell-language-server/haskell-language-server/store" --archive-uri "s3://haskell-language-server/aarch64-apple-darwin"
cabal-cache: ServiceError (ServiceError' {_serviceAbbrev = Abbrev "S3", _serviceStatus = Status {statusCode = 503, statusMessage = "Service Temporarily Unavailable"}, _serviceHeaders = [("Date","Wed, 21 Dec 2022 18:01:26 GMT"),("Content-Type","text/plain"),("Content-Length","148"),("Connection","keep-alive"),("x-amz-request-id","tx000000000000000000000-0000000000-0000-default")], _serviceCode = ErrorCode "Service Temporarily Unavailable", _serviceMessage = Nothing, _serviceRequestId = Just (RequestId "tx000000000000000000000-0000000000-0000-default")})
But exited with non-zero code.
It appears this is on upload and it happens frequently enough for me that it's a problem: https://github.com/haskell/haskell-language-server/actions/runs/3751326564/jobs/6377428541#step:3:9197
I believe this is the code that fails:
It uses unsafeInterleaveIO, which makes the logs look a little off, I think.
That's interesting. How did you figure that out?
The unsafe interleave is so that if you never access AWS you don't fail for AWS reasons - for example if syncing to/from filesystem.
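To illustrate the point about deferring AWS access, here is a minimal sketch using only base. The names Env, Location, describe and withLazyEnv are hypothetical stand-ins and are not taken from cabal-cache. The environment setup is wrapped in unsafeInterleaveIO, so it only runs if the result actually depends on it:

```haskell
import System.IO.Unsafe (unsafeInterleaveIO)

-- Hypothetical stand-in for an AWS environment.
newtype Env = Env String

-- Hypothetical sync target: a local directory or an S3 URI.
data Location = Local FilePath | S3 String

-- Only the S3 branch inspects the environment.
describe :: Env -> Location -> String
describe _       (Local path) = "file: " ++ path
describe (Env e) (S3 uri)     = e ++ ": " ++ uri

-- The environment action is deferred: it runs when (and only when)
-- the Env value is first forced, not at this point in the program.
withLazyEnv :: IO Env -> Location -> IO String
withLazyEnv mkEnv loc = do
  env <- unsafeInterleaveIO mkEnv
  return (describe env loc)
```

With a Local target the environment action never runs, so a failure inside it cannot surface. The flip side is that any failure or log output from the deferred action appears wherever the value happens to be forced, which may explain why the logs look out of order.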
I also noticed that our retry logic is inconsistent due to:
handleAwsError :: MonadCatch m => m a -> m (Either AppError a)
handleAwsError f = catch (Right <$> f) $ \(e :: AWS.Error) ->
  case e of
    (AWS.ServiceError (AWS.ServiceError' _ s@(HTTP.Status 404 _) _ _ _ _)) -> return (Left (AwsAppError s))
    (AWS.ServiceError (AWS.ServiceError' _ s@(HTTP.Status 301 _) _ _ _ _)) -> return (Left (AwsAppError s))
    _ -> throwM e
Most of the status errors are not encapsulated into ExceptT, and so the retry function will not trigger for 5xx etc. either.
Ah right. Yeah, perhaps it's better when catching errors to convert them all to ExceptT errors.
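A rough sketch of that idea, using only base and with IO (Either e a) standing in for ExceptT. FakeAwsError, handleAllAwsErrors, isRetryable and retryOn are hypothetical names for illustration; this is not the cabal-cache or amazonka API, and the real AWS.Error carries more than a status code:

```haskell
import Control.Concurrent (threadDelay)
import Control.Exception (Exception, catch, throwIO)

-- Hypothetical stand-in for AWS.Error: carries only the HTTP status code.
newtype FakeAwsError = FakeAwsError Int
  deriving Show

instance Exception FakeAwsError

-- Hypothetical application error, loosely mirroring AwsAppError above.
newtype AppError = AwsAppError Int
  deriving (Eq, Show)

-- Convert every service error into a Left, instead of rethrowing
-- everything other than 404 and 301.
handleAllAwsErrors :: IO a -> IO (Either AppError a)
handleAllAwsErrors f =
  catch (Right <$> f) $ \(FakeAwsError code) -> return (Left (AwsAppError code))

-- Assumption: any 5xx response is treated as transient.
isRetryable :: AppError -> Bool
isRetryable (AwsAppError code) = code >= 500 && code < 600

-- Run an action up to maxAttempts times, sleeping delayMicros
-- microseconds between attempts, as long as the error is retryable.
retryOn :: (e -> Bool) -> Int -> Int -> IO (Either e a) -> IO (Either e a)
retryOn shouldRetry maxAttempts delayMicros action = go 1
  where
    go attempt = do
      result <- action
      case result of
        Left err | shouldRetry err && attempt < maxAttempts -> do
          threadDelay delayMicros
          go (attempt + 1)
        _ -> return result
```

Something like retryOn isRetryable 3 1000000 (handleAllAwsErrors upload) would then retry a 503 up to three times, while a 404 is returned as a Left immediately. A fixed delay keeps the sketch short; exponential backoff would probably be a better fit for S3.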
BTW, I noticed your conversation in the oops repository. Did you end up choosing between plucky, oops or some other error handling library?
Awesome. Thanks!
It seems there are a lot of spurious failures in CI, and cabal-cache will just fail hard if a single request doesn't make it. Shouldn't there be some retry logic for S3 requests?