guardian / machine-images

DEPRECATED: Scripts for building machine images (principally AMIs)

Use keep-alive for snapshot download #83

Closed: niklasvincent closed this 8 years ago

niklasvincent commented 8 years ago

Over the past week I have seen intermittent errors when the backup cron job is downloading the snapshot from Ops Manager:

% Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                               Dload  Upload   Total   Spent    Left  Speed
100  122M    0  122M    0     0  3691k      0 --:--:--  0:00:34 --:--:-- 4412k
curl: (18) transfer closed with outstanding read data remaining

I think part of the problem is that Ops Manager does not send a Content-Length header. According to an answer on Stack Overflow, tuning the keep-alive settings might help when doing a large file transfer with cURL.

By explicitly setting a keep-alive header and a low keep-alive time (the time a connection needs to remain idle before keep-alive probes are sent), the response from Ops Manager seems to honor keep-alive and the transfer was more reliable:

> GET /backup/restore/v2/pull/backup.tar.gz HTTP/1.1
> User-Agent: curl/7.35.0
> Host: redacted:8081
> Accept: */*
> Accept-Encoding: deflate, gzip
> Connection: keep-alive
>
< HTTP/1.1 200 OK
< Content-Type: application/x-gzip
< Date: Thu, 04 Feb 2016 15:01:52 GMT
< transfer-encoding: chunked
< Connection: keep-alive
<
{ [data not shown]
100 2201M    0 2201M    0     0  8295k      0 --:--:--  0:04:31 --:--:-- 8920k
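
For reference, a request like the one above could come from an invocation along these lines. This is a minimal sketch, not the exact command in the cron job: the function name `download_snapshot`, the URL, the output path and the 2-second default are placeholders chosen for illustration. Note that `--keepalive-time` controls TCP keepalive probes, while the `Connection: keep-alive` header is an HTTP-level request.

```shell
# Sketch only: the function name and the 2-second default are assumptions,
# not values taken from the actual backup script.
download_snapshot() {
  url="$1"
  out="$2"
  keepalive_secs="${3:-2}"

  # --fail             exit non-zero on HTTP errors instead of saving the error page
  # --header           ask the server explicitly to keep the connection alive
  # --keepalive-time   seconds the connection may sit idle before TCP keepalive probes
  # --output           write the response body to the given file
  curl --fail \
    --header 'Connection: keep-alive' \
    --keepalive-time "$keepalive_secs" \
    --output "$out" \
    "$url"
}
```

It would be called as `download_snapshot "$SNAPSHOT_URL" /path/to/backup.tar.gz`, where `SNAPSHOT_URL` stands in for the Ops Manager pull URL.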

Another route people suggest is forcing HTTP/1.0 (via --http1.0), but that did not help: it only made the error message go away. The connection would still terminate prematurely and most downloads yielded truncated files.
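
Without a Content-Length, an HTTP/1.0 response ends when the connection closes, so cURL cannot tell a truncated download from a complete one. A possible safeguard, sketched here on the assumption that the snapshot really is a gzip archive (as the application/x-gzip content type suggests), is to test the file after the download. The function name `verify_snapshot` is hypothetical and not part of this change.

```shell
# Sketch only: verify_snapshot is a hypothetical helper.
# gzip -t reads the whole compressed stream and exits non-zero if it is
# truncated or corrupt, which catches a download that ended early.
verify_snapshot() {
  if gzip -t "$1" 2>/dev/null; then
    echo "ok: $1"
  else
    echo "truncated or corrupt: $1" >&2
    return 1
  fi
}
```

The cron job could then treat a non-zero exit from `verify_snapshot` as a failed backup rather than keeping a short file.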

philmcmahon commented 8 years ago

:+1: