veertuinc / gitlab-runner

MIT License

Handle timeouts for calls to the controller #35

Closed. NorseGaud closed this issue 11 months ago

NorseGaud commented 11 months ago

Users report that jobs intermittently fail with the following error:

Running with gitlab-runner 14.8.0/1.5.0 (91f1fc78)
  on anka-macos-runner-mpr001 LarSqYsX
Resolving secrets
00:00
Preparing the "anka" executor
00:51
Opening a connection to the Anka Cloud Controller: http://controller/
Starting Anka VM using:
  - VM Template UUID: f175c8ee-f06b-42bf-a810-84561906d797
  - VM Template Tag Name: 1.0.001
Please be patient...
You can check the status of starting your Instance on the Anka Cloud Controller: http://controller/#/instances
ERROR: Job failed (system failure): decoding response from controller: Get "http://controller/api/v1/vm?id=ccbe1b59-ab4a-4ded-5f78-2b9e3df8b346": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
response: {
    "status": "",
    "message": "",
    "body": {
        "instance_state": "",
        "anka_registry": "",
        "vmid": "",
        "ts": "0001-01-01T00:00:00Z",
        "cr_time": "0001-01-01T00:00:00Z",
        "progress": 0
    }
}
NorseGaud commented 11 months ago

We already have retry logic wrapped around the HTTP calls; we just need it to also retry when the request itself fails (for example, when the client times out before receiving a response).
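A minimal sketch of that idea in Go, not the runner's actual code: the debug output in this thread mentions `doRequest`, but `doWithRetry` and `runDemo` below are hypothetical names, and the retry count, wait time, and timeout values are assumptions chosen for illustration. The helper retries whenever `client.Do` returns an error (such as `Client.Timeout exceeded while awaiting headers` or a dial timeout) instead of failing the job on the first one.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
	"time"
)

// doWithRetry is a hypothetical helper (not the runner's real doRequest).
// It retries when client.Do itself returns an error, i.e. when no HTTP
// response was received at all, up to maxRetries additional attempts.
func doWithRetry(client *http.Client, method, url string, maxRetries int, wait time.Duration) (*http.Response, error) {
	var lastErr error
	for attempt := 0; attempt <= maxRetries; attempt++ {
		// Build a fresh request for every attempt.
		req, err := http.NewRequest(method, url, nil)
		if err != nil {
			return nil, err // a malformed request will not improve with retries
		}
		req.Header.Set("Content-Type", "application/json")

		resp, err := client.Do(req)
		if err == nil {
			return resp, nil // the caller is responsible for closing resp.Body
		}
		lastErr = err
		fmt.Printf("attempt %d failed: %v\n", attempt, err)
		if attempt < maxRetries {
			time.Sleep(wait)
		}
	}
	return nil, fmt.Errorf("request failed after %d retries: %w", maxRetries, lastErr)
}

// runDemo starts a local test server whose first two responses are slower
// than the client timeout. It returns the number of requests the server
// saw, the final status code, and any error.
func runDemo() (int32, int, error) {
	var seen int32
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if atomic.AddInt32(&seen, 1) <= 2 {
			time.Sleep(600 * time.Millisecond) // longer than the client timeout
		}
		w.WriteHeader(http.StatusOK)
	}))
	defer srv.Close()

	// Assumed values: a 200ms client timeout and up to 3 retries.
	client := &http.Client{Timeout: 200 * time.Millisecond}
	resp, err := doWithRetry(client, http.MethodGet, srv.URL+"/api/v1/vm?id=example", 3, 20*time.Millisecond)
	if err != nil {
		return atomic.LoadInt32(&seen), 0, err
	}
	defer resp.Body.Close()
	return atomic.LoadInt32(&seen), resp.StatusCode, nil
}

func main() {
	requests, status, err := runDemo()
	fmt.Println(requests, status, err)
}
```

In this sketch the first two attempts time out on the client side and the third succeeds, which mirrors the `doRequest retries: N` pattern in the debug output. A real implementation would probably also cap the total wait and decide which errors are worth retrying.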

Confirmed the retry works by using vegeta to put the controller under load while a job was running:

❯ echo "GET http://35.175.210.121/api/v1/vm?id=80a01319-e9d3-48e0-4913-af7f2dbcd441" | vegeta attack -duration=30s -connections=10000 -rate=0 -max-workers 1000 | tee results.bin | vegeta report
Requests      [total, rate, throughput]         68785, 2292.08, 2139.87
Duration      [total, attack, wait]             32.084s, 30.01s, 2.074s
Latencies     [min, mean, 50, 90, 95, 99, max]  158.738ms, 437.472ms, 331.409ms, 520.886ms, 746.089ms, 1.094s, 31.018s
Bytes In      [total, mean]                     27256035, 396.25
Bytes Out     [total, mean]                     0, 0.00
Success       [ratio]                           99.81%
Status Codes  [code:count]                      0:130  200:68655
Error Set:
Get "http://35.175.210.121/api/v1/vm?id=80a01319-e9d3-48e0-4913-af7f2dbcd441": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Submitting job to coordinator... ok                 code=200 job=4 job-status= runner=RGbETcsR update-interval=0s
urlString: http://35.175.210.121/api/v1/vm?id=34b7537b-94e1-4711-602e-5198d788bf3e

REQUEST TO CONTROLLER: 
 GET /api/v1/vm?id=34b7537b-94e1-4711-602e-5198d788bf3e HTTP/1.1
Host: testing123.com
Content-Type: application/json

doRequest retries: 0

client.Do(req) Get "http://35.175.210.121/api/v1/vm?id=34b7537b-94e1-4711-602e-5198d788bf3e": context deadline exceeded (Client.Timeout exceeded while awaiting headers)

doRequest retries: 1

Submitting job to coordinator... ok                 code=200 job=4 job-status= runner=RGbETcsR update-interval=0s
client.Do(req) Get "http://35.175.210.121/api/v1/vm?id=34b7537b-94e1-4711-602e-5198d788bf3e": context deadline exceeded (Client.Timeout exceeded while awaiting headers)

Submitting job to coordinator... ok                 code=200 job=4 job-status= runner=RGbETcsR update-interval=0s
doRequest retries: 2

client.Do(req) Get "http://35.175.210.121/api/v1/vm?id=34b7537b-94e1-4711-602e-5198d788bf3e": dial tcp 35.175.210.121:80: connect: operation timed out

Submitting job to coordinator... ok                 code=200 job=4 job-status= runner=RGbETcsR update-interval=0s
doRequest retries: 3

RESPONSE FROM CONTROLLER: {
        "status": "OK",
        "message": "",
        "body": {
                "instance_state": "Scheduling",
                "anka_registry": "http://35.175.210.121:8089",
                "vmid": "5d1b40b9-7e68-4807-a290-c59c66e926b4",
                "tag": "v1",
                "inflight_reqid": "9b695512-5bce-42e4-47ca-3e70d34e6fb1",
                "ts": "2023-11-03T21:29:17.331843774Z",
                "cr_time": "2023-11-03T21:29:17.331843853Z",
                "progress": 0
        }
}

urlString: http://35.175.210.121/api/v1/vm?id=34b7537b-94e1-4711-602e-5198d788bf3e

REQUEST TO CONTROLLER: 
 GET /api/v1/vm?id=34b7537b-94e1-4711-602e-5198d788bf3e HTTP/1.1
Host: testing123.com
Content-Type: application/json

doRequest retries: 0

RESPONSE FROM CONTROLLER: {
        "status": "OK",
        "message": "",
        "body": {
                "instance_state": "Scheduling",
                "anka_registry": "http://35.175.210.121:8089",
                "vmid": "5d1b40b9-7e68-4807-a290-c59c66e926b4",
                "tag": "v1",
                "inflight_reqid": "9b695512-5bce-42e4-47ca-3e70d34e6fb1",
                "ts": "2023-11-03T21:29:17.331843774Z",
                "cr_time": "2023-11-03T21:29:17.331843853Z",
                "progress": 0
        }
}